Gist unhammer/eea7b9bb398c258b54fb9b5159b36ebd
Last active January 9, 2026 15:37
```sh
mawk -f vtt2html.awk test.vtt | apertium -f html nob-nno_e | awk -f html2vtt.awk
```
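You can check the wrapping on its own, without any translation, by running the two awk scripts back to back without Apertium and comparing the result with the input. The sketch below puts that in a shell function. The name `vtt_roundtrip_check` and its optional script-path arguments are assumptions for illustration and are not part of this gist. If the scripts round-trip cleanly, `diff` prints nothing and the function returns 0. Otherwise the unified diff shows which lines were lost or altered.

```sh
#!/bin/sh
# Hypothetical helper, not part of the gist: run the wrap and unwrap
# scripts back to back with no translation step in between and compare
# the result with the original file.
# Usage: vtt_roundtrip_check FILE.vtt [WRAP.awk] [UNWRAP.awk]
vtt_roundtrip_check() {
    vtt=$1
    wrap=${2:-vtt2html.awk}      # assumed to be in the current directory
    unwrap=${3:-html2vtt.awk}
    # diff reads the round-tripped text from stdin ("-"); the pipeline's
    # exit status is diff's: 0 if identical, 1 if the inputs differ.
    awk -f "$wrap" "$vtt" | awk -f "$unwrap" | diff -u "$vtt" -
}
```

For example, `vtt_roundtrip_check test.vtt && echo ok`. Because Apertium is left out, any difference points at the awk scripts rather than at the translator.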
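html2vtt.awk

```awk
#!/usr/bin/awk -f
# Remove the tags added by vtt2html.awk.
# Drop the bare <script>/</script> lines that open and close the file, so
# the output starts with WEBVTT instead of a blank line.
/^<\/?script>$/ { next }
{
    gsub(/<\/script><div>/, "")
    gsub(/<\/div><script>/, "")
    gsub(/<\/?script>/, "")
    print
}
```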
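To see what the three substitutions do, apply them to lines shaped like the output of `vtt2html.awk`: one cue with a voice tag and one plain cue. The function name `strip_wrappers` is hypothetical. Its body uses the same `gsub` calls as `html2vtt.awk` and ends with an explicit `print`, because an awk action with no pattern produces no output unless it prints.

```sh
#!/bin/sh
# strip_wrappers: hypothetical helper that applies the same three
# substitutions as html2vtt.awk to stdin and prints each resulting line.
strip_wrappers() {
    awk '{
        gsub(/<\/script><div>/, "")   # protected prefix ends, cue text begins
        gsub(/<\/div><script>/, "")   # cue text ends, protected suffix begins
        gsub(/<\/?script>/, "")       # any leftover wrapper tag
        print
    }'
}

# Two lines in the shape vtt2html.awk emits: a voice-tagged cue and a plain cue.
printf '%s\n' \
    '<v fortelleren></script><div>Det var 1. april.</div><script>' \
    '</script><div>Avgjort ikke.</div><script>' | strip_wrappers
```

This prints `<v fortelleren>Det var 1. april.` followed by `Avgjort ikke.`. The voice tag survives unchanged because `vtt2html.awk` places it before the closing `</script>`.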
test.vtt

```text
WEBVTT

00:11.000 --> 00:13.000
<v fortelleren>Det var 1. april 19– ved 1 tiden.

00:30.000 --> 00:31.500 align:right size:50%
<v fortelleren>Det gikk som en løpeild gjennom «Grand café»,
<v fortelleren>at direktør Jens Christian Sahle, a/s «Kvarts»,
<v fortelleren>i formiddags hadde meldt seg for underslag,

00:32.500 --> 00:33.500 align:left size:50%
<v arkitekt Hall><i>– Det var da merkelig at han ikke kunne greie det, sa Hall.</i>

00:35.500 --> 00:38.000
<v.soft bankkasserer Wang>Det må det ha vært, sa han.

00:05.000 --> 00:09.000
— Men hva fanden kan det stikke i, at han så allikevel ikke greide affæren?
— Avgjort ikke.
```
vtt2html.awk

```awk
#!/usr/bin/awk -f
# Protect non-content from Apertium translation by wrapping it in
# <script> tags; also wrap each line in <div> in case of missing line
# breaks.
BEGIN {
    print "<script>"
}

# The first line is the WEBVTT header; print it as-is
NR == 1 {
    print
    next
}

# Blank lines: print as-is
/^$/ {
    print
    next
}

# Timestamp / cue header lines: print as-is (the hours field is optional)
/^([0-9][0-9]+:)?[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9] --> / {
    print
    next
}

# Lines starting with <v ...>
/^<v[^>]*>/ {
    # Insert closing </script> and opening <div> right after the <v...> tag,
    # so the speaker name stays protected while the cue text is translated
    sub(/^<v[^>]*>/, "&</script><div>")
    # And close div + reopen script at end of line
    print $0 "</div><script>"
    next
}

# Any other nonempty, non-timestamp line is cue text: wrap it
{
    print "</script><div>" $0 "</div><script>"
}

END {
    print "</script>"
}
```