@unhammer
Last active January 9, 2026 15:37
mawk -f vtt2html.awk test.vtt | apertium -f html nob-nno_e | awk -f html2vtt.awk
#!/usr/bin/awk -f
# html2vtt.awk: remove the <script> and <div> tags added by vtt2html.awk
{
    gsub(/<\/script><div>/, "")
    gsub(/<\/div><script>/, "")
    gsub(/<\/?script>/, "")
    print
}
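To see what the cleanup does without running Apertium, the three substitutions can be tried on a single wrapped line. This is a minimal sketch: the `html2vtt` shell function is a hypothetical wrapper that inlines the rules above, and the sample line is made up rather than taken from test.vtt.

```shell
#!/bin/sh
# Hypothetical wrapper: the html2vtt.awk substitutions inlined so the sketch runs standalone.
html2vtt() {
    awk '{
        gsub(/<\/script><div>/, "")   # tag pair that opens a translatable span
        gsub(/<\/div><script>/, "")   # tag pair that closes a translatable span
        gsub(/<\/?script>/, "")       # any remaining lone script tag
        print
    }'
}

# Made-up wrapped line in the shape that vtt2html.awk emits for a voice-tagged cue.
printf '%s\n' '<v Hall></script><div>Hei</div><script>' | html2vtt
# prints: <v Hall>Hei
```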
WEBVTT

00:11.000 --> 00:13.000
<v fortelleren>Det var 1. april 19– ved 1 tiden.

00:30.000 --> 00:31.500 align:right size:50%
<v fortelleren>Det gikk som en løpeild gjennom «Grand café»,
<v fortelleren>at direktør Jens Christian Sahle, a/s «Kvarts»,
<v fortelleren>i formiddags hadde meldt seg for underslag,

00:32.500 --> 00:33.500 align:left size:50%
<v arkitekt Hall><i>– Det var da merkelig at han ikke kunne greie det, sa Hall.</i>

00:35.500 --> 00:38.000
<v.soft bankkasserer Wang>Det må det ha vært, sa han.

00:05.000 --> 00:09.000
— Men hva fanden kan det stikke i, at han så allikevel ikke greide affæren?
— Avgjort ikke.
#!/usr/bin/awk -f
# vtt2html.awk: protect non-content from Apertium translation by wrapping
# it in <script> tags; also wrap each line of cue text in <div> in case
# of missing line breaks.

BEGIN {
    print "<script>"
}

# The first input line is the WEBVTT header: print it as-is
NR == 1 {
    print
    next
}

# Blank lines: print as-is
/^$/ {
    print
    next
}

# Timestamp / cue header lines (mm:ss.ttt, optionally preceded by an
# hours field as in hh:mm:ss.ttt): print as-is
/^([0-9][0-9]+:)?[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9] --> / {
    print
    next
}

# Lines starting with <v ...>
/^<v[^>]*>/ {
    # Close the <script> and open a <div> right after the <v...> tag,
    # so the voice tag itself stays protected
    sub(/^<v[^>]*>/, "&</script><div>")
    # Close the <div> and reopen <script> at the end of the line
    print $0 "</div><script>"
    next
}

# Any other nonempty, non-timestamp line is cue text: wrap it
{
    print "</script><div>" $0 "</div><script>"
}

END {
    print "</script>"
}
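The intermediate HTML that Apertium receives can be inspected without Apertium installed. The sketch below is a condensed restatement of the rules above, inlined into a hypothetical `vtt2html` shell function so that it runs standalone. The two-cue-line sample is made up, and the timestamp pattern used here is an assumption that also accepts an optional hours field.

```shell
#!/bin/sh
# Hypothetical wrapper: condensed restatement of the vtt2html.awk rules.
vtt2html() {
    awk '
    BEGIN { print "<script>" }
    # Header line, blank lines and cue timings pass through inside <script>.
    NR == 1 || /^$/ || /^([0-9][0-9]+:)?[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9] --> / { print; next }
    # Voice-tagged cue text: keep the <v ...> tag protected, expose the rest.
    /^<v[^>]*>/ { sub(/^<v[^>]*>/, "&</script><div>"); print $0 "</div><script>"; next }
    # Plain cue text: expose the whole line.
    { print "</script><div>" $0 "</div><script>" }
    END { print "</script>" }
    '
}

# Made-up sample, not part of test.vtt.
printf '%s\n' 'WEBVTT' '' '00:01.000 --> 00:02.000' '<v Hall>Hei' 'Ha det' | vtt2html
# prints:
# <script>
# WEBVTT
#
# 00:01.000 --> 00:02.000
# <v Hall></script><div>Hei</div><script>
# </script><div>Ha det</div><script>
# </script>
```

Only the text inside `<div>` is left outside a `<script>` element, so that is the only part a translator running in HTML mode should touch.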