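Download the video and both VTT subtitle files, then mux them into a single WebM with ffmpeg, copying the video and audio streams without re-encoding: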
ffmpeg -i video.webm -i evangelion-30th.ja.vtt -i evangelion-30th.en.vtt \
-map 0 -map 1 -map 2 -c:v copy -c:a copy -c:s webvtt \
-metadata:s:s:0 language=jpn -metadata:s:s:0 title="Japanese" \
-metadata:s:s:1 language=eng -metadata:s:s:1 title="English" \
output.webm
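
To confirm that both subtitle tracks were muxed with the right tags, the subtitle streams can be listed with ffprobe. This is a minimal sketch, assuming ffprobe is installed alongside ffmpeg. The helper name list_subs is hypothetical, and output.webm is the file written by the command above.

```shell
# Hypothetical helper: print index, language and title for every
# subtitle stream in the given file.
list_subs() {
  ffprobe -v error -select_streams s \
    -show_entries stream=index:stream_tags=language,title \
    -of csv=p=0 "$1"
}

# Only run the check once the muxed file exists.
if [ -f output.webm ]; then
  list_subs output.webm
fi
```

If the mux worked, there should be one line per subtitle stream, carrying the jpn and eng language tags set above.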