Last active
January 7, 2025 07:27
Latviesu runas atpazisana (Latvian speech recognition)
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "view-in-github", | |
| "colab_type": "text" | |
| }, | |
| "source": [ | |
| "<a href=\"https://colab.research.google.com/gist/raivisdejus/07ca2e37d1fb87f81df12e424cf9175b/latviesu-runas-atpazisana.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# Latvian speech recognition\n", | |
| "\n", | |
| "This notebook contains tools for Latvian speech recognition. It uses the <a target=\"_blank\" href=\"https://huggingface.co/AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17\">speech recognition model</a> created by the Artificial Intelligence Laboratory of the Institute of Mathematics and Computer Science, University of Latvia, which was built using data collected in <a target=\"_blank\" href=\"https://balsutalka.lv/\">Balsu talka</a>.\n", | |
| "\n", | |
| "The model recognizes speech with high accuracy, but its output should still be checked.\n", | |
| "\n", | |
| "To run speech recognition on an audio or video file, follow the steps listed below. A video tutorial on using this notebook is <a target=\"_blank\" href=\"https://youtu.be/MjeawBpB5xg\">available here</a>." | |
| ], | |
| "metadata": { | |
| "id": "zZBBTnW-aThp" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 1. Change the runtime type to T4 GPU\n", | |
| "\n", | |
| "To do this, go to `Runtime` -> `Change runtime type` in the main menu at the top of this page and select `T4 GPU`.\n", | |
| "\n", | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "GALJps6fDlQD" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "#@title 2. Load the required tools.\n", | |
| "#@markdown <--- Press this play button to load the required tools.\n", | |
| "import ipywidgets as widgets\n", | |
| "from IPython.display import clear_output, display\n", | |
| "\n", | |
| "display(widgets.HTML(\n", | |
| "    value=\"<h3>Loading...</h3>\"\n", | |
| "))\n", | |
| "\n", | |
| "# -y answers apt's confirmation prompt automatically\n", | |
| "!apt install -y ffmpeg\n", | |
| "\n", | |
| "import os\n", | |
| "import glob\n", | |
| "import torch\n", | |
| "import textwrap\n", | |
| "import subprocess\n", | |
| "from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline\n", | |
| "\n", | |
| "\n", | |
| "clear_output()\n", | |
| "\n", | |
| "\n", | |
| "def format_srt_time(seconds):\n", | |
| "    # Format seconds as an SRT timestamp: HH:MM:SS,mmm\n", | |
| "    total_ms = int(round(seconds * 1000))\n", | |
| "    hours, rest = divmod(total_ms, 3600000)\n", | |
| "    minutes, rest = divmod(rest, 60000)\n", | |
| "    secs, ms = divmod(rest, 1000)\n", | |
| "    return \"{:02}:{:02}:{:02},{:03}\".format(hours, minutes, secs, ms)\n", | |
| "\n", | |
| "\n", | |
| "def generate_srt(chunks):\n", | |
| "    # Group word-level chunks into sentences and write them as SRT cues\n", | |
| "    srt_content = ''\n", | |
| "    sentence = ''\n", | |
| "    start_time_srt = None\n", | |
| "    end_time_srt = None\n", | |
| "    i = 1\n", | |
| "    for chunk in chunks:\n", | |
| "        start_time, end_time = chunk['timestamp']\n", | |
| "        if end_time is None:  # guard against a missing end timestamp\n", | |
| "            end_time = start_time\n", | |
| "        if start_time_srt is None:\n", | |
| "            start_time_srt = format_srt_time(start_time)\n", | |
| "        sentence += chunk['text']\n", | |
| "        end_time_srt = format_srt_time(end_time)\n", | |
| "        if chunk['text'].strip().endswith(('.', '?', '!')):\n", | |
| "            wrapped_text = '\\n'.join(textwrap.wrap(sentence.strip(), width=42))\n", | |
| "            srt_content += f\"{i}\\n{start_time_srt} --> {end_time_srt}\\n{wrapped_text}\\n\\n\"\n", | |
| "            i += 1\n", | |
| "            sentence = ''\n", | |
| "            start_time_srt = None\n", | |
| "    if sentence.strip():  # the last sentence, if it doesn't end with ., ? or !\n", | |
| "        wrapped_text = '\\n'.join(textwrap.wrap(sentence.strip(), width=42))\n", | |
| "        srt_content += f\"{i}\\n{start_time_srt} --> {end_time_srt}\\n{wrapped_text}\\n\\n\"\n", | |
| "    # Latvian text contains diacritics, so write the file as UTF-8 explicitly\n", | |
| "    with open('subtitles.srt', 'w', encoding='utf-8') as f:\n", | |
| "        f.write(srt_content)\n", | |
| "\n", | |
| "\n", | |
| "def on_file_upload(change):\n", | |
| "    display(widgets.HTML(\n", | |
| "        value='<h3>The selected file has been uploaded, checking it...</h3>'\n", | |
| "    ))\n", | |
| "\n", | |
| "    # The widget value is a dict keyed by file name (ipywidgets 7.x layout)\n", | |
| "    uploaded_file = next(iter(change['new'].values()))\n", | |
| "    file_name = uploaded_file['metadata']['name']\n", | |
| "    file_extension = os.path.splitext(file_name)[1]\n", | |
| "\n", | |
| "    with open(f'uploaded{file_extension}', 'wb') as f:\n", | |
| "        f.write(uploaded_file['content'])\n", | |
| "\n", | |
| "    try:\n", | |
| "        # Extract the audio track; an argument list avoids shell quoting problems\n", | |
| "        command = [\"ffmpeg\", \"-i\", f\"uploaded{file_extension}\", \"-vn\", \"-y\", \"audio.flac\"]\n", | |
| "        subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", | |
| "    except subprocess.CalledProcessError as e:\n", | |
| "        # Keep the ffmpeg error output for later inspection\n", | |
| "        error_message = e.stderr.decode('utf-8')\n", | |
| "        with open('error_log.txt', 'w') as error_file:\n", | |
| "            error_file.write(error_message)\n", | |
| "\n", | |
| "    os.remove(f'uploaded{file_extension}')\n", | |
| "\n", | |
| "    display(widgets.HTML(\n", | |
| "        value=\"<h3>The uploaded file has been saved!</h3>\"\n", | |
| "    ))\n", | |
| "\n", | |
| "\n", | |
| "uploader = widgets.FileUpload(description='Choose a file', accept='audio/*,video/*', multiple=False)\n", | |
| "uploader.observe(on_file_upload, 'value')\n", | |
| "display(uploader)" | |
| ], | |
| "metadata": { | |
| "id": "zLDYIFciCMTw", | |
| "cellView": "form" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 3. Choose a file\n", | |
| "Once the tools have loaded, a \"Choose a file\" button will appear. Press it and select the file in which Latvian speech should be recognized." | |
| ], | |
| "metadata": { | |
| "id": "QxsuN4r0QGnl" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "#@title 4. Run the speech recognition process\n", | |
| "#@markdown <--- Press this play button to start the speech recognition process.\n", | |
| "\n", | |
| "videos = glob.glob('video.*')\n", | |
| "if videos:\n", | |
| "    try:\n", | |
| "        # Extract the audio track; an argument list handles file names with spaces\n", | |
| "        command = [\"ffmpeg\", \"-i\", videos[0], \"-vn\", \"-y\", \"audio.flac\"]\n", | |
| "        subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", | |
| "    except subprocess.CalledProcessError as e:\n", | |
| "        error_message = e.stderr.decode('utf-8')\n", | |
| "        with open('error_log.txt', 'w') as error_file:\n", | |
| "            error_file.write(error_message)\n", | |
| "\n", | |
| "audio = glob.glob('audio.*')\n", | |
| "\n", | |
| "if not audio:\n", | |
| "    display(widgets.HTML(\n", | |
| "        value=\"<h3>No file has been uploaded for recognition, please upload one!</h3>\"\n", | |
| "    ))\n", | |
| "else:\n", | |
| "    display(widgets.HTML(\n", | |
| "        value=\"<h3>Loading...</h3>\"\n", | |
| "    ))\n", | |
| "\n", | |
| "    # Use the GPU with half precision when available, otherwise the CPU\n", | |
| "    device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", | |
| "    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32\n", | |
| "\n", | |
| "    model_id = \"AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17\"\n", | |
| "\n", | |
| "    model = AutoModelForSpeechSeq2Seq.from_pretrained(\n", | |
| "        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=False, use_safetensors=True\n", | |
| "    ).to(device)\n", | |
| "\n", | |
| "    processor = AutoProcessor.from_pretrained(model_id)\n", | |
| "\n", | |
| "    pipe = pipeline(\n", | |
| "        \"automatic-speech-recognition\",\n", | |
| "        generate_kwargs={\"language\": \"latvian\", \"task\": \"transcribe\"},\n", | |
| "        model=model,\n", | |
| "        tokenizer=processor.tokenizer,\n", | |
| "        feature_extractor=processor.feature_extractor,\n", | |
| "        max_new_tokens=225,\n", | |
| "        chunk_length_s=30,\n", | |
| "        batch_size=1,\n", | |
| "        return_timestamps=\"word\",\n", | |
| "        torch_dtype=torch_dtype,\n", | |
| "        device=device,\n", | |
| "    )\n", | |
| "\n", | |
| "    clear_output()\n", | |
| "\n", | |
| "    display(widgets.HTML(\n", | |
| "        value=\"<h3>Recognizing audio...</h3>\"\n", | |
| "    ))\n", | |
| "\n", | |
| "    result = pipe('audio.flac')\n", | |
| "\n", | |
| "    # Latvian text contains diacritics, so write the file as UTF-8 explicitly\n", | |
| "    with open('transcript.txt', 'w', encoding='utf-8') as f:\n", | |
| "        f.write(result[\"text\"])\n", | |
| "\n", | |
| "    generate_srt(result[\"chunks\"])\n", | |
| "\n", | |
| "    clear_output()\n", | |
| "\n", | |
| "    display(widgets.HTML(value=result[\"text\"]))" | |
| ], | |
| "metadata": { | |
| "id": "_6ovCwwqC6SM", | |
| "cellView": "form" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 5. Save the text recognized in the audio file\n", | |
| "After the audio file has been processed, the recognized text is printed below the cell of step 4 and is also saved to the text file `transcript.txt`. The recognized text in `.srt` subtitle format will be available in the file `subtitles.srt`. To view and download these files, click the folder icon at the side of the screen.\n", | |
| "\n", | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "PLqlD2N7STNq" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 6. Useful tips\n", | |
| "* If you run into a technical error, run the step 2 cell again and repeat the recognition.\n", | |
| "* Longer audio or video files can also be uploaded directly in the Files panel. The file must be named `video`, for example `video.mp4`\n", | |
| "* Video files can also be loaded from Google Drive. To mount it, press the folder icon with the Google Drive logo\n", | |
| "* The audio track of a video file can be improved using the [Adobe Podcast](https://podcast.adobe.com/) tool.\n", | |
| "  * To extract the audio track from a video, you can use this `ffmpeg` command: `ffmpeg -i video.mp4 -vn -ab 128k -ar 44100 -y audio.mp3`\n", | |
| "  * To add the improved audio track back to the video, you can use this command: `ffmpeg -i video.mp4 -i audio_enhanced.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 video_enhanced.mp4`" | |
| ], | |
| "metadata": { | |
| "id": "OcWH6fcdri3C" | |
| } | |
| } | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "provenance": [], | |
| "gpuType": "T4", | |
| "include_colab_link": true | |
| }, | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "name": "python3" | |
| }, | |
| "accelerator": "GPU" | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
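The subtitle step in `generate_srt` comes down to two things: formatting seconds as `HH:MM:SS,mmm`, and closing a subtitle cue whenever a word ends with sentence punctuation. The sketch below isolates that logic so it can be tried without a GPU or the model. The helper names `srt_time` and `group_sentences` and the sample chunks are made up for illustration and are not part of the notebook; the chunk shape (a `text` string plus a `timestamp` pair) is assumed to match what the notebook passes to `generate_srt`.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def group_sentences(chunks):
    """Merge word-level chunks into (start, end, text) sentences."""
    sentences, words, start = [], [], None
    for chunk in chunks:
        chunk_start, chunk_end = chunk["timestamp"]
        if start is None:
            start = chunk_start  # first word of a new sentence
        words.append(chunk["text"])
        if chunk["text"].strip().endswith((".", "?", "!")):
            sentences.append((start, chunk_end, "".join(words).strip()))
            words, start = [], None
    if words:  # trailing words without closing punctuation
        sentences.append((start, chunk_end, "".join(words).strip()))
    return sentences


# Hypothetical word-level chunks, invented for this example
sample = [
    {"text": " Labdien.", "timestamp": (0.0, 0.62)},
    {"text": " Kā", "timestamp": (1.1, 1.3)},
    {"text": " iet?", "timestamp": (1.3, 3661.5)},
]

for start, end, text in group_sentences(sample):
    print(f"{srt_time(start)} --> {srt_time(end)}  {text}")
```

Working in whole milliseconds with `divmod` avoids the small floating-point errors that can creep in when the fractional part is extracted with `% 1` and truncated.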
Author
For speech recognition on your own computer (Linux, Mac, Windows) you can also use this app: https://github.com/chidiwilliams/buzz/
The app offers several Whisper variants and a number of extra features. Faster Whisper, which is available in Buzz, typically uses less memory. That said, 4 GB cards will most likely not be able to run the large models; 8 GB should be enough.
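A rough back-of-the-envelope check of these memory figures, as a sketch only: it assumes about 1.55 billion parameters for the large Whisper models (an approximate figure, not taken from this thread) and counts the weights alone. Activations, the decoding cache and CUDA overhead come on top, so the numbers are a lower bound. The function name `weights_gib` is made up for illustration.

```python
def weights_gib(n_params, bytes_per_param):
    """Lower bound on the memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3


# Assumption: approximate parameter count of the large Whisper models
LARGE_PARAMS = 1_550_000_000

# Bytes per parameter for common weight formats
for label, nbytes in (("float32", 4), ("float16", 2), ("8-bit", 1)):
    print(f"{label}: {weights_gib(LARGE_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the float16 weights alone take close to 3 GiB, which leaves little headroom on a 4 GB card and is consistent with the estimate that 8 GB should be enough.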
Thanks to the author! ⭐ 👏👏👏
the "experiment" reproduces successfully with CUDA 12;
and my observations confirm that
mp3 files of 1h27m and 1h32m with these parameters
fit on a GPU with 16 GB of VRAM
(for example, the T4 the author specifies).
if it needs to run on 8 GB, something in the parameters has to change;
I don't know about this model, but the original whisper fit well under 8 GB (but above 4 GB) for me, following their readme.md:
1/3:
(at model initialization, with `low_cpu_mem_usage` changed from `False` to `True`)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True).to(device)
and
2/3
and
3/3
when calling
10 minutes later:
