@nker150
Created February 3, 2023 05:10

How to archive YouTube channels

This guide will show you how to download entire YouTube channels. I'm going to try to keep this as simple and straightforward as possible, and I'll also try to explain some of the why's here as well so that you can better understand how to modify this guide for your own personal use.

The first thing you're going to need is yt-dlp. Once upon a time we would have used youtube-dl, however the developer there didn't seem too keen on bypassing YouTube's throttling. yt-dlp is a fork of youtube-dl that performs far better on YouTube.

How you will get yt-dlp depends on which operating system you're using.

Installing on Windows

There are numerous ways to install yt-dlp on Windows, and I've explored most of them. The simplest, fastest, and most seamless way I've found is via MSYS2, so that will be the first thing we'll want to download here. You can grab a copy at https://www.msys2.org/

The reason I use MSYS2 as a platform is because it gives you a POSIX compliant (Unix-like) environment within Windows, going so far as to even give you a package manager (pacman a la Arch Linux). This will make things super simple going forward.

After MSYS2 is installed, go ahead and run it. We specifically want to run "MSYS2 MINGW64". (You might want to create a shortcut to it on your desktop or something.) Once you get a terminal, we'll install yt-dlp and all its dependencies with the following command:

pacman -S --noconfirm mingw-w64-x86_64-ffmpeg gcc python-pip && yes | pip install yt-dlp

That command will take several minutes to run, but after it does yt-dlp should be fully functional, and it will work as well as it does on Linux.

A couple more things to note: if you need to navigate to the root of your C: drive, just type cd c: and remember that paths in MSYS use forward slashes, with a backslash in front of each space, so your command will look something like cd Documents\ and\ Settings/MSYS/Documents/. You can also navigate to other drives using the same scheme.

Also, your home folder in MSYS is located under C:\msys64\home, in case you accidentally dropped some files in there.
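
If you're used to Windows-style paths, the mapping is mechanical: the drive letter becomes the first path component and the backslashes become forward slashes. MSYS2 ships a cygpath tool that does this properly; the function below is only a sketch of the same idea, and win_to_msys_path is a name I made up for illustration.

```shell
# win_to_msys_path is a hypothetical helper that shows how a Windows path
# maps onto the MSYS2 layout (C:\ is mounted at /c, D:\ at /d, and so on).
win_to_msys_path() {
  local drive rest
  drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')  # drive letter, lowercased
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')      # drop "C:", flip the slashes
  printf '/%s%s\n' "$drive" "$rest"
}

win_to_msys_path 'C:\msys64\home'   # prints /c/msys64/home
```

Wrapping the call in quotes, as in cd "$(win_to_msys_path 'D:\Video Archive')", saves you from escaping the spaces by hand.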

Installing on Linux

Installing yt-dlp on Linux is almost trivial. The hard work is already done: you already have a Unix environment. All we need to do is install the package.

For Debian/Ubuntu and distros based on them:

sudo apt install yt-dlp

For Fedora/Enterprise Linux run:

sudo dnf install yt-dlp

Pretty sure you can figure this out for other distros: basically replace apt or dnf with whatever package manager your distro uses. Most distros seem to include yt-dlp in their repos by default.
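
Whichever platform you're on, it's worth confirming that both yt-dlp and ffmpeg ended up on your PATH before you start a long download, since yt-dlp relies on ffmpeg to merge separately downloaded video and audio streams into one file. A minimal sketch (check_tool is a made-up helper, not part of yt-dlp):

```shell
# check_tool is a hypothetical helper: report whether a program is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

# If ffmpeg is missing, expect separate video and audio files instead of one merged file.
check_tool yt-dlp
check_tool ffmpeg
```

If either one comes back MISSING, go back over the install step for your platform before moving on.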

Running yt-dlp to download an entire channel

Typically using yt-dlp is simple. To download a single video, we would run something like the following:

yt-dlp https://www.youtube.com/watch?v=dQw4w9WgXcQ

But that's not what we're after. We aim to make yt-dlp actually work for a living. So let me break down what we're going to do to accomplish that.

Of course we're going to start with yt-dlp, but we're going to add some options. The first two go together: --extractor-args youtube:player_client=android --throttled-rate 100K. This is one of the main improvements of yt-dlp over youtube-dl. The first option tells yt-dlp to fetch videos through YouTube's Android client, and the second treats any download slower than 100K bytes per second as throttled and re-extracts the video data to get around it. This shouldn't be needed anymore, however I still add it because I don't trust what I can't see.

The next option we're going to use is --cookies ~/cookies.txt. This tells yt-dlp to read its cookies from (and save them back to) a cookies.txt file in your home folder. You can produce that file by exporting your cookies from Firefox with an extension called Export Cookies by Rotem Dan. That way, for age-restricted videos, you can sign into YouTube on Firefox, export your cookies to a text file, then use that signed-in cookie with yt-dlp to download the video.
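
A bad export is an easy thing to trip over here. As far as I know, yt-dlp expects the Netscape cookie format, where the first line of the file is either "# Netscape HTTP Cookie File" or "# HTTP Cookie File". Here is a rough sanity check under that assumption (looks_like_cookie_file is a hypothetical helper name):

```shell
# looks_like_cookie_file is a hypothetical helper. It only inspects the first
# line of the file; it does not validate the cookies themselves.
looks_like_cookie_file() {
  local first_line=
  [ -r "$1" ] || return 1
  # read fails on a last line with no newline, so also accept a non-empty result
  IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
  case $first_line in
    "# Netscape HTTP Cookie File"*|"# HTTP Cookie File"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

You could run it as looks_like_cookie_file ~/cookies.txt && echo "header looks right" before kicking off a big download.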

After that we're going to add --download-archive ./archive.txt. This will create a file in the download directory called archive.txt, a simple text-based database of all the videos that have been downloaded into that folder. The idea here is that yt-dlp will skip any video that is already listed in it. Although you can go without this, a repeat run will be far, far slower, as yt-dlp will attempt to re-download videos it has already downloaded, just to find that the file already exists. This is a massive time saver and you definitely want to take advantage of it.
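
The archive file is plain text with one video per line: the extractor name followed by the video ID, for example "youtube dQw4w9WgXcQ". That makes it easy to poke at with ordinary tools. A small sketch, with two helper names I made up:

```shell
# Both helpers are hypothetical; they only read the archive file.

# $1 = path to archive.txt, $2 = YouTube video ID
already_archived() {
  grep -qxF "youtube $2" "$1" 2>/dev/null
}

# Count how many videos the archive lists (one non-empty line per video).
archive_count() {
  grep -c . "$1"
}
```

For example, already_archived ./archive.txt dQw4w9WgXcQ && echo "got it" tells you whether a given video has been recorded as downloaded.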

--write-description does just that: it writes the description of the video to a text file ending in .description. The file is small, and it gives an overview of the video in the event you don't want the video or want to organize it later. It's something somewhat unique to YouTube that will likely be lost in most archives, so you definitely want to snag a copy of that if you can.

Next in the command is --write-info-json, which gathers all the information about the YouTube video (such as tags, available resolutions, region locks, etc.) and writes it to a standardized .json file that can be parsed by an external application. This is handy if you want to categorize these videos in an application later down the road.

--write-sub and --write-auto-sub are used to download the subtitles. --write-sub will only write manually created subtitle files, and --write-auto-sub will only write automatically generated subtitles. Both of these are extremely handy, as you can open them and get the gist of the video without opening the video. YouTube is the only major platform that I know of that uses AI to automatically generate subtitles for uploaded videos, and these are invaluable. Most other people's YouTube channel archives that I've seen exclude these.
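
The subtitles land as .vtt files, which are mostly text with timing lines in between. If you want to skim one without the timestamps, something like the following works as a rough sketch. vtt_to_text is a hypothetical helper, and the duplicate-line filter is a crude way to cope with the repeated lines that auto-generated captions tend to contain.

```shell
# vtt_to_text is a hypothetical helper: print only the spoken text of a .vtt file.
vtt_to_text() {
  tr -d '\r' < "$1" |
    sed 's/<[^>]*>//g' |                  # strip inline tags such as <c> and <00:00:01.000>
    grep -v -e '^WEBVTT' -e '^Kind:' -e '^Language:' \
            -e '-->' -e '^[[:space:]]*$' |  # drop headers, timing lines, and blank lines
    awk '!seen[$0]++'                     # keep only the first copy of each line
}
```

Piping the result through less, as in vtt_to_text some-video.en.vtt | less, gives you a quick read of what was said.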

Personally I find it preferable to archive the thumbnail as well, which you do with --write-thumbnail. There's no real reason not to do this: again, it's extremely small compared to the actual video and it's usually something other people neglect to grab.

The option -i will simply ignore errors and keep ripping down videos if one download errors out. Current yt-dlp releases already move on to the next video by default, but youtube-dl would stop at the first error without it, so I keep it in to be safe. If time is of the essence (which if you're ripping down an entire channel, it probably is) then I would keep this option in. Better to have one half-downloaded video in the entire channel than 1/3 of a channel.

Next we need to specify the format we want the video to be in. I choose MP4 videos when given the chance, and to do that we specify -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio'. This asks for the best MP4 video plus the best M4A audio, and falls back to the best video and audio of any type when those aren't available. The two streams are downloaded separately and then merged into a single file, which is what ffmpeg is needed for. In my experience VP8 and VP9 don't produce as clear a picture as H.264, and they also take more space. The vast majority of devices can play an MP4 with no issues whatsoever. And for you Free Software snobs out there (picking on myself more than anyone, chill) Cisco has open-sourced its H.264 codec so it's not really a closed standard anymore. Really no reason not to use an MP4.

-o '%(upload_date)s-%(title)s.%(ext)s' specifies how you want the file name to appear. This will produce a file name in a format like 20230102-Black Eyed Peas - Don't You Worry || Sylwester Marzeń 2022.mp4. If you want you can change this up, but I would recommend that you keep this format. It keeps all videos in chronological order and shows the date the video was published. The only reason I can think of that you wouldn't want to use this option is for downloading a TV series or something off of YouTube where the exact date is pretty irrelevant.
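
One nice side effect of this naming scheme is that the upload date can be read straight back out of the file name with plain shell string handling. A minimal sketch (upload_date_of is a made-up helper name, and it assumes the file was named with the template above):

```shell
# upload_date_of is a hypothetical helper: print the YYYYMMDD prefix of a file
# that was named with the '%(upload_date)s-%(title)s.%(ext)s' template.
upload_date_of() {
  local name
  name=${1##*/}                  # strip any leading directory
  printf '%s\n' "${name%%-*}"    # keep everything before the first "-"
}

upload_date_of "./PatriotNurse/20100503-The Role of Nutrition in Survival.mp4"   # prints 20100503
```

It also means a plain ls 2023* is enough to list everything a channel published in 2023.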

And then finally, we add our URL in the following format: https://www.youtube.com/@LukeSmithxyz. YouTube appears to have recently changed this for the better; channels didn't use to have an @ username and just had a bunch of garbage in the URL as a unique identifier. Specifying the channel itself will grab all the tabs on the channel. In the case of this channel we have "Home," "Videos," "Shorts," "Live," "Playlists," etc. YouTube categorizes Shorts and Live streams differently than normal uploads, so if you grab just the URL of the Videos tab, you will be missing out on those. You want the entire channel.

So with all that in mind, your command will look like the following:

yt-dlp --extractor-args youtube:player_client=android --throttled-rate 100K --cookies ~/cookies.txt --download-archive ./archive.txt --write-description --write-info-json --write-annotations --write-sub --write-auto-sub --write-thumbnail -i -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' -o '%(upload_date)s-%(title)s.%(ext)s' https://www.youtube.com/@LukeSmithxyz
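
That's a lot to type on one line. If you'd rather keep it readable, the same options can be wrapped in a small shell function, one option per line. This is only a sketch: archive_channel and the DRY_RUN switch are names I made up, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# archive_channel is a hypothetical wrapper around the one-line command.
# Set DRY_RUN=1 to print the command instead of running it.
archive_channel() {
  local url
  url=$1
  set -- \
    --extractor-args youtube:player_client=android \
    --throttled-rate 100K \
    --cookies "$HOME/cookies.txt" \
    --download-archive ./archive.txt \
    --write-description --write-info-json --write-annotations \
    --write-sub --write-auto-sub --write-thumbnail \
    -i \
    -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' \
    -o '%(upload_date)s-%(title)s.%(ext)s'
  ${DRY_RUN:+echo} yt-dlp "$@" "$url"
}
```

Running DRY_RUN=1 archive_channel https://www.youtube.com/@LukeSmithxyz lets you eyeball the command first; drop the DRY_RUN=1 to download for real.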

Keeping it Clean

Of course you're probably going to want to rip down more than one channel. I recommend creating a directory for each channel that you want to download. Here is a directory listing for the archive I'm working on as an example:

total 8
drwxrwxrwx 1 1026 users      0 Dec 31 11:37 AmericaUncensored
drwxrwxrwx 1 1026 users 198316 Feb  2 23:00 Canadian Prepper
drwxrwxrwx 1 1026 users     64 Dec 27 10:52 ChinaUncensored
drwxrwxrwx 1 suse suse   20558 Feb  2 20:17 ColDouglassMacgregor
drwxrwxrwx 1 1026 users   2548 Feb  2 20:17 CountryLifeVlog
drwxrwxrwx 1 1026 users     34 Dec  1 23:20 DAHBOO7
drwxrwxrwx 1 suse suse  680028 Feb  2 23:22 GeorgeGalloway
drwxrwxrwx 1 1026 users    164 Nov 11 20:19 Gonzalo Lira
drwxrwxrwx 1 suse suse      34 Feb  2 18:33 iEarlGrey
drwxrwxrwx 1 1024 users 210108 Feb  2 18:31 Iraqverteran8888
drwxrwxrwx 1 1026 users   3164 Feb  2 23:21 JohnDoyle
drwxrwxrwx 1 1026 users 369938 Feb  2 18:23 Matt Walsh
drwxrwxrwx 1 suse suse      34 Feb  2 18:34 MilitarySummaryChannel
drwxrwxrwx 1 1026 users  91688 Feb  2 18:23 OverlordDVD
drwxrwxrwx 1 1026 users  54594 Feb  2 23:13 PatriotNurse
drwxrwxrwx 1 1026 users  88958 Feb  2 18:23 PaulJosephWatson
drwxrwxrwx 1 1026 users     86 Nov 14 13:18 Pinball Preparedness
drwxrwxrwx 1 1026 users 135576 Feb  2 18:23 Poplar Preparedness
drwxrwxrwx 1 1026 users   4186 Feb  2 18:23 ReturnToTradition
drwxrwxrwx 1 1026 users  22302 Feb  2 18:24 SensusFidelium
drwxrwxrwx 1 suse suse      34 Feb  2 18:32 TheDuran
drwxrwxrwx 1 suse suse      34 Feb  2 18:37 TheGrayzone
drwxrwxrwx 1 1026 users   3786 Feb  2 18:24 TheModernMonarchist
drwxrwxrwx 1 suse suse      34 Feb  2 18:33 TheNewAtlas
drwxrwxrwx 1 1026 users  60436 Feb  2 18:24 TheQuartering
drwxrwxrwx 1 1026 users 108532 Feb  2 18:24 TraderUniversity
-rwxrwxrwx 1 suse suse     267 Feb  2 18:09 update.sh
drwxrwxrwx 1 suse suse      34 Feb  2 18:36 WeebUnion

Also in this directory there is a bash script called update.sh, which contains the following:

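#!/bin/bash

# Run the named script in every subdirectory (at any depth) that contains it.
function search_and_run() {
  local dir
  for dir in ./*; do
    if [ -d "$dir" ]; then
      # Skip directories we cannot enter so the "cd .." below stays balanced.
      cd "$dir" || continue
      if [ -f "$1" ]; then
        bash "$1"
      fi
      search_and_run "$1"
      cd ..
    fi
  done
}

# First pass downloads new videos, second pass tidies up the metadata.
search_and_run "update.sh"
search_and_run "clean.sh"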

What this script will do is search each subdirectory and run the (different) update.sh script for each channel folder. After that it will start over, and run a script called clean.sh in each folder.
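
Before letting it loose, it can be reassuring to see which per-channel scripts it is going to find. A quick sketch using find (list_channel_scripts is a made-up helper; -mindepth 2 is there so the top-level script itself is left out):

```shell
# list_channel_scripts is a hypothetical helper: list every copy of a script
# that lives below the archive root, skipping the one in the root itself.
# $1 = script name, $2 = archive root (defaults to the current directory)
list_channel_scripts() {
  find "${2:-.}" -mindepth 2 -type f -name "$1" | sort
}
```

Run list_channel_scripts update.sh from the parent directory and you should see one line per channel folder.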

Now let's get to the subdirectories for the channels themselves. Let's use ./PatriotNurse as an example; I've snipped out the massive list of .mp4 files.

total 63702936
-rwxrwxrwx 1 1026 users  40532879 Aug 22  2014 20100503-The Role of Nutrition in Survival.mp4

...

-rwxrwxrwx 1 suse suse  173264824 Jan 27 20:25 20230127-Love Freedom?  Guess Who Deems You a Danger....mp4
-rwxrwxrwx 1 1026 users      9040 Jan 27 21:56 archive.txt
-rwxrwxrwx 1 1026 users        64 Jan  8 09:07 clean.sh
-rwxrwxrwx 1 suse suse          0 Feb  2 23:23 ls.txt
drwxrwxrwx 1 1026 users      1152 Feb  2 23:17 metadata
-rwxrwxrwx 1 1026 users  38186993 Jan 15 20:57 metadata.7z
-rwxrwxrwx 1 1026 users       384 Jan  7 18:26 update.sh

Inside ./metadata we have:

total 2520
-rwxrwxrwx 1 suse suse     774 Jan 27 21:56 20230127-Love Freedom?  Guess Who Deems You a Danger....description
-rw-r--r-- 1 suse suse  328929 Jan 27 21:56 20230127-Love Freedom?  Guess Who Deems You a Danger....info.json
-rwxrwxrwx 1 suse suse  213929 Jan 27 21:56 20230127-Love Freedom?  Guess Who Deems You a Danger....live_chat.json
-rwxrwxrwx 1 suse suse   75296 Jan 27 21:56 20230127-Love Freedom?  Guess Who Deems You a Danger....webp
-rwxrwxrwx 1 suse suse     930 Feb  2 18:16 NA-ThePatriotNurse.description
-rw-r--r-- 1 suse suse    7153 Feb  2 18:17 NA-ThePatriotNurse.info.json
-rwxrwxrwx 1 suse suse  631993 Feb  2 18:16 NA-ThePatriotNurse.jpg
-rwxrwxrwx 1 suse suse     930 Feb  2 18:17 NA-ThePatriotNurse - Shorts.description
-rw-r--r-- 1 suse suse    7096 Feb  2 18:17 NA-ThePatriotNurse - Shorts.info.json
-rwxrwxrwx 1 suse suse  631993 Feb  2 18:17 NA-ThePatriotNurse - Shorts.jpg
-rwxrwxrwx 1 suse suse     930 Feb  2 18:17 NA-ThePatriotNurse - Videos.description
-rw-r--r-- 1 suse suse    7098 Feb  2 18:17 NA-ThePatriotNurse - Videos.info.json
-rwxrwxrwx 1 suse suse  631993 Feb  2 18:17 NA-ThePatriotNurse - Videos.jpg

We have two scripts in ./PatriotNurse, update.sh and clean.sh.

Inside update.sh we have:

yt-dlp --extractor-args youtube:player_client=android --throttled-rate 100K --cookies ~/cookies.txt --download-archive ./archive.txt --write-description --write-info-json --write-annotations --write-sub --write-auto-sub --write-thumbnail -i -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' -o '%(upload_date)s-%(title)s.%(ext)s' https://www.youtube.com/@ThePatriotNurse

And inside clean.sh we have:

mkdir -p metadata
mv *vtt *json *webp *description *jpg ./metadata

Fairly simple. Following this template you should be able to reproduce this setup. If you want to download a new channel, just create a folder for the channel, and create an update.sh and a clean.sh for it. Then from the parent directory just run the top-level update.sh, and that's all there is to it. Keep an eye on your free disk space; it'll fill up fast.
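
If you add channels often, the folder and its two scripts can be stamped out from the template in one go. The following is only a sketch: new_channel is a hypothetical helper, it writes the same yt-dlp command used in the per-channel update.sh, and its clean.sh uses mkdir -p so that repeat runs don't complain about an existing metadata folder.

```shell
# new_channel is a hypothetical helper: create a channel folder with its own
# update.sh and clean.sh.  $1 = folder name, $2 = channel URL
new_channel() {
  mkdir -p "$1" || return 1
  # Unquoted heredoc: only $2 is expanded, everything else is written as-is.
  cat > "$1/update.sh" <<EOF
yt-dlp --extractor-args youtube:player_client=android --throttled-rate 100K --cookies ~/cookies.txt --download-archive ./archive.txt --write-description --write-info-json --write-annotations --write-sub --write-auto-sub --write-thumbnail -i -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' -o '%(upload_date)s-%(title)s.%(ext)s' '$2'
EOF
  # Quoted heredoc: the globs are written literally and expand when clean.sh runs.
  cat > "$1/clean.sh" <<'EOF'
mkdir -p metadata
mv *vtt *json *webp *description *jpg ./metadata
EOF
}
```

For example, new_channel LukeSmith https://www.youtube.com/@LukeSmithxyz from the parent directory sets up the folder, and df -h . shows how much room you have left before you start.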

@bjalborough

Hi - I'm having trouble with FFmpeg having installed on Windows. I followed these instructions but it keeps separating MP4 and M4A files. Help!
