Ruby script to download all NCERT book PDFs
# NCERT books are excellent, but they are being altered for political or other reasons.
# See: https://twitter.com/SouthAsiaIndex/status/1518062204058103809
# To download the entire current set, run this script with Ruby (it needs the httparty gem).
require 'httparty'

# Fetch the textbook index page, treat its body as ISO-8859-1, and transcode it to UTF-8.
source = HTTParty.get('https://ncert.nic.in/textbook.php').body.force_encoding('ISO-8859-1').encode('UTF-8')

# Book archives are named like aeen1dd.zip.
# First letter is the class: a to l map to classes 1 to 12; m stands for classes 11 and 12 combined.
# Second letter is the language the book is written in: e for English, h for Hindi, u for Urdu.
bookids = source.scan(/textbook\.php\?[a-z]{4}\d/).uniq

# Stream one archive to the current directory, writing it in binary mode.
def download_book(book_name)
  puts "Downloading #{book_name}"
  File.open(book_name, 'wb') do |file|
    HTTParty.get('https://ncert.nic.in/textbook/pdf/' + book_name, follow_redirects: true, stream_body: true) do |fragment|
      file.write(fragment)
    end
  end
rescue StandardError => e
  # Report the failure and move on to the next book instead of failing silently.
  warn "Failed to download #{book_name}: #{e.message}"
end

bookids.each do |bid|
  book_name = bid.gsub('textbook.php?', '') + 'dd.zip'
  download_book(book_name)
  sleep(0.2) # short pause between requests to go easy on the server
end
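
The naming scheme described in the script's comments (first letter for the class, second letter for the language) can be decoded as in the minimal sketch below. The `describe_book` helper and the `LANGUAGES` table are hypothetical names introduced for illustration and are not part of the original script. The sketch covers only the letters the comments mention, and the meaning of the remaining characters in a book ID is not covered here.

```ruby
# Hypothetical helper, not part of the original script: decodes the first two
# letters of a book ID such as "aeen1" using the scheme from the comments above.
LANGUAGES = { 'e' => 'English', 'h' => 'Hindi', 'u' => 'Urdu' }.freeze

def describe_book(book_id)
  class_letter = book_id[0]
  language_letter = book_id[1]

  grade =
    if class_letter == 'm'
      'Classes 11 and 12 (combined)'
    elsif ('a'..'l').cover?(class_letter)
      # 'a' is class 1, 'b' is class 2, and so on up to 'l' for class 12.
      "Class #{class_letter.ord - 'a'.ord + 1}"
    else
      'Unknown class'
    end

  { grade: grade, language: LANGUAGES.fetch(language_letter, 'Unknown language') }
end

info = describe_book('aeen1')
puts "#{info[:grade]}, #{info[:language]}"
```

A helper like this could be used to filter `bookids` before downloading, for example to fetch only the English-language books.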