Created
January 20, 2017 00:12
[Python] scraping a website in one line
import requests, re
# Append mode, with a trailing newline per batch, so every offer's matches end up in jamal.txt.
[[[open('jamal.txt', 'a', encoding='utf-8').write("\n".join(re.findall(r, txt, re.I|re.S|re.U)) + "\n") for r in ['b>ID:.*?b>(.*?)<b', 'b>.*?zverejnenia:.*?span.*?>(.*?)</span', 'b>Lokalita:.*?a.*?>(.*?)</', 'b>Poz.cia(?:(.*?))</div', 'b>Spo.*?:.*?<a.*?">(.*?)</a']] for txt in ((requests.get('http://www.PROC.sk/' + offer, headers={'User-agent': 'Mozilla/5.0'}).text) for offer in page)] for page in (re.findall('itemscope.*?href="(.*?)"', requests.get('http://www.PROC.sk/praca/?page_num=' + str(n), headers={'User-agent': 'Mozilla/5.0'}).text) for n in range(int(re.findall('page_num=(.*?)"', requests.get('http://www.PROC.sk/praca/', headers={'User-agent': 'Mozilla/5.0'}).text)[-2])))]
# Thanks to Python generators; without them, I wouldn't be able to write this ugly code.
# mini-contest
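For readers who want to follow what the one-liner above does, here is a rough multi-line sketch of the same three steps: count the listing pages, collect the offer links from each page, and pull the fields out of each offer. The function names (`count_pages`, `extract_offer_links`, `extract_fields`, `iter_offer_fields`) and the `fetch` parameter are hypothetical, introduced only for illustration. The regular expressions are copied from the one-liner, with `\/` written as plain `/`, which matches the same text. Passing `fetch` in as a callable is an assumption made here so the parsing can be tried without touching the network.

```python
import re

# Field patterns copied from the one-liner ("\/" written as "/", same meaning).
FIELD_PATTERNS = [
    r'b>ID:.*?b>(.*?)<b',
    r'b>.*?zverejnenia:.*?span.*?>(.*?)</span',
    r'b>Lokalita:.*?a.*?>(.*?)</',
    r'b>Poz.cia(?:(.*?))</div',
    r'b>Spo.*?:.*?<a.*?">(.*?)</a',
]
FLAGS = re.I | re.S | re.U


def count_pages(listing_html):
    """Read the page count from the second-to-last page_num link, as the one-liner does."""
    return int(re.findall(r'page_num=(.*?)"', listing_html)[-2])


def extract_offer_links(listing_html):
    """Return the href that follows each 'itemscope' marker on a listing page."""
    return re.findall(r'itemscope.*?href="(.*?)"', listing_html)


def extract_fields(offer_html):
    """Return all matches of all field patterns, in pattern order."""
    matches = []
    for pattern in FIELD_PATTERNS:
        matches.extend(re.findall(pattern, offer_html, FLAGS))
    return matches


def iter_offer_fields(fetch, base_url, page_count):
    """Lazily yield the extracted fields of every offer on every listing page.

    `fetch` is a hypothetical callable that takes a URL and returns its HTML text.
    """
    for n in range(page_count):
        listing_html = fetch(base_url + 'praca/?page_num=' + str(n))
        for offer in extract_offer_links(listing_html):
            yield extract_fields(fetch(base_url + offer))
```

To run it against a live site, `fetch` could be something like `lambda url: requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text`, with the page count taken from `count_pages(fetch(base_url + 'praca/'))` and each yielded list written to a file opened once in a `with` block.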