@chandradeoarya
Created August 13, 2023 14:28
Saving scraped Blogspot blogs as CSV. The scraper is https://github.com/sudosuwinter/blogger-scrapper/
import csv

from bs4 import BeautifulSoup


def extract_first_image_url(html_string):
    """Return the src of the first <img> in the HTML, or None if there is none."""
    soup = BeautifulSoup(html_string, 'html.parser')
    img_tag = soup.find('img')
    if img_tag:
        return img_tag.get('src')
    return None


def remove_spaces_and_quote(input_string):
    """Strip surrounding whitespace and drop single quotes."""
    # The original computed this value but never returned it.
    return input_string.strip().replace("'", '')


# `articles` is not defined in this snippet; it is assumed to be the list of
# posts produced by the scraper, each with title, content and published_date.
csv_filename = "filename.csv"
with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Write header row
    csv_writer.writerow(['post_image', 'post_title', 'post_content', 'published_date'])
    for article in articles:
        img = extract_first_image_url(article.content)
        title = article.title
        content = article.content
        # Timestamp as a compact sortable string, e.g. 20230813142800
        published_date = article.published_date.strftime('%Y%m%d%H%M%S')
        csv_writer.writerow([img, title, content, published_date])
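Reading the file back is not part of the gist, but a minimal sketch may help check the output. It assumes the same four columns and the same `%Y%m%d%H%M%S` date format as above; the `read_posts` helper and the sample row are hypothetical, and an in-memory buffer stands in for `filename.csv` so the snippet runs on its own.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data in the same column layout as the CSV written above.
sample = io.StringIO(newline='')
writer = csv.writer(sample)
writer.writerow(['post_image', 'post_title', 'post_content', 'published_date'])
writer.writerow([None, 'Hello', '<p>Hi, "world"</p>\n<p>Second line</p>', '20230813142800'])
sample.seek(0)


def read_posts(csvfile):
    """Yield one dict per post, with published_date parsed back to a datetime."""
    for row in csv.DictReader(csvfile):
        row['published_date'] = datetime.strptime(row['published_date'], '%Y%m%d%H%M%S')
        # csv.writer stores None as an empty string, so map it back to None.
        row['post_image'] = row['post_image'] or None
        yield row


posts = list(read_posts(sample))
for post in posts:
    print(post['post_title'], post['published_date'].isoformat(), post['post_image'])
```

For a real file, the buffer would be replaced with `open(csv_filename, newline='', encoding='utf-8')`; passing `newline=''` lets the csv module handle line breaks embedded in the HTML content.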