Merging multiple lists into one organized CSV with bs4

Asked: 2019-10-30 02:24:44

Tags: python html csv web-scraping beautifulsoup

I'm new to this and am treating it as a learning opportunity, but I've only gotten this far thanks to the community's help. I'm trying to scrape multiple sections of a page like this one:

https://m.the-numbers.com/movie/Black-Panther

specifically the summary, starring cast, and supporting cast sections.

I have successfully written one list to a CSV, but I can't seem to find a way to write multiple lists. I'm looking for a scalable solution where I can keep adding more lists to the export.

Things I have tried:

Putting them in separate lists, e.g. `details` and `actors`; using the same list with `details.extend`; and so on.

Expected result: a table like:

Headers:     title, amount, starName, starCharacter

with the data listed below them.

Error:     Exception has occurred: AttributeError: 'str' object has no attribute 'keys'
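For context on that traceback: `list.extend` iterates its argument, so extending a list with a dict adds only the dict's *keys* (plain strings), and a later `details[0].keys()` fails because `details[0]` is a string. A minimal repro (the sample values are illustrative):

```python
details = []
# extend() iterates the dict, so only its keys (strings) are added
details.extend({'title': 'Black Panther', 'amount': '$200,000,000'})
print(details)  # ['title', 'amount'] -- strings, so details[0].keys() raises AttributeError

# append() keeps the dict intact, which is what csv.DictWriter needs
rows = []
rows.append({'title': 'Black Panther', 'amount': '$200,000,000'})
print(list(rows[0].keys()))  # ['title', 'amount']
```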

```python
from bs4 import BeautifulSoup
import requests
import csv

# Making get request
r = requests.get('https://m.the-numbers.com/movie/Black-Panther')

# Creating BeautifulSoup object
soup = BeautifulSoup(r.text, 'lxml')

# Localizing table from the BS object
table_soup = soup.find('div', class_='row').find('div', class_='table-responsive').find('table', id='movie_finances')
website = 'https://m.the-numbers.com/'
details = []
# Iterating through the budget rows of the table (rows 2 and 3)

for tr in table_soup.find_all('tr')[2:4]:
    tds = tr.find_all('td')

    # Creating a dict for each row and appending it to the details list
    # (append, not extend: extend would iterate the dict and add only its keys)
    details.append({
        'title': tds[0].text.strip(),
        'amount': tds[1].text.strip(),
    })


cast_soup = soup.find('div', id='accordion').find('div', class_='cast_new').find('table', class_='table table-sm')
for tr in cast_soup.find_all('tr')[2:15]:
    tdc = tr.find_all('td')

    # Creating dict for each row and appending it to the details list
    details.append({
        'starName': tdc[0].text.strip(),
        'starCharacter': tdc[1].text.strip(),
    })

# Writing details list of dicts to file using csv.DictWriter
with open('moviesPage2018.csv', 'w', encoding='utf-8', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=details[0].keys())
    writer.writeheader()
    writer.writerows(details)
```
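For the scalable multi-list export, one approach (a sketch only; `write_sections` and the sample rows are my own names and data, not taken from the page) is to keep each scraped section in its own list of dicts, then hand `csv.DictWriter` the union of keys across all sections and let `restval=''` leave missing columns blank:

```python
import csv

def write_sections(path, *sections):
    """Write several lists of dicts to one CSV.

    Fieldnames are the union of keys across all sections, in
    first-seen order; cells a row doesn't have are left blank.
    """
    fieldnames = []
    for section in sections:
        for row in section:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        for section in sections:
            writer.writerows(section)

# Illustrative sample data in place of the scraped lists
details = [{'title': 'Production Budget', 'amount': '$200,000,000'}]
actors = [{'starName': 'Chadwick Boseman', 'starCharacter': "T'Challa"}]
write_sections('moviesPage2018.csv', details, actors)
```

Adding another scraped section is then just one more list of dicts passed to `write_sections`, with its keys becoming new columns automatically.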


0 Answers:

No answers