I'm just scraping some data and want to put it into two columns, title and date, but I get a TypeError:
TypeError: from_dict() got an unexpected keyword argument 'columns'
Code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'
while True:
    response=requests.get(url)
    soup = BeautifulSoup(response.content,'html.parser')
    content = soup.find_all('div',{'class': 'content'})
    for contents in content:
        title_tag = contents.find('span',{'class':'title'})
        title= title_tag.text[1:-1] if title_tag else 'N/A'
        date_tag = contents.find('span',{'class':'meta'})
        date = date_tag.text if date_tag else 'N/A'
        hiv={title : date}
        print(' title : ', title ,' \n date : ' ,date )
    url_tag = soup.find('div',{'class':'pagination'})
    if url_tag.get('href'):
        url = 'https://timesofindia.indiatimes.com/' + url_tag.get('href')
        print(url)
    else:
        break
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['title' ,'date'])
I have updated pandas to version 0.23.4, but the error still occurs.
Answer 0 (score: 1)
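The first thing I noticed is that the structure of your dictionary is off. I assume you want a dictionary of all the title:date pairs. The way you have it now, hiv={title : date} creates a brand-new one-item dictionary on every pass through the loop, so only the last pair is kept.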
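A minimal sketch of the difference, assuming a few hypothetical (title, date) pairs in place of the scraped results. The name last_only is made up here so both patterns can be compared side by side; in the question it is simply hiv.

```python
# Hypothetical (title, date) pairs standing in for the scraped results
pairs = [("Title A", "01 Dec 2018"), ("Title B", "30 Nov 2018")]

# Question's pattern: each pass rebinds the name to a brand-new dict
for title, date in pairs:
    last_only = {title: date}

# Answer's pattern: one dict created up front, extended on each pass
hiv = {}
for title, date in pairs:
    hiv.update({title: date})

print(last_only)  # {'Title B': '30 Nov 2018'}
print(hiv)        # {'Title A': '01 Dec 2018', 'Title B': '30 Nov 2018'}
```

Keep in mind that titles are dictionary keys, so any repeated title (for example several 'N/A' placeholders) would end up as a single entry.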
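Then, when you call from_dict with orient='index', the dictionary keys become the DataFrame's index and the values become the Series/column. So technically there is only 1 column. I get two columns by naming the index axis 'title' and then resetting the index, which turns it into a regular column.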
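To see the "only 1 column" point concretely, here is a small sketch on a hypothetical two-entry dictionary. It deliberately leaves out columns= so it behaves the same whether or not your pandas supports that argument; the single value column then gets the default label 0.

```python
import pandas as pd

# Hypothetical data in the shape the loop produces: {title: date}
hiv = {"Title A": "01 Dec 2018", "Title B": "30 Nov 2018"}

# orient='index': keys become the row index, values fill ONE column (labelled 0)
df = pd.DataFrame.from_dict(hiv, orient="index")
print(df.shape)              # (2, 1)
print(df.columns.tolist())   # [0]

# Name the index, turn it into a real column, then label the value column
df = df.rename_axis("title").reset_index().rename(columns={0: "date"})
print(df.columns.tolist())   # ['title', 'date']
```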
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'
response=requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
content = soup.find_all('div',{'class': 'content'})
hiv = {}  # create the dict once, before the loop
for contents in content:
    title_tag = contents.find('span',{'class':'title'})
    title= title_tag.text[1:-1] if title_tag else 'N/A'
    date_tag = contents.find('span',{'class':'meta'})
    date = date_tag.text if date_tag else 'N/A'
    hiv.update({title : date})  # add this pair instead of replacing the dict
    print(' title : ', title ,' \n date : ' ,date )
# keys -> index, values -> a single 'date' column
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['date'])
# name the index 'title', then move it into a regular column
hiv1 = hiv1.rename_axis('title').reset_index()
Output:
print (hiv1)
title date
0 I told my boyfriend I was HIV positive and thi... 01 Dec 2018
1 Pay attention to these 7 very common HIV sympt... 30 Nov 2018
2 Transfusion of HIV blood: Panel seeks time til... 2019-01-06T03:54:33Z
3 No. of pregnant women testing HIV+ dips; still... 01 Dec 2018
4 Busted:5 HIV AIDS myths 30 Nov 2018
5 Myths and taboos related to AIDS 01 Dec 2018
6 N/A N/A
7 Mumbai: Free HIV tests at six railway stations... 23 Nov 2018
8 HIV blood tranfusion: Tamil Nadu govt assures ... 2019-01-05T09:05:27Z
9 Autopsy performed on HIV+ve donor’s body at GRH 2019-01-03T07:45:03Z
10 Madras HC directs to videograph HIV+ve donor’s... 2019-01-01T01:23:34Z
11 HIV +ve Tamil Nadu teen who attempted suicide ... 2018-12-31T03:37:56Z
12 Another woman claims she got HIV-infected blood 2018-12-31T06:34:32Z
13 Another woman says she got HIV from donor blood 29 Dec 2018
14 HIV case: Five-member panel begins inquiry in ... 29 Dec 2018
15 Pregnant woman turns HIV positive after blood ... 26 Dec 2018
16 Pregnant woman contracts HIV after blood trans... 26 Dec 2018
17 Man attacks niece born with HIV for sleeping i... 16 Dec 2018
18 Health ministry implements HIV AIDS Act 2017: ... 11 Sep 2018
19 When meds don’t heal: HIV+ kids fight daily wa... 03 Sep 2018
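I'm not sure why you're getting the error. The columns argument was added to DataFrame.from_dict in pandas 0.23.0, so with an updated pandas it doesn't make sense. Maybe uninstall pandas and then pip install it again?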
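Before reinstalling, it may be worth checking which pandas the script is really importing. One common cause (an assumption here, not confirmed in the question) is that pip upgraded pandas for a different Python interpreter than the one running the scraper. A quick diagnostic sketch:

```python
import sys
import pandas as pd

# Which interpreter is running, and which pandas it actually imported;
# an older copy elsewhere on sys.path can shadow the upgraded one
print(sys.executable)
print(pd.__version__, pd.__file__)

# The columns= argument of DataFrame.from_dict was added in pandas 0.23.0
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_columns = (major, minor) >= (0, 23)
print("from_dict accepts columns=:", supports_columns)
```

If the printed version is older than 0.23, or the path points somewhere unexpected, the upgrade went to a different environment.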
Otherwise, I guess you could do it in just two lines and name the columns after converting to a DataFrame:
hiv1 = pd.DataFrame.from_dict(hiv, orient = 'index').reset_index()
hiv1.columns = ['title','date']
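A different approach from the one above (a sketch, not what the answer uses) is to skip the dictionary altogether: collect (title, date) tuples in a list and pass them to the DataFrame constructor with both column names. This avoids from_dict entirely and also keeps repeated titles as separate rows. The rows below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical scraped rows; in the scraper you would do
# rows.append((title, date)) inside the loop instead of updating a dict
rows = [
    ("Title A", "01 Dec 2018"),
    ("N/A", "N/A"),
    ("N/A", "N/A"),
]

# The DataFrame constructor names both columns at once,
# and a list keeps duplicate titles as separate rows
hiv1 = pd.DataFrame(rows, columns=["title", "date"])
print(hiv1)
```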