I am trying to fetch some data from a URL using BeautifulSoup in Python, but the last command in my script fails.
I keep getting an error telling me that the 'LXMLTreeBuilder' object has no attribute 'DEFAULT_NSMAPS_INVERTED'.
How can I fix this?
This is the line of my code that fails:
soup = BeautifulSoup(content)
Answer 0 (score: 0)
You imported requests, so use it. Try it this way:
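import requests
from bs4 import BeautifulSoup

url = 'https://www.ucf.edu/'
page = requests.get(url)  # fetch the page over HTTP
soup = BeautifulSoup(page.content)  # no parser named, so bs4 picks the best one installed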
Answer 1 (score: 0)
You did not specify a parser in the BeautifulSoup constructor. Try passing html.parser there:
import urllib.request as urllib2
from bs4 import BeautifulSoup
import requests
url = 'https://www.ucf.edu/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content, 'html.parser') # <-- specify parser here
print(soup.prettify())
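To try the explicit-parser call without touching the network, the following is a minimal sketch that parses a small inline document. The `content` bytes here are a made-up stand-in for what `urlopen(url).read()` would return, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bytes returned by urlopen(url).read()
content = b"<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# Naming the parser explicitly stops bs4 from auto-selecting lxml
soup = BeautifulSoup(content, "html.parser")

# Pull a couple of values out of the parsed tree
title_text = soup.title.string
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(title_text, paragraphs)
```

Because html.parser ships with Python, this call should not go through the lxml tree builder at all, which is presumably why it sidesteps the error in the question.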
Edit: make sure you have the latest version of BeautifulSoup installed (and, optionally, the latest version of lxml). I am using beautifulsoup4==4.8.0 and lxml==4.3.4.
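One way to see which versions are actually installed is the standard library's `importlib.metadata` (Python 3.8+). This is only a sketch: `installed_version` is a helper name made up for illustration, and the package names are the PyPI distribution names mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two distributions discussed in this answer
for name in ("beautifulsoup4", "lxml"):
    print(name, installed_version(name) or "not installed")
```

If the two versions turn out to be far apart in age, upgrading both with pip is a reasonable first step before changing any code.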