Question

I am trying to fetch some data from a URL using BeautifulSoup in Python, but when I run the last command,


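soup = BeautifulSoup(content), I keep getting an error telling me that the 'LXMLTreeBuilder' object has no attribute 'DEFAULT_NSMAPS_INVERTED'. How can I fix this?

Here is my code:

import urllib.request as urllib2
from bs4 import BeautifulSoup
import requests
url = 'https://www.ucf.edu/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)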

Answer 1

You already imported requests, so use it. Try it this way:

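import requests
from bs4 import BeautifulSoup

url = 'https://www.ucf.edu/'
page = requests.get(url)  # fetch the page with requests instead of urllib
soup = BeautifulSoup(page.content, 'html.parser')  # name the parser explicitly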

Answer 2

You did not specify a parser in the BeautifulSoup constructor. Try passing 'html.parser' there:

import urllib.request as urllib2
from bs4 import BeautifulSoup
import requests
url = 'https://www.ucf.edu/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content, 'html.parser') # <-- specify parser here

print(soup.prettify())

Edit: make sure you have the latest version of BeautifulSoup installed (and, optionally, the latest version of lxml). I am using beautifulsoup4==4.8.0 and lxml==4.3.4.
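
To see which versions are actually installed before upgrading, something along these lines should work. It is only a sketch: `installed_version` is a helper name made up for this example, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package` as a string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two distributions mentioned in this answer.
for name in ("beautifulsoup4", "lxml"):
    print(name, installed_version(name) or "not installed")
```

If either one turns out to be older than expected, `pip install --upgrade beautifulsoup4 lxml` is the usual way to update them.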

AttributeError: 'LXMLTreeBuilder' object has no attribute 'DEFAULT_NSMAPS_INVERTED' when using BeautifulSoup
