AttributeError: 'LXMLTreeBuilder' object has no attribute 'DEFAULT_NSMAPS_INVERTED' when using BeautifulSoup

Time: 2019-07-31 19:31:17

Tags: python web-scraping beautifulsoup

I am trying to fetch some data from a URL with BeautifulSoup in Python, but when I run the last command,

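I keep getting this error, which tells me that the 'LXMLTreeBuilder' object has no attribute 'DEFAULT_NSMAPS_INVERTED'. It is raised by the call soup = BeautifulSoup(content). How can I fix this?

This is my code:

import urllib.request as urllib2
from bs4 import BeautifulSoup
import requests
url = 'https://www.ucf.edu/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)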

2 answers:

Answer 0: (score: 0)

You imported requests, so use it. Try it this way:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ucf.edu/'
page = requests.get(url)  # download the page with requests instead of urllib
soup = BeautifulSoup(page.content)

Answer 1: (score: 0)

You did not specify a parser in the BeautifulSoup constructor. Try passing html.parser there:

import urllib.request as urllib2
from bs4 import BeautifulSoup
import requests
url = 'https://www.ucf.edu/'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content, 'html.parser') # <-- specify parser here

print(soup.prettify())
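
To see the effect of the parser argument without making a network request, here is a minimal sketch. It assumes only that beautifulsoup4 is installed; the HTML string and the variable names are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only so the example runs offline.
html = (
    "<html><head><title>Example</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

# Naming the parser explicitly selects Python's built-in HTML parser,
# so bs4 does not auto-select lxml even when lxml is installed.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string
intro_text = soup.find("p", class_="intro").get_text()
print(title_text, intro_text)
```

If this runs cleanly but the same call without the second argument fails, the problem is likely limited to the lxml-backed tree builder.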

Edit: make sure you have the latest version of BeautifulSoup installed (and, optionally, the latest version of lxml). I am using beautifulsoup4==4.8.0 and lxml==4.3.4.
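
One way to check which versions are actually installed in the active environment is sketched below. It uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("beautifulsoup4", "lxml"):
    print(name, installed_version(name))
```

If the reported versions are older than expected, upgrading both packages with pip and rerunning the script is a reasonable next step.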