Why can't I get the string when scraping with Python?

Time: 2017-03-30 09:06:31

Tags: python web-scraping beautifulsoup

This is my code. I want to scrape a list of words from a website, but I get an error when I call .string:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
view = entry_view[0]
list = view.ul

for m in list:
    for x in m:
        title = x.string
        print(title)

What I want is to print the list of words from the website, but what I get is this error:

Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'

2 answers:

Answer 0: (score: 3)

You can use the following code to achieve your goal.

Code:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, "html.parser")

entry_view = soup.find_all('div', {'class': 'entries'})

entries = []
for elem in entry_view:
    for e in elem.find_all('a'):
        entries.append(e.text)

# show the first 5 and last 5 entries, then the total number of entries
print(entries[:5])
print(entries[-5:])
print(len(entries))

Output:

['A1', 'aback', 'abaft', 'abandon', 'abandoned']
['absorbing', 'absorption', 'abstainer', 'abstain from', 'abstemious']
100

In your code:

print(type(list))
<class 'bs4.element.Tag'>

print(type(m))
<class 'bs4.element.NavigableString'>

print(type(x))
<class 'str'>

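So, as you can see, the variable x is already a plain string, so accessing the bs4 .string attribute on it makes no sense. A NavigableString is a subclass of str, so looping over m walks it one character at a time, and a plain str has no .string attribute.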
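To make the types above easier to reproduce, here is a minimal sketch that runs the same kind of loop on a small inline HTML snippet. The snippet is a hypothetical stand-in for the real page (its markup is an assumption, not a copy of the site), so no network access is needed.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical stand-in for the real page, so the example runs offline.
html = """<div class="entries"><ul>
<li><a href="#">aback</a></li>
<li><a href="#">abaft</a></li>
</ul></div>"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("div", {"class": "entries"}).ul

# Direct children of <ul> are a mix of whitespace text nodes and <li> tags.
child_types = [type(child).__name__ for child in ul]

# Looping over a text node walks it character by character as plain str.
first_text_node = next(child for child in ul if not isinstance(child, Tag))
char_types = {type(ch).__name__ for ch in first_text_node}

# Keep only the real tags and read their text instead of looping deeper.
words = [child.get_text(strip=True) for child in ul if isinstance(child, Tag)]

print(child_types)
print(char_types)
print(words)  # ['aback', 'abaft']
```

Filtering with isinstance(child, Tag) skips the whitespace between the list items, which is what triggers the AttributeError in the nested loop.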

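P.S.: You should not use variable names like list. It is not a reserved keyword, but it is the name of a built-in type, and assigning to it hides that built-in for the rest of the scope.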
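A short sketch of what can go wrong with a variable called list. The function name shadowed is made up for this illustration.

```python
import builtins
import keyword

# `list` is an ordinary built-in name; Python does not reserve it.
print(keyword.iskeyword("list"))   # False
print(hasattr(builtins, "list"))   # True


def shadowed():
    """Hypothetical helper: reuse the name `list`, then try to call it."""
    list = "just a string now"     # the local name hides the built-in type
    try:
        list("abc")                # no longer the list constructor
    except TypeError as exc:
        return str(exc)


print(shadowed())
```

Renaming the variable to something like items or word_list avoids the problem entirely.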

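Answer 1: (score: 1)

AttributeError: 'str' object has no attribute 'string'

This tells you that the object is already a string. Try removing the .string access and it should work.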

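It also shows that Python's string type is named str, not string.

One more thing: you could convert with title = str(x), but since x is already a string in this case, that would be redundant.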
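Both points can be checked quickly in plain Python. This is only a sketch, and the value "aback" is an arbitrary example word.

```python
x = "aback"                    # arbitrary example value, already a str

# The built-in type is called str, and it has no attribute named "string".
print(type(x).__name__)        # str
print(hasattr(x, "string"))    # False

# Converting a str with str() gives back an equal string, so it adds nothing.
title = str(x)
print(title == x)              # True
```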

Quoting Google:

Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use).