I'm having trouble using BS4 to scrape a site and write the results to a .txt file. Nothing I've read about BS4 covers how to extract data from just the body.
Example:
`<html>
<head></head>
<body> ==$0
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"; A
<B--need this data exported to my file>
C
D
E
<F--Also need this exported to my file>
end`
The code I've tested so far still raises errors, and I haven't even reached the part where I need to extract the data from line B.
`from bs4 import BeautifulSoup
import urllib.request

url = 'website'  # placeholder for the real URL
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, features="html.parser")

# Drop <script> and <style> elements so only visible text remains
for script in soup(["script", "style"]):
    script.extract()

# The tag name must be a string; fall back to the whole document if <body> is missing
body = soup.find("body") or soup
lines = [line.strip() for line in body.get_text().splitlines()]

# Write the extracted text itself, not the literal string 'body'
with open('file location', 'w') as outfile:
    outfile.write("\n".join(line for line in lines if line))`
I don't yet know how to extract the data from line B, and I'm still learning Python, so please bear with me if this doesn't make sense!
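
For illustration, here is a minimal sketch of one way the B and F lines might be pulled out of the `<pre>` block and written to a file. It assumes the wanted lines can be recognised by a known prefix, which may not hold for the real page. `SAMPLE_HTML`, `extract_lines`, `write_lines`, the `B`/`F` prefixes and `output.txt` are all hypothetical names made up for this example, not part of the original site or code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real HTML would
# come from urllib.request.urlopen(url).read()
SAMPLE_HTML = """<html><head></head><body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">; A
B wanted-1
C
D
E
F wanted-2
end</pre>
</body></html>"""


def extract_lines(html, prefixes):
    """Return the <pre> lines that start with any of the given prefixes."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")
    if pre is None:
        # No <pre> element on the page, so there is nothing to extract
        return []
    lines = (line.strip() for line in pre.get_text().splitlines())
    return [line for line in lines if line.startswith(tuple(prefixes))]


def write_lines(lines, path):
    """Write one extracted line per row to a text file."""
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(lines) + "\n")


# Assumption: the wanted rows begin with "B" and "F"
wanted = extract_lines(SAMPLE_HTML, ["B", "F"])
write_lines(wanted, "output.txt")
```

If the real lines are not identifiable by prefix, the same structure should still work with a different filter, for example picking lines by their position in the list.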