Question

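I'm running into problems using BS4 to scrape a site and write the results to a .txt file. Everything I've read about BS4 covers little more than how to extract data from the body.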

Example

`<html>
  <head></head>
  <body> ==$0
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      "; A
      <B--need this data exported to my file>
       C
       D
       E
      <F--Also need this exported to my file>
   end`

This block repeats over and over on the page, and the data structure is the same each time.
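As a rough illustration of that repeating layout, here is a minimal sketch that groups lines into blocks and picks out the B and F lines. It assumes every block ends with a line reading `end`, that B is the second line of a block and F is the sixth, and the function name `extract_b_and_f` and the sample values are made up for the example; the real page may differ.

```python
def extract_b_and_f(text):
    """Yield (B, F) pairs from blocks of lines that are terminated by 'end'.

    Assumption: B is the 2nd line and F is the 6th line of each block.
    """
    block = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == "end":
            if len(block) >= 6:
                yield block[1], block[5]
            block = []  # start collecting the next block
        else:
            block.append(line)


# Hypothetical sample text with two blocks in the same shape as the example
sample = """\
; A1
B1
C1
D1
E1
F1
end
; A2
B2
C2
D2
E2
F2
end
"""

pairs = list(extract_b_and_f(sample))
print(pairs)
```

Indexing by position only works if every block really has the same number of lines; if the B and F lines carry a recognizable prefix, matching on that would be more robust.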

The code I've tested so far still throws errors, and I haven't even reached the part where I need to extract the data from line B.

`from bs4 import BeautifulSoup
import urllib.request
from bs4.element import Comment

url = ('website')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, features="html.parser")

for script in soup(["script", "style"]):
    script.extract() 
text = soup.get_text()

lines = (line.strip() for line in text.splitlines())

body = soup.find("body")

body = soup.find(text="<word from site>")
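# Write the matched text itself (not the literal string 'body') and close the file when done
with open('file location', 'w') as outfile:
    outfile.write(str(body))`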

I don't know how to extract the data from line B and I'm still learning Python, so please forgive me if this doesn't make sense!
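For what it's worth, one possible way to get from the page to a file is sketched below. It assumes the data sits as plain text inside the `<pre>` element, that B and F are the second and sixth non-empty lines, and it uses an inline sample string instead of the real site. The helper names `pre_text_from_html` and `save_lines`, the sample values, and the output path are hypothetical.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def pre_text_from_html(html):
    """Return the text of the first <pre> element, or all page text if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")
    return pre.get_text() if pre is not None else soup.get_text()


def save_lines(lines, path):
    """Write each string in `lines` to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as outfile:
        for line in lines:
            outfile.write(line + "\n")


# Stand-in for the real page; with a live site this string would come from
# urllib.request.urlopen(url).read(), as in the code above.
sample_html = """<html><head></head><body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">; A
B value
C
D
E
F value
end</pre></body></html>"""

text = pre_text_from_html(sample_html)
rows = [ln.strip() for ln in text.splitlines() if ln.strip()]

# Assumption: B is the 2nd non-empty line and F is the 6th.
wanted = [rows[1], rows[5]]

# Hypothetical output location inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
save_lines(wanted, out_path)
print(wanted)
```

The `with` block closes the file automatically, which avoids the case where nothing appears in the file because it was never flushed or closed.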

Using Python 3 and BS4 to find text on a web page and write it to a file

0 answers: