Question

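I'm trying to find the best way to scrape the text from an HTML document straight into a .txt file. As far as I understand, this can't be done with plain JavaScript alone, only with Node.js. I also tried doing it in Python with BeautifulSoup, but that may be beyond my level. The HTML document in question uses "ng-bind" classes, which don't seem to fit well with what I'm trying to do.

I'd like to pull the text strings from the HTML document directly into a .txt file.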

Answer 1

Try this Python code, and swap in whichever tags you need to scrape from the site:

import requests
from bs4 import BeautifulSoup

# URL the text will be extracted from
url = "https://www.pythonforbeginners.com/files/reading-and-writing-files-in-python"
page = requests.get(url)
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.content, "html.parser")

# Text file the content will be written to
with open("test.txt", "w", encoding="utf-8") as file:
    # Extract the text of every <p> tag; swap in whichever tag you need
    for paragraph in soup.find_all("p"):
        file.write(paragraph.get_text() + "\n")
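
If the HTML document is already saved locally and you would rather not install anything, here is a rough sketch of the same idea using only Python's standard library instead of BeautifulSoup. The names TextExtractor, html_to_text and save_text are made up for this example, and the file names in the usage comment are placeholders. It keeps all visible text and skips the contents of <script> and <style>. One caveat: if the "ng-bind" elements are filled in by AngularJS in the browser, that text will only be in the HTML if the page was saved after rendering.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # non-empty text fragments, in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_text(html, out_path):
    """Write the extracted text to a .txt file (UTF-8)."""
    Path(out_path).write_text(html_to_text(html), encoding="utf-8")


# Hypothetical usage, assuming a saved file named page.html:
# save_text(Path("page.html").read_text(encoding="utf-8"), "output.txt")
```

This is less forgiving than BeautifulSoup on badly broken markup, so treat it as a starting point rather than a drop-in replacement.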
