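I have a large Excel file in which every cell holds HTML content containing comments posted by database users. The content of each cell is unique and varies in length. I need to strip out all of the HTML syntax/tags so that I can upload this content to a database table. How can I scrub this data using Python (or Java, if there is no Python answer)? Could you provide a code example?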
Answer 0 (score: 0)
Run pip install bs4 in a terminal. Then you can extract the text in Python like this:
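import bs4

# Sample cell contents; in practice these would be read from the Excel file.
cells = [
    '<html>The indicator lights on the control cabinet are to be replaced with 24Vdc LED\'s. 3 Red & 3 Green.</html>',
    '<html><div> <span style=""FONT-SIZE: 18pt"">Close the Monthly LAD and Lanyard Work orders to show they were executed. </span></div>',
]

for cell in cells:
    # Name the parser explicitly to avoid the "no parser specified" warning.
    print(bs4.BeautifulSoup(cell, "html.parser").text.strip())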
Result:
The indicator lights on the control cabinet are to be replaced with 24Vdc LED's. 3 Red & 3 Green.
Close the Monthly LAD and Lanyard Work orders to show they were executed.
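If installing bs4 is not an option, roughly the same result can be had from the standard library alone. The following is only a minimal sketch, not the method used above: TagStripper and strip_tags are hypothetical names made up for this example, and the sample cells are shortened stand-ins for the real data. It keeps text nodes, decodes entities such as &amp;, and collapses leftover whitespace. It does not drop the contents of script or style elements, so it assumes the cells hold simple markup like the samples.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities like &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join("".join(parser.parts).split())


# Shortened stand-ins for cell values read from the spreadsheet.
cells = [
    "<html>3 Red &amp; 3 Green.</html>",
    '<div> <span style="FONT-SIZE: 18pt">Close the work orders. </span></div>',
]
cleaned = [strip_tags(cell) for cell in cells]
```

Each entry in cleaned is then a plain string that can be written to the database column in place of the original cell value.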