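I have raw HTML documents stored in a list, and I want to use BeautifulSoup's get_text() method
to extract the text from each element of the list.
Is it possible to apply get_text()
to every item in a list?
When I try this, I get an error:
TypeError: expected string or bytes-like object
Is there a way to do this?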
from bs4 import BeautifulSoup as bs
import re
import pandas as pd
import itertools
from collections.abc import Iterable
import pymssql
conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor()
cursor.execute('SELECT x FROM x')
text = [r[0] for r in cursor.fetchall()]
conn.close()
conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor()
cursor.execute('SELECT x FROM x')
t = [r[0] for r in cursor.fetchall()]
conn.close()
for line in text:
    soup = bs(text, 'html.parser')
    for script in soup(["script", "style"]):
        script.extract()
    autor = soup.get_text()
    s = autor.replace('\\n', '')
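Answer 0 (score: 0)
You are passing the entire text list to BeautifulSoup instead of a single element.
Try:
for line in text:
    soup = bs(line, 'html.parser')
You will then probably want to move the rest of your code inside this for loop, so that it runs once for each "line", which I understand to be a complete HTML string.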
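For illustration, here is a minimal sketch of the whole loop with everything moved inside it. The helper name html_to_text and the sample strings in documents are hypothetical stand-ins for the rows fetched from the database; the sketch also assumes the intent of the replace call was to drop newline characters.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the text of one HTML string, ignoring script and style content.

    Hypothetical helper: it wraps the per-item steps from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script> and <style> elements so their contents are not extracted.
    for tag in soup(["script", "style"]):
        tag.extract()
    # Assumption: newlines should be stripped from the extracted text.
    return soup.get_text().replace("\n", "")


# Hypothetical sample data standing in for the list built from cursor.fetchall().
documents = [
    "<html><body><p>Hello</p><script>var a = 1;</script></body></html>",
    "<div><style>p {color: red;}</style><p>World</p></div>",
]

# Parse each element separately; the list itself is never passed to BeautifulSoup.
texts = [html_to_text(doc) for doc in documents]
print(texts)
```

Collecting the results in a list keeps the text for every document, whereas assigning to a single variable inside the loop would keep only the last one.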