Question

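I have raw HTML documents stored in a list, and I want to use BeautifulSoup's get_text() function to extract the text from each element of the list.

Is it possible to loop over a list and call get_text() on each element?

When I try this, I get an error:

TypeError: expected string or bytes-like object

Is there a way to do this?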

from bs4 import BeautifulSoup as bs
import re
import pandas as pd
import itertools 
from collections.abc import Iterable
import pymssql



conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor() 
cursor.execute('SELECT x FROM x')

text = [r[0] for r in cursor.fetchall() ]

conn.close()


conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor() 
cursor.execute('SELECT x FROM x')

t = [r[0] for r in cursor.fetchall() ]

conn.close()


for line in text:
    soup = bs(text, 'html.parser')

for script in soup(["script", "style"]):
    script.extract() 

autor = soup.get_text()

s = autor.replace('\\n', '')

Answer 1

You are passing the entire text list to BeautifulSoup instead of the current element, which is why it raises the TypeError.

Try:

for line in text:
    soup = bs(line, 'html.parser')

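You will then probably want to move the rest of your code inside this for loop, so that it runs for each "line", which I understand to be an entire HTML string.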
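As a rough sketch of the loop with the remaining steps moved inside it: the helper name html_to_text and the two sample HTML strings are made up for illustration, and the list stands in for the one built from cursor.fetchall() in the question. It also assumes the goal of the final replace is to drop real newline characters, so it uses '\n' rather than the '\\n' in the question.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the text of one HTML string, without script/style content."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop <script> and <style> elements so their contents are not extracted.
    for tag in soup(["script", "style"]):
        tag.extract()
    # Assumption: real newline characters should be removed from the result.
    return soup.get_text().replace("\n", "")


# Hypothetical stand-in for the list read from the database in the question.
text = [
    "<html><body><p>Hello</p><script>var a = 1;</script></body></html>",
    "<div>World<style>p {color: red;}</style></div>",
]

# Parse each element separately instead of passing the whole list at once.
texts = [html_to_text(html) for html in text]
print(texts)
```

If the column can contain NULL, the list will hold None values, so those rows would probably need to be skipped or replaced with an empty string before parsing.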
