Is it possible to iterate over a Python list with BeautifulSoup?

Posted: 2019-10-21 14:47:25

Tags: python beautifulsoup

I have raw HTML documents stored in a list, and I want to use the BeautifulSoup method get_text() to extract the text from each element of the list.

Is it possible to loop over the list and apply get_text() to each element?

When I try this, I get the following error:

TypeError: expected string or bytes-like object

Is there a way to do this?

from bs4 import BeautifulSoup as bs
import re
import pandas as pd
import itertools 
from collections.abc import Iterable
import pymssql



conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor() 
cursor.execute('SELECT x FROM x')

text = [r[0] for r in cursor.fetchall() ]

conn.close()


conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor() 
cursor.execute('SELECT x FROM x')

t = [r[0] for r in cursor.fetchall() ]

conn.close()


for line in text:
    soup = bs(text, 'html.parser')

for script in soup(["script", "style"]):
    script.extract() 

autor = soup.get_text()

s = autor.replace('\\n', '')

1 answer:

Answer 0 (score: 0)

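You are passing the entire list to BeautifulSoup. Inside your loop you call bs(text, 'html.parser'), but text is the whole list; the loop variable line is what holds a single HTML string. BeautifulSoup expects a string (or an open file object), not a list, which is why you get the TypeError.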

Try:

for line in text:
    soup = bs(line, 'html.parser')

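Then you probably want to move the rest of your code inside this for loop, so that it runs for each "line", which, as I understand it, is an entire HTML string.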
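Putting the rest of the code inside the loop could look roughly like the sketch below. It wraps the per-document steps (parse, drop script/style tags, get_text(), strip newlines) in a helper and applies it to every element, collecting one result per document instead of overwriting soup and autor on each pass. The helper name extract_text and the sample HTML list are hypothetical stand-ins for the rows fetched from the database, and it assumes the question's replace call is meant to remove newline characters.

```python
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Hypothetical helper: visible text of ONE HTML string, minus <script>/<style>."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements so their contents are not included in the text.
    for tag in soup(["script", "style"]):
        tag.extract()
    # Assumption: the goal of the question's replace() is to strip newline characters.
    return soup.get_text().replace("\n", "")


# Hypothetical sample data standing in for text = [r[0] for r in cursor.fetchall()].
text = [
    "<html><body><p>Hello</p><script>var x = 1;</script></body></html>",
    "<html><head><style>p {color: red}</style></head><body><p>World</p></body></html>",
]

# Parse each HTML string separately; passing the whole list to BeautifulSoup fails.
results = [extract_text(html) for html in text]
print(results)
```

With this sample data it prints ['Hello', 'World']. Keeping one result per element also means nothing is lost between iterations, which would happen if only the last soup were processed after the loop.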