I'm a design researcher. I have several .txt files, each containing 75-100 quotes that I have labelled with various tags, like this:
<q 69_A F exercises positive> Well I think it’s very good. I thought that the exercises that Rosy did was very good. I looked at it a few times. I listened and I paid attention but I didn’t really do it on the regular. I didn’t do the exercises on a regular basis. </q>
I'm trying to use BeautifulSoup to list all of the tags ("69_a", "exercises", "positive"). But instead of giving me output that looks like this:
69_a
exercises
positive
it gives me output that looks like this:
q
q
q
q
Finished...
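Can you help me solve this? I have a lot of qualitative data that I want to work through. The goal is to export all the quotes to an .xlsx file and sort them with pivot tables.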
from bs4 import BeautifulSoup
file_object = open('Angela_Q_2.txt', 'r')
soup = BeautifulSoup(file_object.read(), "lxml")
tag = soup.findAll('name')
for tag in soup.findAll(True):
    print(tag.name)
print('Finished')
Answer 0 (score: 2)
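What you want to list are called attributes, not tags. In your markup the tag is q, which is why tag.name prints q every time; the labels after it are attributes of that tag. To access a tag's attributes, use its .attrs value.
Use it like this: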
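from bs4 import BeautifulSoup
contents = '<q tag1 tag2>Quote1</q>some other text<q tag1 tag3>quote2</q>'
# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(contents, "html.parser")
for tag in soup.findAll('q'):
    print(tag.attrs)     # dict of attribute names, e.g. {'tag1': '', 'tag2': ''}
    print(tag.contents)  # list of child nodes, here the quote text
print('Finished')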
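To get closer to the spreadsheet you mentioned, here is a minimal sketch that collects the attribute names and the quote text for each q element and writes them out as CSV rows. Several things here are my assumptions rather than anything from your files: the sample strings, the helper name extract_quotes, the choice of html.parser, and the use of CSV (which Excel can open) in place of a true .xlsx file. Note that html.parser lowercases attribute names, so 69_A comes back as 69_a.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample in the same shape as the question's .txt files.
sample = (
    "<q 69_A F exercises positive> Well I think it's very good. </q>\n"
    "<q 70_B M schedule negative> I never had the time. </q>\n"
)


def extract_quotes(markup):
    """Return one (labels, text) pair per <q> element (illustrative helper)."""
    # Assumption: html.parser is used; it lowercases attribute names.
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for q in soup.find_all("q"):
        labels = list(q.attrs)         # attribute names, in source order
        text = q.get_text(strip=True)  # the quote itself, whitespace trimmed
        rows.append((labels, text))
    return rows


rows = extract_quotes(sample)

# One CSV row per quote: the quote text first, then one column per label.
buffer = io.StringIO()
writer = csv.writer(buffer)
for labels, text in rows:
    writer.writerow([text] + labels)
csv_text = buffer.getvalue()
print(csv_text)
```

Putting the quote text in the first column keeps it aligned even when quotes carry different numbers of labels. To write a file instead of printing, the same writer can be pointed at open('quotes.csv', 'w', newline='').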