I have XML like this:
<link>
www.link1.com
</link>
<link>
www.link2.com
</link>
I tried this code:
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(results2) #Beautiful Soup
linklist = soup.findAll('link')
print linklist
With this code, the output is:
[<link>www.link1.com</link>,<link>www.link2.com</link>]
But I want output like this:
[www.link1.com, www.link2.com]
Answer 0 (score: 8)
Have you tried this?
linklist = [el.string for el in soup.findAll('link')]
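As a rough, self-contained sketch of the same idea using the newer bs4 package: the helper name `extract_link_texts`, the `SAMPLE_XML` constant, and the `<root>` wrapper are assumptions added for illustration, not part of the question. It also assumes lxml is installed, since bs4's XML parser depends on it. Because the sample puts each URL on its own line, the text is stripped of surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Sample input modeled on the question. The <root> wrapper is an assumption,
# added so the document has a single top-level element.
SAMPLE_XML = """<root>
<link>
www.link1.com
</link>
<link>
www.link2.com
</link>
</root>"""


def extract_link_texts(xml_text):
    """Return the stripped text of every <link> element (hypothetical helper)."""
    # features="xml" selects bs4's XML parser, which requires lxml.
    soup = BeautifulSoup(xml_text, features="xml")
    # get_text(strip=True) drops the newlines surrounding each URL.
    return [el.get_text(strip=True) for el in soup.find_all("link")]


if __name__ == "__main__":
    print(extract_link_texts(SAMPLE_XML))
```

Using `el.string` instead of `get_text(strip=True)` would keep the leading and trailing newlines from the source.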
Answer 1 (score: 1)
links = soup.find_all('link')
link_strings = [s.string for s in links]
Answer 2 (score: 1)
Try this:
from bs4 import BeautifulSoup
xml = """<html><link>
www.link1.com
</link>
<link>
www.link2.com
</link></html>"""
soup = BeautifulSoup(xml,features="xml")
linklist = soup.find_all('link')
# A list comprehension gives a real list on Python 3; strip() removes the surrounding newlines
linklist = [x.string.strip() for x in linklist]
Note that I changed the constructor to BeautifulSoup with features="xml" instead of BeautifulStoneSoup, because the latter is deprecated.
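To illustrate why features="xml" matters for this particular tag, here is a small comparison sketch. It assumes bs4 and lxml are installed; the `<root>` wrapper and the variable names are illustrative only. In HTML, link is a void element, so an HTML parser is likely to treat it as empty and leave the URL text outside the tag, while the XML parser treats it as an ordinary element.

```python
from bs4 import BeautifulSoup

# Illustrative input; the <root> wrapper is an assumption.
xml = "<root><link>www.link1.com</link><link>www.link2.com</link></root>"

# HTML parser: <link> is a void element in HTML, so the text may end up
# outside the tag and .string may not return the URL.
html_soup = BeautifulSoup(xml, features="html.parser")
html_strings = [el.string for el in html_soup.find_all("link")]

# XML parser (requires lxml): <link> is an ordinary element with text content.
xml_soup = BeautifulSoup(xml, features="xml")
xml_strings = [el.string for el in xml_soup.find_all("link")]

print(html_strings)
print(xml_strings)
```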