How can I scrape this data when every element has a different id?
<span id ='DataListTicker_lblTicker_0'>Wheat</span>
<span id ='DataListTicker_lblTicker_1'>Rice</span>
<span id ='DataListTicker_lblTicker_2'>Barleyt</span>
<span id ='DataListTicker_lblTicker_3'>Milk</span>
.
.
.
<span id ='DataListTicker_lblTicker_n'>XYZ</span>
I need all of this data at once. Please help; my preferred language is Python.
Answer 0 (score: 0)
You can do this with HTMLParser and a regex. Give this a try:
from html.parser import HTMLParser
import re

html_to_parse = """<span id ='DataListTicker_lblTicker_0'>Wheat</span>
<span id ='DataListTicker_lblTicker_1'>Rice</span>
<span id ='DataListTicker_lblTicker_2'>Barleyt</span>
<span id ='DataListTicker_lblTicker_3'>Milk</span>
<span id ='DataListTicker_lblTicker_n'>XYZ</span>"""

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # True while the next text node belongs to a matching <span>
        self.handle_next = False

    def handle_starttag(self, tag, attrs):
        # Match ids with a numeric suffix; the literal "_n" placeholder
        # in the sample is not numeric, so that row is skipped.
        element_id = dict(attrs).get("id") or ""
        if re.search(r'^DataListTicker_lblTicker_[0-9]+$', element_id):
            self.handle_next = True

    def handle_data(self, data):
        if self.handle_next:
            print(data)
            self.handle_next = False

ps = MyHTMLParser()
ps.feed(html_to_parse)
There may be a more elegant way, but this should work.
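Since the question asks for all the values at once, here is a minimal sketch of the same HTMLParser idea that collects the text into a list instead of printing it. The class name `TickerCollector` and the `ID_PREFIX` constant are made up for illustration, and it assumes a plain prefix check on the id is enough (so the `_n` placeholder row is included too).

```python
from html.parser import HTMLParser

# Prefix copied from the question's markup
ID_PREFIX = "DataListTicker_lblTicker_"


class TickerCollector(HTMLParser):  # hypothetical name, not a library class
    def __init__(self):
        super().__init__()
        self.values = []       # collected text of matching spans
        self._capture = False  # True right after a matching <span> opens

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id") or ""
        self._capture = tag == "span" and element_id.startswith(ID_PREFIX)

    def handle_data(self, data):
        if self._capture:
            self.values.append(data.strip())
            self._capture = False

    def handle_endtag(self, tag):
        # An empty span must not capture text that follows it
        self._capture = False


sample_html = """<span id ='DataListTicker_lblTicker_0'>Wheat</span>
<span id ='DataListTicker_lblTicker_1'>Rice</span>
<span id ='DataListTicker_lblTicker_2'>Barleyt</span>
<span id ='DataListTicker_lblTicker_3'>Milk</span>
<span id ='DataListTicker_lblTicker_n'>XYZ</span>"""

collector = TickerCollector()
collector.feed(sample_html)
collector.close()
print(collector.values)
```

After `feed()` returns, `collector.values` holds every matching value in document order, so you can pass the whole list on in one go.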
Answer 1 (score: 0)
Try the following. The ids differ, but they all share a common prefix, so you can select on that prefix to get the data:
from bs4 import BeautifulSoup

element = """
<span id ='DataListTicker_lblTicker_0'>Wheat</span>
<span id ='DataListTicker_lblTicker_1'>Rice</span>
<span id ='DataListTicker_lblTicker_2'>Barleyt</span>
<span id ='DataListTicker_lblTicker_3'>Milk</span>
<span id ='DataListTicker_lblTicker_n'>XYZ</span>
"""

# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(element, "lxml")

# [id^='...'] selects every element whose id starts with the given prefix
for items in soup.select("[id^='DataListTicker_lblTicker_']"):
    print(items.text)
Output:
Wheat
Rice
Barleyt
Milk
XYZ
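If you want the values in one structure rather than printed one by one, something like this should do it. It is only a sketch: it swaps the CSS selector for `find_all` with a compiled regex on the id, uses the built-in "html.parser" backend so lxml is not needed, and the names `sample_html` and `tickers` are just illustrative.

```python
import re

from bs4 import BeautifulSoup

sample_html = """
<span id ='DataListTicker_lblTicker_0'>Wheat</span>
<span id ='DataListTicker_lblTicker_1'>Rice</span>
<span id ='DataListTicker_lblTicker_2'>Barleyt</span>
<span id ='DataListTicker_lblTicker_3'>Milk</span>
<span id ='DataListTicker_lblTicker_n'>XYZ</span>
"""

# "html.parser" ships with Python, so no extra parser package is required
soup = BeautifulSoup(sample_html, "html.parser")

# Match any span whose id starts with the shared prefix
id_pattern = re.compile(r"^DataListTicker_lblTicker_")

# Map each id to its text so all values are available at once
tickers = {
    span["id"]: span.get_text(strip=True)
    for span in soup.find_all("span", id=id_pattern)
}
print(tickers)
```

Keeping the id as the dictionary key lets you look up a single ticker later, and `list(tickers.values())` gives just the names in page order.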