I am trying to scrape this IP address using bs4. The IP here is 103.18.75.62:
<div class="the-ip"><label id="a829266">1</label><label id="a814974">0</label><span id="a968168">3</span><label id="d735847">.</label><span id="d111988">1</span><span id="b284407">8</span><span id="b740896">.</span><label id="d817182">7</label><label id="e268019">5</label><span id="a721115">.</span><label id="e816439">6</label><span id="b903319">2</span></div>
I expected the following to work:
ip_div = soup.findAll('div', class_='the-ip')
ips = ip_div[0].findAll('label' AND 'span')  # pseudocode: how to implement this AND?
for i in ips:
    print(i.get_text())
So how do I implement this AND?
Answer 0 (score: 1)
Use select with div.the-ip * as the CSS selector:
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('''
... <div class="the-ip">
... <label id="a829266">1</label>
... <label id="a814974">0</label>
... <span id="a968168">3</span>
... <label id="d735847">.</label>
... <span id="d111988">1</span>
... <span id="b284407">8</span>
... <span id="b740896">.</span>
... <label id="d817182">7</label>
... <label id="e268019">5</label>
... <span id="a721115">.</span>
... <label id="e816439">6</label>
... <span id="b903319">2</span>
... </div>
... ''')
>>> ''.join(el.text for el in soup.select('div.the-ip *'))
u'103.18.75.62'
I think div.the-ip>* (or div.the-ip>label, div.the-ip>span) should also work, but it does not work with bs4 (it does work with lxml).
"how to implement this AND?" Do you mean OR?
You can pass a compiled regular expression pattern instead of a string:
>>> import re
>>>
>>> ip_div = soup.find('div' , class_='the-ip') # `find`, not `findAll` here.
>>> ''.join(el.text for el in ip_div.findAll(re.compile('^(label|span)$')))
u'103.18.75.62'
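The anchors in that pattern matter. The Beautiful Soup documentation describes regex filters as being applied with the pattern's search() method, so an unanchored pattern would also accept longer names that merely contain "label" or "span". A small standard-library sketch of the difference follows; the tag names in the list are made up for illustration and do not come from the page being scraped.

```python
import re

# Same pattern as in the findAll call above.
anchored = re.compile('^(label|span)$')
# The same alternation without anchors, for comparison only.
loose = re.compile('label|span')

# Hypothetical tag names, chosen only to show the difference.
names = ['label', 'span', 'div', 'spanner', 'mylabel']

# search() is used here because that is how bs4 is documented to apply a regex filter.
anchored_hits = [n for n in names if anchored.search(n)]
loose_hits = [n for n in names if loose.search(n)]

print(anchored_hits)  # only the exact names 'label' and 'span'
print(loose_hits)     # also picks up 'spanner' and 'mylabel'
```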
^(label|span)$ matches label or span.
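If a regular expression feels heavy for this, find_all (the newer spelling of findAll) is also documented to accept a list of tag names and match any of them. The sketch below assumes bs4 is installed and uses the built-in 'html.parser' backend; the variable names html, parts and ip are only illustrative.

```python
from bs4 import BeautifulSoup

# The markup from the question.
html = (
    '<div class="the-ip"><label id="a829266">1</label><label id="a814974">0</label>'
    '<span id="a968168">3</span><label id="d735847">.</label><span id="d111988">1</span>'
    '<span id="b284407">8</span><span id="b740896">.</span><label id="d817182">7</label>'
    '<label id="e268019">5</label><span id="a721115">.</span><label id="e816439">6</label>'
    '<span id="b903319">2</span></div>'
)

soup = BeautifulSoup(html, 'html.parser')
ip_div = soup.find('div', class_='the-ip')

# A list filter matches any tag whose name equals one of the items,
# and results come back in document order.
parts = ip_div.find_all(['label', 'span'])
ip = ''.join(el.get_text() for el in parts)

print(ip)
```

This gives the "label OR span" behaviour the question is after without writing a pattern.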