I am having trouble extracting the text I-want-ya from the following HTML:
<div class="field">
<div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
<div class="input">I-want-ya</div>
</div>
My code so far:
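import robobrowser
from bs4 import BeautifulSoup

browser = robobrowser.RoboBrowser(parser='html.parser')
browser.open(url)  # url is the address of the page being scraped
# Re-parse the fetched page with BeautifulSoup
soup = BeautifulSoup(str(browser.parsed), 'html.parser')
# Adjacent-sibling selector: any .input element directly after a div.labelx
parsed_value = soup.select('div.labelx + .input')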
Is there a way to get the I-want-ya value from:
<div class="input">I-want-ya</div>
by locating it as the sibling of the div with class="labelx" whose child a element has the attribute title="Group"?
Answer 0 (score: 1)
Update: this now accounts for multiple matches.
from bs4 import BeautifulSoup
s = '''<div class="field">
<div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
<div class="input">I-want-ya</div>
<div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
<div class="input">I-want-you-2</div>
</div>'''
soup = BeautifulSoup(s, 'html.parser')
divs = soup.find_all('div', attrs={'class': 'labelx'})
for div in divs:
    # Only use labels that actually contain an <a title="Group"> child
    if div.find('a', {'title': 'Group'}):
        print(div.find_next('div', {'class': 'input'}).text)
    else:
        print('No match.')
This gives:
I-want-ya
I-want-you-2
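The loop above can also be wrapped in a small helper. This is only a sketch: the function name group_values and the second label with title="Other" are made up for illustration, and it uses find_next_sibling rather than find_next so that only a div.input on the same level as the label is accepted.

```python
from bs4 import BeautifulSoup

# Sample markup; the second label (title="Other") is a hypothetical
# addition to show that non-matching labels are skipped.
HTML = '''<div class="field">
<div class="labelx"><a class="clickme" title="Group">* Group</a></div>
<div class="input">I-want-ya</div>
<div class="labelx"><a class="clickme" title="Other">* Other</a></div>
<div class="input">not-this-one</div>
</div>'''


def group_values(html, title="Group"):
    """Return the text of each div.input that follows a matching label div."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for label in soup.find_all("div", class_="labelx"):
        # Skip labels that have no <a> child with the requested title
        if label.find("a", title=title) is None:
            continue
        # Look only among the following siblings of the label div
        sibling = label.find_next_sibling("div", class_="input")
        if sibling is not None:
            values.append(sibling.get_text(strip=True))
    return values


print(group_values(HTML))
```

Returning a list instead of printing inside the loop makes the "no match" case an empty list, which is easier to check in calling code.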
Answer 1 (score: 0)
Assuming I understand correctly: find the first div whose class is labelx, then take the text of the sibling element that follows it.
>>> HTML = '''\
... <div class="field">
... <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
... <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_sib_div = soup.find('div', attrs={'class': 'labelx'})
>>> first_sib_div.find_next_sibling('div').text
'I-want-ya'
Edit: this should be it.
>>> HTML = '''\
... <div class="field">
... <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
... <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_div_link = soup.select('div.labelx > a[title="Group"]')[0]
>>> first_div_link.find_parent().find_next_sibling('div').text
'I-want-ya'
Addendum: added in response to rahlf23's question.
>>> s = '''\
... <div class="field">
... <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
... <div class="input">I-want-ya</div>
... <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
... <div class="input">I-want-ya-too</div>
... </div>'''
>>> soup = bs4.BeautifulSoup(s, 'lxml')
>>> for item in soup.select('div.labelx > a[title="Group"]'):
...     item.find_parent().find_next_sibling('div').text
...
'I-want-ya'
'I-want-ya-too'
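The select-then-walk-to-the-sibling steps above can probably be collapsed into a single CSS selector. The sketch below assumes a Beautiful Soup release whose select() is backed by Soup Sieve (4.7 or later), since it relies on the :has() pseudo-class; it uses the built-in html.parser rather than lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

HTML = '''<div class="field">
<div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
<div class="input">I-want-ya</div>
<div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
<div class="input">I-want-ya-too</div>
</div>'''

soup = BeautifulSoup(HTML, "html.parser")

# div.labelx that has a direct <a title="Group"> child,
# followed immediately (+) by a div.input sibling
selector = 'div.labelx:has(> a[title="Group"]) + div.input'

values = [div.get_text() for div in soup.select(selector)]
print(values)
```

The + combinator only matches the element directly after the label div, so an input div separated from its label by another element would not be picked up.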