The part of the HTML I want to scrape looks like this:
<ul . . .> #has some attributes represented by dots
<li . . .>
<div class="c1">
<h4 class="c2">T1</h4>
<h5 class="c3">T2</h5>
<p class="c4">T3</p>
<p class="c5">T4</p>
</div>
</li>
<li . . .>
<div class="c1">
<h4 class="c2">T1</h4>
<h5 class="c3">T2</h5>
<p class="c4">T3</p>
<p class="c5">T4</p>
</div>
</li>
<li> . . .</li>
. . .
. . .
. . . # dots show repetition
</ul>
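Now I want to build lists from the text of the h4, h5, and the two p elements inside the div of class c1 in each li. In BeautifulSoup, the code I tried does not work: it prints empty lists. With Selenium I solved the problem as follows: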
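from selenium.webdriver.common.by import By

# Assumes `driver` is an already-created WebDriver with the page loaded.
tit, man, des = [], [], []  # titles, manufacturers, descriptions
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which Selenium 4 removed.
pare = driver.find_elements(By.XPATH, "//div[@class='c1']")
for par in pare:
    # The leading "./" keeps each lookup inside the current div.
    title = par.find_element(By.XPATH, "./h4[@class='c2']")
    manu = par.find_element(By.XPATH, "./h5[@class='c3']")
    desc = par.find_element(By.XPATH, "./p[@class='c4']")
    tit.append(title.text)
    man.append(manu.text)
    des.append(desc.text)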
With BeautifulSoup, the lists print empty. Since I am new to BeautifulSoup, please help me solve this problem.
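A quick check that may help narrow down an empty result like this is to count how many div elements of class c1 the parsed HTML actually contains. The sketch below is only an illustration: the helper name count_blocks and the sample markup are made up for this example, and the sample keeps just enough of the structure above to be parseable.

```python
from bs4 import BeautifulSoup


def count_blocks(html):
    """Count the <div class="c1"> blocks present in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_="c1"))


# Hypothetical sample mirroring the structure shown in the question.
sample_html = """
<ul>
  <li><div class="c1"><h4 class="c2">T1</h4></div></li>
  <li><div class="c1"><h4 class="c2">T1</h4></div></li>
</ul>
"""

print(count_blocks(sample_html))   # blocks found in the static sample
print(count_blocks("<ul></ul>"))   # a list shell with no items in it
```

If the count is 0 for HTML obtained outside the browser while Selenium still finds the elements, the list is probably filled in by JavaScript after the page loads. In that case, passing driver.page_source to BeautifulSoup is one option.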
Answer 0: (score: 0)
You can try:
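from bs4 import BeautifulSoup

# Assumes `html` holds the page source as a string.
soup = BeautifulSoup(html, 'html.parser')
my_divs = soup.find_all('div', class_='c1')
# Search inside each div and filter by class so the two <p> tags stay separate.
titles = [h for div in my_divs for h in div.find_all('h4', class_='c2')]
manufacturers = [h for div in my_divs for h in div.find_all('h5', class_='c3')]
descriptions = [p for div in my_divs for p in div.find_all('p', class_='c4')]
prices = [p for div in my_divs for p in div.find_all('p', class_='c5')]
man = [item.get_text(strip=True) for item in manufacturers]
tit = [item.get_text(strip=True) for item in titles]
des = [item.get_text(strip=True) for item in descriptions]
pri = [item.get_text(strip=True) for item in prices]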
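One thing to watch with separate list comprehensions is that the lists can drift out of alignment if some div is missing one of its elements. A possible variation, sketched here under assumptions, collects one record per div and stores None for anything missing. The function name extract_items, the field names, and the sample markup are hypothetical. The selectors assume the class names c1 to c5 from the question.

```python
from bs4 import BeautifulSoup


def extract_items(html):
    """Return one dict per <div class="c1">, with None for any missing field."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical field names mapped to CSS selectors for the question's classes.
    fields = {
        "title": "h4.c2",
        "manufacturer": "h5.c3",
        "description": "p.c4",
        "price": "p.c5",
    }
    items = []
    for div in soup.select("div.c1"):
        row = {}
        for name, selector in fields.items():
            tag = div.select_one(selector)  # first match inside this div, or None
            row[name] = tag.get_text(strip=True) if tag else None
        items.append(row)
    return items


# Hypothetical sample: the second item lacks its h5 and its second p.
sample_html = """
<ul>
  <li><div class="c1">
    <h4 class="c2">T1</h4><h5 class="c3">T2</h5>
    <p class="c4">T3</p><p class="c5">T4</p>
  </div></li>
  <li><div class="c1">
    <h4 class="c2">T1</h4>
    <p class="c4">T3</p>
  </div></li>
</ul>
"""

items = extract_items(sample_html)
tit = [item["title"] for item in items]
pri = [item["price"] for item in items]
print(tit)
print(pri)
```

Because every record has the same keys, the per-field lists built from items always have one entry per div, so index i in tit and pri refers to the same list item.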