I have the following HTML code:
<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
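I need to get the value of the parent div and the value of the child div without repeating results, so that each game's time stays linked to its own parent.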
This is my Python code:
for nc in soup.find_all('div', attrs={'class': 'league-data'}):
    campeonato = nc.text
    for hr in soup.find('div', attrs={'class': 'match row cf'}).findAll("div", recursive=False):
        print(campeonato + "|" + hr.text)
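In the code above, the inner loop calls soup.find(...) on the whole page, so it returns the same first 'match row cf' div for every league, which likely explains the repeated results. Below is a minimal sketch of one way to keep each time tied to its own game. It is written against the sample HTML above, not the real page (the 'league-data' and 'match row cf' markup is not shown): it starts from each div.conf and takes the first following sibling div.targetHour. The helper name pair_matches is hypothetical, and 'html.parser' is used only so the sketch does not need lxml.

```python
from bs4 import BeautifulSoup


def pair_matches(html):
    """Return (match, time) tuples, pairing each div.conf with the
    first div.targetHour that follows it under the same parent."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for conf in soup.select('div.conf'):
        # Search only the later siblings of this conf div, not the whole
        # page, so a time can never be attached to a different match.
        hour = conf.find_next_sibling('div', class_='targetHour')
        # None marks a match that has no time div next to it.
        pairs.append((conf.get_text(strip=True),
                      hour.get_text(strip=True) if hour else None))
    return pairs


# Sample markup copied from the question.
html = '''<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>'''

for match, hour in pair_matches(html):
    print(f"{match}|{hour}")
```

The "match|time" output format mirrors the question's print call. Because the lookup is scoped to siblings, a missing time shows up as None for that one match instead of shifting every later pair.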
Answer 0 (score: 1)
You can use the zip() function to pair each match with its corresponding time:
from bs4 import BeautifulSoup
data = '''<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>'''
soup = BeautifulSoup(data, 'lxml')
for match, hour in zip(soup.select('div.conf'), soup.select('div.targetHour')):
    print(match.text.strip(), hour.text.strip())
Prints:
Brazil vs. Colombia 08:00 pm
Chilex Argentina 08:00 pm
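zip() stops at the shorter of its inputs. If one match on the page has no div.targetHour, the last match is silently dropped and every time after the gap is paired with the wrong game. A small sketch that checks the counts first (pair_strict is a hypothetical helper; on Python 3.10+, zip(matches, hours, strict=True) raises ValueError in the same situation):

```python
def pair_strict(matches, hours):
    """Zip matches with times, but refuse to guess when the counts differ."""
    if len(matches) != len(hours):
        raise ValueError(
            f"got {len(matches)} matches but {len(hours)} time values"
        )
    return list(zip(matches, hours))


matches = ["Brazil vs. Colombia", "Chilex Argentina"]
hours = ["08:00 pm"]  # hypothetical: one targetHour div is missing

# Plain zip() quietly drops "Chilex Argentina".
print(list(zip(matches, hours)))

try:
    pair_strict(matches, hours)
except ValueError as exc:
    print("mismatch:", exc)
```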
Answer 1 (score: 1)
Another option (assuming the list has an even length):
from bs4 import BeautifulSoup
data = '''<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>'''
soup = BeautifulSoup(data, 'lxml')
items = [item.text.strip() for item in soup.select('.conf, .targetHour')]
for i in range(0, len(items), 2):
    print(items[i], items[i+1])
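The range(0, len(items), 2) loop depends on the even-length assumption. With an odd number of items, items[i+1] raises IndexError on the last pass. The following sketch does the same pairing with slices and checks the assumption explicitly. pair_flat is a hypothetical name, and like the answer it assumes select() returns each .conf before its .targetHour.

```python
def pair_flat(items):
    """Turn a flat [match, time, match, time, ...] list into (match, time) pairs."""
    if len(items) % 2:
        raise ValueError("expected an even number of items: match, time, ...")
    # Even positions hold the matches, odd positions the times that follow them.
    return list(zip(items[0::2], items[1::2]))


# The list the answer builds from the sample HTML.
items = ["Brazil vs. Colombia", "08:00 pm", "Chilex Argentina", "08:00 pm"]
for match, hour in pair_flat(items):
    print(match, hour)
```

This prints the same two lines as the index-based loop, but an odd-length list fails with a clear message instead of an IndexError.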