I am trying to extract the date, "Di 10. Dez. 2019", from the following HTML:
soup = `<div aria-disabled="false" aria-label="Di 10. Dez. 2019" aria-selected="false" class="DayPicker-Day" role="gridcell" tabindex="-1">\n <div class="DayPicker-Day-Inner">\n <span class="DayPicker-Day-Date">\n 10\n </span>\n <span class="DayPicker-Day-Price">\n 56\n </span>\n <span class="DayPicker-Day-Currency">\n CHF\n </span>\n </div>\n</div>\n`
So far I have tried something like this:
soup.find(lambda tag: tag.name == 'aria-disabled="false".aria-label=' in tag.get_text())
This only returns None.
I'm stuck. Can anyone help? Thanks!
Answer 0 (score: 0)
Try this:
txt = """<div aria-disabled="false" aria-label="Di 10. Dez. 2019"
aria-selected="false" class="DayPicker-Day" role="gridcell" tabindex="-1">\n
<div class="DayPicker-Day-Inner">\n <span class="DayPicker-Day-Date">\n 10\n
</span>\n <span class="DayPicker-Day-Price">\n 56\n </span>\n
<span class="DayPicker-Day-Currency">\n CHF\n </span>\n
</div>\n</div>\n"""
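from bs4 import BeautifulSoup as bs
soup = bs(txt, "lxml")  # the "lxml" parser requires the lxml package to be installed
# Select the first <div> that carries an aria-label attribute
dat = soup.select_one("div[aria-label]")
dat.attrs['aria-label']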
Output:
'Di 10. Dez. 2019'
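As for why the attempt in the question returns None: the expression `tag.name == '...' in tag.get_text()` is a chained comparison, so it first requires the tag's name to equal that whole string, which is never true. In addition, `get_text()` only returns the text between tags, not attribute values. The following is a minimal sketch of two other ways to read the attribute, assuming a shortened version of the HTML above and the built-in "html.parser" (chosen here only so the example has no extra dependency):

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: only the
# outer <div> and one inner <span> are needed to illustrate the lookup).
html = (
    '<div aria-disabled="false" aria-label="Di 10. Dez. 2019" '
    'class="DayPicker-Day" role="gridcell">'
    '<span class="DayPicker-Day-Date">10</span></div>'
)

# "html.parser" ships with Python; "lxml" also works if it is installed.
soup = BeautifulSoup(html, "html.parser")

# Option 1: attrs={name: True} matches any tag that has the attribute.
tag = soup.find("div", attrs={"aria-label": True})
label = tag["aria-label"]

# Option 2: a predicate function that inspects attributes instead of text.
tag_via_lambda = soup.find(lambda t: t.name == "div" and t.has_attr("aria-label"))
label_via_lambda = tag_via_lambda["aria-label"]

print(label)
```

Both options return the same tag as `select_one("div[aria-label]")`; which one to use is mostly a matter of taste. If the page contains several such cells, `find_all` or `select` with the same arguments should return all of them.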