Using the BeautifulSoup module in Python, I'm trying to parse the web page below.
<div class="span-body"><div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>
I'm trying to get the script below to return 2016-05-08T1231Z, which is in the second div, the one with the timestamp updated class.
from bs4 import BeautifulSoup

with open("index.html", 'rb') as source_file:
    soup = BeautifulSoup(source_file.read(), "html.parser")  # Read the source file and get BeautifulSoup to work with it.
    div_1 = soup.find("div", {"class": "span-body"}).contents[0]  # Parse the first div.
    div_2 = div_1("div", {"class": "timestamp updated"})  # Parse the second div.
    print(div_2)
div_1 returns what I want it to return (the second div), but div_2 does not; it only gives me an empty list.
How can I fix this?
Answer 0 (score: 0)
There are two options. You can simply remove contents[0]:
div_1 = soup.find("div", {"class": "span-body"}) # Parse the first div.
div_2 = div_1("div", {"class": "timestamp updated"})
This returns a list containing one element:
[<div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div>]
Or just use find(), which returns the matching tag itself rather than a list:
div_1 = soup.find("div", {"class": "span-body"})
div_2 = div_1.find("div", {'class': 'timestamp updated'})
print(div_2)
Result:
<div class="timestamp updated" title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div>
If you don't need the intermediate div_1, why not go straight to div_2?
div_2 = soup.find("div", {'class': 'timestamp updated'})
Edit from the comments: to get the value of the title attribute, index the tag by attribute name:
div_2['title']
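Putting those pieces together, here is a minimal sketch of the whole lookup. It assumes the HTML is available as a string (the html variable below stands in for the contents of index.html) and that the built-in "html.parser" backend is acceptable. The use of .get() and the None check are my additions, so that a missing div or attribute does not raise.

```python
from bs4 import BeautifulSoup

# Stand-in for the contents of index.html (assumption: same markup as in the question).
html = ('<div class="span-body"><div class="timestamp updated" '
        'title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>')

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag, or None when nothing matches.
div_2 = soup.find("div", {"class": "timestamp updated"})

# .get() returns None instead of raising KeyError if the attribute is absent.
title = div_2.get("title") if div_2 is not None else None
print(title)
```

div_2['title'] works just as well when you are sure the attribute exists; .get("title") is only the more forgiving variant.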
Answer 1 (score: 0)
To find the content you want inside div_1, you need to call find again. You can also drop contents[0], because find does not return a list.
from bs4 import BeautifulSoup

with open("index.html", 'rb') as source_file:
    soup = BeautifulSoup(source_file.read(), "html.parser")  # Read the source file and get BeautifulSoup to work with it.
div_1 = soup.find("div", {"class": "span-body"})  # Parse the first div.
div_2 = div_1.find("div", {"class": "timestamp updated"})  # Parse the second div.
print(div_2)
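For anyone wondering why the original attempt printed an empty list, the sketch below reproduces it. It assumes the same markup as in the question, inlined as a string; the variable names outer, inner, from_inner and from_outer are mine. Calling a tag like a function is shorthand for find_all(), which only searches that tag's descendants, and contents[0] of the outer div is already the timestamp div, which has no div children of its own.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, inlined so the example is self-contained.
html = ('<div class="span-body"><div class="timestamp updated" '
        'title="2016-05-08T1231Z">May 8, 12:31 PM EDT</div></div>')
soup = BeautifulSoup(html, "html.parser")

outer = soup.find("div", {"class": "span-body"})
inner = outer.contents[0]  # already the "timestamp updated" div

# tag(...) is shorthand for tag.find_all(...), which looks at descendants only.
from_inner = inner("div", {"class": "timestamp updated"})  # searches below the inner div
from_outer = outer("div", {"class": "timestamp updated"})  # searches below the outer div

print(from_inner)       # nothing matches below the inner div
print(len(from_outer))  # the inner div itself is found from the outer one
```

So the search has to start from the outer div (or from soup), not from the tag you are trying to find.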