Need to find a value with Beautiful Soup

Time: 2019-12-02 07:14:55

Tags: python beautifulsoup

This is part of the HTML code from the following page:

<div>
  <div class="sidebar-labeled-information"> <span> Economic skill: </span> <span> 10.646 </span> </div>
  <div class="sidebar-labeled-information"> <span> Strength: </span> <span> 2336 </span> </div>
  <div class="sidebar-labeled-information"> <span> Location: </span> <div> <a href="region.html?id=454"> Little Karoo <div class="xflagsSmall xflagsSmall-Argentina"> </div> </a> </div> </div>
  <div class="sidebar-labeled-information"> <span> Citizenship: </span> <div> <div class="xflagsSmall xflagsSmall-Poland"> </div> <small> <a href="pendingCitizenshipApplications.html"> change </a> </small> </div> </div>
</div>

I want to extract region.html?id=454 from it. I don't know how to narrow the search down to <a href="region.html?id=454">, because the page contains many <a href=> tags.

Here is the Python code:

import requests
from bs4 import BeautifulSoup

# headers: the request headers dict, defined elsewhere (not shown in the question)
session = requests.Session()
r = session.get('https://orange.e-sim.org/battle.html?id=5377', headers=headers, verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
div = soup.find_all('div', attrs={'class': 'sidebar-labeled-information'})

The output of this code is:

[<div class="sidebar-labeled-information" id="levelMission"> <span>Level:</span> <span>15</span> </div>, <div class="sidebar-labeled-information" id="currRankText"> <span>Rank:</span> <span>Colonel</span> </div>, <div class="sidebar-labeled-information"> <span>Economic skill:</span> <span>10.646</span> </div>, <div class="sidebar-labeled-information"> <span>Strength:</span> <span>2336</span> </div>, <div class="sidebar-labeled-information"> <span>Location:</span> <div> <a href="region.html?id=454">Little Karoo<div class="xflagsSmall xflagsSmall-Argentina"></div> </a> </div> </div>, <div class="sidebar-labeled-information"> <span>Citizenship:</span> <div> <div class="xflagsSmall xflagsSmall-Poland"></div> <small><a href="pendingCitizenshipApplications.html">change</a> </small> </div> </div>]

But the output I want is region.html?id=454.

The page I am trying to search is located here, but you need an account to view it.

3 answers:

Answer 0: (score: 1)

You can query based on the href value.
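As a minimal sketch of querying by href value (one possible approach, not necessarily the one this answer had in mind), a CSS attribute selector can match the link whose href starts with region.html?id=. The sample HTML below is trimmed from the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question; on the real page this would be r.text.
html = """
<div class="sidebar-labeled-information"> <span>Citizenship:</span> <div>
  <div class="xflagsSmall xflagsSmall-Poland"></div>
  <small><a href="pendingCitizenshipApplications.html">change</a></small> </div> </div>
<div class="sidebar-labeled-information"> <span>Location:</span> <div>
  <a href="region.html?id=454">Little Karoo<div class="xflagsSmall xflagsSmall-Argentina"></div> </a>
</div> </div>
"""

soup = BeautifulSoup(html, "html.parser")

# [href^="..."] matches <a> tags whose href attribute starts with the given prefix.
link = soup.select_one('a[href^="region.html?id="]')

# select_one returns None when nothing matches, so guard before indexing.
href = link["href"] if link is not None else None
print(href)
```

If the page contains several region links, select_one returns only the first; soup.select with the same selector returns all of them.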

Answer 1: (score: 0)

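from urllib.parse import urlparse
from bs4 import BeautifulSoup

# html: the page source as a string (e.g. r.text from the question)
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a', href=True)  # only <a> tags that carry an href

for link in links:
    href = link['href']
    url = urlparse(href)
    # keep only links whose path is region.html
    if url.path == "region.html":
        print(url.path + "?" + url.query)

This prints region.html?id=454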
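If only the numeric region id is needed, the query string can be parsed one step further. The following is a small standard-library sketch; the helper name region_id is hypothetical and not part of the answer above.

```python
from urllib.parse import urlparse, parse_qs


def region_id(href):
    """Return the id query parameter of a region.html link, or None otherwise."""
    parsed = urlparse(href)
    if parsed.path != "region.html":
        return None
    # parse_qs maps each parameter name to a list of values.
    values = parse_qs(parsed.query).get("id")
    return values[0] if values else None


print(region_id("region.html?id=454"))
```

The id comes back as a string; convert it with int() if a number is needed.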

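Answer 2: (score: 0)

You can try using the class xflagsSmall and then find that element's parent:

# find() returns the first match, so this assumes the first xflagsSmall div
# is the one nested inside the region link
element = soup.find("div", {"class": "xflagsSmall"})
parent_element = element.find_parent()  # the enclosing <a> tag
link = parent_element.attrs["href"]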
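On a page where another xflagsSmall div comes first (the Citizenship flag, for example, whose parent has no href), a lookup scoped to the "Location:" label may be more robust. This is a hedged sketch under that assumption; the function name location_href and the sample markup ordering are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question, with the Citizenship block placed first
# to show that the lookup does not depend on element order.
html = """
<div class="sidebar-labeled-information"> <span>Citizenship:</span> <div>
  <div class="xflagsSmall xflagsSmall-Poland"></div>
  <small><a href="pendingCitizenshipApplications.html">change</a></small> </div> </div>
<div class="sidebar-labeled-information"> <span>Location:</span> <div>
  <a href="region.html?id=454">Little Karoo<div class="xflagsSmall xflagsSmall-Argentina"></div> </a>
</div> </div>
"""


def location_href(soup):
    """Return the href of the link inside the block labeled 'Location:', or None."""
    for block in soup.find_all("div", class_="sidebar-labeled-information"):
        label = block.find("span")
        if label is not None and label.get_text(strip=True) == "Location:":
            link = block.find("a", href=True)
            return link["href"] if link is not None else None
    return None


soup = BeautifulSoup(html, "html.parser")
print(location_href(soup))
```

Matching on the label text ties the result to the Location row rather than to whichever flag icon happens to appear first.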