这是下一页following page的HTML代码的一部分:
chmod -R 777 foldername or pathname
我想从中提取<div>
<div class="sidebar-labeled-information">
<span>
Economic skill:
</span>
<span>
10.646
</span>
</div>
<div class="sidebar-labeled-information">
<span>
Strength:
</span>
<span>
2336
</span>
</div>
<div class="sidebar-labeled-information">
<span>
Location:
</span>
<div>
<a href="region.html?id=454">
Little Karoo
<div class="xflagsSmall xflagsSmall-Argentina">
</div>
</a>
</div>
</div>
<div class="sidebar-labeled-information">
<span>
Citizenship:
</span>
<div>
<div class="xflagsSmall xflagsSmall-Poland">
</div>
<small>
<a href="pendingCitizenshipApplications.html">
change
</a>
</small>
</div>
</div>
</div>
。我不知道如何将搜索范围缩小到region.html?id=454
,因为有很多<a href="region.html?id=454">
标签。
这是python代码:
<a href=>
这段代码的输出是:
session=session()
r = session.get('https://orange.e-sim.org/battle.html?id=5377',headers=headers,verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
div = soup.find_all('div',attrs={'class':'sidebar-labeled-information'})
但是我想要的输出是[<div class="sidebar-labeled-information" id="levelMission">
<span>Level:</span> <span>15</span>
</div>, <div class="sidebar-labeled-information" id="currRankText">
<span>Rank:</span>
<span>Colonel</span>
</div>, <div class="sidebar-labeled-information">
<span>Economic skill:</span>
<span>10.646</span>
</div>, <div class="sidebar-labeled-information">
<span>Strength:</span>
<span>2336</span>
</div>, <div class="sidebar-labeled-information">
<span>Location:</span>
<div>
<a href="region.html?id=454">Little Karoo<div class="xflagsSmall xflagsSmall-Argentina"></div>
</a>
</div>
</div>, <div class="sidebar-labeled-information">
<span>Citizenship:</span>
<div>
<div class="xflagsSmall xflagsSmall-Poland"></div>
<small><a href="pendingCitizenshipApplications.html">change</a>
</small>
</div>
</div>]
。
我要搜索的页面位于here,但是您需要有一个帐户才能查看该页面。
答案 0 :(得分:1)
您可以基于href值进行查询:
pc-2@pc2-VirtualBox:~$ cqlsh
Connection error: ('Unable to connect to any servers',
{'127.0.0.1': ProtocolError('Unexpected response during Connection
setup: AttributeError("\'module\' object has no attribute
\'decompress\'",)',)})
答案 1 :(得分:0)
soup = BeautifulSoup(html)
links = soup.findAll('a', href=True)
for link in links:
href = link['href']
url = urlparse(href)
if url.path == "region.html":
print (url.path + "?" + url.query)
这将打印region.html?id=454
答案 2 :(得分:0)
您可以尝试使用此类:
xflagsSmall
并找到该元素的权限
element=soup.find("div",{"class": "xflagsSmall"})
parent_element=element.find_parent()
link=parent_element.attrs["href"]```