I'm trying to scrape a website using BeautifulSoup. My problem is that I only want to get one link from the HTML source, but I end up with a horrible list.
<div class="table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0">
<a href="/Member1">
<img alt="@Member1" class="avatar float-left" height="48" src="https://avatars0.githubusercontent.com/u/xxxxxxx" width="48" />
</a>
I only want /Member1 or @Member1. My code looks like this:
import requests
from bs4 import BeautifulSoup

Membres = {}
response = requests.get('https://github.com/orgs/xxxxxxxx/people?page=1')
soup = BeautifulSoup(response.content, "html.parser")
for e in soup.find_all("div", {"class": "table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0"}):
    for d in e.find_all("a"):
        for f in d.find_all("img alt="):
            Membres[f] = {}
So I tried slicing the string in f, and also matching the link directly, for example:
for d in e.find_all("a", href=True):
But I still end up with a lot of extra information in my key. Does anyone know a way to get just the Member1 name?
Thanks
Answer 0 (score: 1)
You can try a simple list comprehension to extract the href from the <a> tags:
for e in soup.find_all("div", {"class": "table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0"}):
    my_list = [a['href'] for a in e.find_all('a')]
This gives:
>>> my_list
['/Member1']
To put them into a dictionary, you can use similar syntax:
for e in soup.find_all("div", {"class": "table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0"}):
    my_dict = {a['href']: '' for a in e.find_all('a')}
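For reference, here is a minimal self-contained sketch of the same idea that yields just the member name. It runs on the snippet from the question rather than the live page, and it assumes (this is not confirmed by the question) that the single class member-avatar-cell is enough to identify the cells, so the full class string does not have to match exactly. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; on the real page this would be response.content.
html = """
<div class="table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0">
<a href="/Member1">
<img alt="@Member1" class="avatar float-left" height="48" src="https://avatars0.githubusercontent.com/u/xxxxxxx" width="48" />
</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

members = {}
# Assumption: "member-avatar-cell" alone identifies the avatar cells.
for link in soup.select("div.member-avatar-cell a[href]"):
    name = link["href"].lstrip("/")  # "/Member1" -> "Member1"
    img = link.find("img")           # attributes are read with tag["attr"], not find_all("img alt=")
    handle = img["alt"] if img is not None and img.has_attr("alt") else None
    members[name] = handle

print(members)  # {'Member1': '@Member1'}
```

Keying on the stripped href keeps the dictionary keys as plain strings instead of whole Tag objects, which is what fills the keys with extra markup.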
Answer 1 (score: 1)
You can use a regular expression:
import re
s = """
<div class="table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0">
<a href="/Member1">
<img alt="@Member1" class="avatar float-left" height="48" src="https://avatars0.githubusercontent.com/u/xxxxxxx" width="48" />
</a>
"""
user_data = dict(re.findall('<img alt="@(.*?)" class="avatar float-left" height="48" src="(.*?)" width="48" />', s))
Output:
{'Member1': 'https://avatars0.githubusercontent.com/u/xxxxxxx'}
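Note that the pattern above only matches when the attributes appear in exactly that order with exactly those values. If only the name is needed, a looser pattern may be easier to maintain. The following is a sketch under the assumption that every avatar's alt text starts with "@"; regular expressions remain fragile on HTML in general.

```python
import re

s = """
<div class="table-list-cell py-3 pl-3 v-align-middle member-avatar-cell css-truncate pr-0">
<a href="/Member1">
<img alt="@Member1" class="avatar float-left" height="48" src="https://avatars0.githubusercontent.com/u/xxxxxxx" width="48" />
</a>
"""

# Capture the handle from any alt="@..." attribute on an <img> tag,
# wherever that attribute sits among the tag's other attributes.
names = re.findall(r'<img\b[^>]*\balt="@([^"]+)"', s)
print(names)  # ['Member1']
```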