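Trying to scrape data from a web page.
The HTML contains multiple results, and I'm looking for the most efficient way to use find_all to retrieve the items inside the div and span tags.
The only thing that makes each entry distinct is its data-detail value, e.g. /results?phoneno=999999999&rid=0x0: each one has rid=0x0, rid=0x1, and so on. I'm not sure how to grab all of the elements listed below: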
<div class="card-summary" data-detail="/results?phoneno=999999999&rid=0x0">
<div class="row">
<div class="col-md-8">
<div class="h4">Kevin Johnson</div>
<div>
<span class="content-label">Age </span>
<span class="content-value">54 </span>
</div>
<div>
<span class="content-label">Lives in </span>
<span class="content-value">Las Vegas, NV</span>
</div>
</div>
</div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&rid=0x02">
<div class="row">
<div class="col-md-8">
<div class="h4">Amy Smith</div>
<div>
<span class="content-label">Age </span>
<span class="content-value">25 </span>
</div>
<div>
<span class="content-label">Lives in </span>
<span class="content-value">New York, NY</span>
</div>
</div>
</div>
</div>
That is, for each entry I want something like: ["Kevin Johnson", "54", "Las Vegas, NV", "/results?phoneno=999999999&rid=0x0"]
I'd like to collect every person into a list and then print it,
e.g. data = [["Name","Age","Location","URL"]]
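Because the data-detail attribute is what tells the entries apart, find_all can also match on that attribute directly instead of (or in addition to) the card-summary class. The sketch below is not part of the original question. It uses a trimmed copy of the markup plus a made-up `div.ad` element to show that non-matching tags are skipped, and it uses the built-in html.parser so lxml is not required.

```python
import re

from bs4 import BeautifulSoup

# Trimmed sample markup; the div.ad element is hypothetical and only
# exists to show that tags without an rid parameter are filtered out.
html = """
<div class="card-summary" data-detail="/results?phoneno=999999999&rid=0x0"><div class="h4">Kevin Johnson</div></div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&rid=0x02"><div class="h4">Amy Smith</div></div>
<div class="ad" data-detail="/promo">Sponsored</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all accepts a compiled regex as an attribute filter: this keeps only
# divs whose data-detail value contains an "rid=0x..." parameter.
cards = soup.find_all("div", attrs={"data-detail": re.compile(r"rid=0x[0-9a-fA-F]+")})

# Roughly equivalent CSS selector: class card-summary and a data-detail
# attribute whose value contains "rid=".
same_cards = soup.select('div.card-summary[data-detail*="rid="]')

for card in cards:
    print(card["data-detail"], card.find("div", class_="h4").get_text())
```

The regex filter is useful when the class names are unreliable; if they are stable, matching on `class_="card-summary"` as in the answer is simpler.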
Answer 0 (score: 0)
You can create a dictionary for each person with the keys name, age, contact and location. Find these details for each person, then append the dictionaries to a list.
Code:
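from pprint import pprint

from bs4 import BeautifulSoup

# html holds the page source (for example, the markup shown in the question)
soup = BeautifulSoup(html, 'lxml')
information = []
for person in soup.find_all('div', class_='card-summary'):
    person_info = {}
    # data-detail is the part that differs per entry (rid=0x0, rid=0x02, ...)
    person_info['contact'] = person['data-detail']
    person_info['name'] = person.find('div', class_='h4').text
    # Find the label span by its exact text, then take the value span after it
    person_info['age'] = person.find('span', string='Age ').find_next('span').text
    person_info['location'] = person.find('span', string='Lives in ').find_next('span').text
    information.append(person_info)
pprint(information)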
Output:
[{'age': '54 ',
  'contact': '/results?phoneno=999999999&rid=0x0',
  'location': 'Las Vegas, NV',
  'name': 'Kevin Johnson'},
 {'age': '25 ',
  'contact': '/results?phoneno=6666666666&rid=0x02',
  'location': 'New York, NY',
  'name': 'Amy Smith'}]
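The rid parameter that distinguishes the entries ends up inside each contact value. If you need it on its own, the standard library's urllib.parse can split it out. A small sketch: the helper name rid_from_detail is made up here, and reading rid as a hexadecimal number is an assumption based only on its 0x prefix.

```python
from urllib.parse import parse_qs, urlsplit


def rid_from_detail(detail):
    """Return the rid query parameter from a data-detail path, or None (hypothetical helper)."""
    query = urlsplit(detail).query       # e.g. "phoneno=999999999&rid=0x0"
    values = parse_qs(query).get("rid")  # parse_qs maps each key to a list of values
    return values[0] if values else None


contacts = [
    "/results?phoneno=999999999&rid=0x0",
    "/results?phoneno=6666666666&rid=0x02",
]
for contact in contacts:
    rid = rid_from_detail(contact)
    # Assumption: rid is a hex counter; int() with base 16 accepts the 0x prefix.
    print(contact, "->", rid, int(rid, 16))
```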
If you want the information as lists instead of dictionaries, you can use the following code:
from bs4 import BeautifulSoup

# html holds the page source (for example, the markup shown in the question)
soup = BeautifulSoup(html, 'lxml')
information = []
for person in soup.find_all('div', class_='card-summary'):
    contact = person['data-detail']
    name = person.find('div', class_='h4').text
    # Find the label span by its exact text, then take the value span after it
    age = person.find('span', string='Age ').find_next('span').text
    location = person.find('span', string='Lives in ').find_next('span').text
    information.append([name, age, location, contact])
print(information)
Output:
[['Kevin Johnson', '54 ', 'Las Vegas, NV', '/results?phoneno=999999999&rid=0x0'], ['Amy Smith', '25 ', 'New York, NY', '/results?phoneno=6666666666&rid=0x02']]
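Both versions keep the trailing spaces from the markup ('54 ') and rely on the exact label texts 'Age ' and 'Lives in '. The question asked for clean values under a header row like data = [["Name","Age","Location","URL"]]. The following is a sketch of an alternative approach, not the one used above: it strips whitespace and maps each content-label span to the content-value span next to it, assuming every label is followed by a value span in the same div as in the sample markup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="card-summary" data-detail="/results?phoneno=999999999&rid=0x0">
  <div class="h4">Kevin Johnson</div>
  <div><span class="content-label">Age </span><span class="content-value">54 </span></div>
  <div><span class="content-label">Lives in </span><span class="content-value">Las Vegas, NV</span></div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&rid=0x02">
  <div class="h4">Amy Smith</div>
  <div><span class="content-label">Age </span><span class="content-value">25 </span></div>
  <div><span class="content-label">Lives in </span><span class="content-value">New York, NY</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Header row first, matching the layout requested in the question.
data = [["Name", "Age", "Location", "URL"]]

for card in soup.find_all("div", class_="card-summary"):
    # Build {"Age": "54", "Lives in": "Las Vegas, NV"} with whitespace stripped,
    # so the trailing spaces in the markup no longer matter.
    fields = {}
    for label in card.find_all("span", class_="content-label"):
        value = label.find_next_sibling("span", class_="content-value")
        fields[label.get_text(strip=True)] = value.get_text(strip=True) if value else None
    name = card.find("div", class_="h4").get_text(strip=True)
    # .get() yields None instead of raising if a card lacks a field.
    data.append([name, fields.get("Age"), fields.get("Lives in"), card["data-detail"]])

for row in data:
    print(row)
```

Looking values up by label means a missing or reordered field produces None rather than an AttributeError from calling find_next on a failed find.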