BeautifulSoup with div tags that have no attributes

Time: 2018-03-12 05:02:23

Tags: python beautifulsoup

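I am trying to scrape data from a web page.

The HTML contains multiple results, and I am looking for the most efficient way to use find_all to retrieve the items inside the div and span tags. The only thing that distinguishes one entry from another is the data-detail value, e.g. /results?phoneno=999999999&rid=0x0

Each entry has its own rid (rid=0x0, rid=0x1, and so on). I am not sure how to grab all of the elements listed below: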

<div class="card-summary" data-detail="/results?phoneno=999999999&amp;rid=0x0">
    <div class="row">
        <div class="col-md-8">
            <div class="h4">Kevin Johnson</div>
            <div>
                 <span class="content-label">Age </span>
                 <span class="content-value">54 </span>
            </div>
            <div>
                 <span class="content-label">Lives in </span>
                 <span class="content-value">Las Vegas, NV</span>
            </div>
        </div>
    </div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&amp;rid=0x02">
    <div class="row">
        <div class="col-md-8">
            <div class="h4">Amy Smith</div>
            <div>
                <span class="content-label">Age </span>
                <span class="content-value">25 </span>
            </div>
            <div>
                <span class="content-label">Lives in </span>
                <span class="content-value">New York, NY</span>
            </div>
        </div>
    </div>
</div>

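That is, each person should become a list like: ["Kevin Johnson", "54", "Las Vegas, NV", "/results?phoneno=999999999&amp;rid=0x0"]

I want to collect every person into a list and then print it, for example data = [["Name","Age","Location","URL"]]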

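1 Answer:

Answer 0: (score: 0)

You can create a dictionary for each person with the keys name, age, contact, and location. Find these details for each person, then append the dictionaries to a list.

Code: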

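from bs4 import BeautifulSoup

# `html` holds the page source containing the card-summary divs from the question
soup = BeautifulSoup(html, 'lxml')
information = []
for person in soup.find_all('div', class_='card-summary'):
    person_info = {}
    # The result URL is stored in the data-detail attribute of the outer div
    person_info['contact'] = person['data-detail']
    person_info['name'] = person.find('div', class_='h4').text
    # Match each label span by its exact text (including the trailing space),
    # then read the span that follows it. `string=` is the current name of the
    # older `text=` argument (bs4 >= 4.4).
    person_info['age'] = person.find('span', string='Age ').find_next('span').text
    person_info['location'] = person.find('span', string='Lives in ').find_next('span').text
    information.append(person_info)

print(information)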

Output:

[{'age': '54 ',
  'contact': '/results?phoneno=999999999&rid=0x0',
  'location': 'Las Vegas, NV',
  'name': 'Kevin Johnson'},
 {'age': '25 ',
  'contact': '/results?phoneno=6666666666&rid=0x02',
  'location': 'New York, NY',
  'name': 'Amy Smith'}]
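
Note that the values above keep the whitespace from the markup ('54 '), and the lookup only works if the label text matches exactly, trailing space included. As a rough sketch of one way around both issues, the loop below pairs every content-label span with the content-value span that follows it and strips whitespace with get_text(strip=True). The function name parse_cards is made up for illustration, the sample markup is trimmed to a single card, and the built-in 'html.parser' is swapped in for 'lxml' only so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question; only one card is shown here.
html = """
<div class="card-summary" data-detail="/results?phoneno=999999999&amp;rid=0x0">
    <div class="row">
        <div class="col-md-8">
            <div class="h4">Kevin Johnson</div>
            <div>
                 <span class="content-label">Age </span>
                 <span class="content-value">54 </span>
            </div>
            <div>
                 <span class="content-label">Lives in </span>
                 <span class="content-value">Las Vegas, NV</span>
            </div>
        </div>
    </div>
</div>
"""


def parse_cards(markup):
    """Return one dict per card, keyed by the stripped label text (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    people = []
    for card in soup.find_all("div", class_="card-summary"):
        person = {
            "name": card.find("div", class_="h4").get_text(strip=True),
            "contact": card["data-detail"],
        }
        # Pair each label with the value span that is its next sibling.
        for label in card.find_all("span", class_="content-label"):
            value = label.find_next_sibling("span", class_="content-value")
            if value is not None:
                person[label.get_text(strip=True)] = value.get_text(strip=True)
        people.append(person)
    return people


people = parse_cards(html)
print(people)
```

Because the keys come from the labels themselves ("Age", "Lives in"), a card with an extra or missing field should still parse instead of raising an AttributeError on a failed find.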

If you want the information in lists instead, you can use the following code:

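from bs4 import BeautifulSoup

# `html` holds the page source containing the card-summary divs from the question
soup = BeautifulSoup(html, 'lxml')
information = []
for person in soup.find_all('div', class_='card-summary'):
    contact = person['data-detail']
    name = person.find('div', class_='h4').text
    # `string=` is the current name of the older `text=` argument (bs4 >= 4.4)
    age = person.find('span', string='Age ').find_next('span').text
    location = person.find('span', string='Lives in ').find_next('span').text
    # One inner list per person, in the order the question asked for
    information.append([name, age, location, contact])

print(information)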

Output:

[['Kevin Johnson', '54 ', 'Las Vegas, NV', '/results?phoneno=999999999&rid=0x0'], ['Amy Smith', '25 ', 'New York, NY', '/results?phoneno=6666666666&rid=0x02']]
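
The question also asked for a header row (data = [["Name","Age","Location","URL"]]) and mentioned that rid is what tells the entries apart. The following is a minimal sketch of both, not part of the original answer: build_rows and rid_of are hypothetical helper names, the markup is a shortened version of the question's HTML without the row/col wrappers, 'html.parser' stands in for 'lxml', and it assumes every card has exactly two content-value spans in the order age, location.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Shortened version of the question's markup (wrapper divs omitted).
html = """
<div class="card-summary" data-detail="/results?phoneno=999999999&amp;rid=0x0">
  <div class="h4">Kevin Johnson</div>
  <div><span class="content-label">Age </span><span class="content-value">54 </span></div>
  <div><span class="content-label">Lives in </span><span class="content-value">Las Vegas, NV</span></div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&amp;rid=0x02">
  <div class="h4">Amy Smith</div>
  <div><span class="content-label">Age </span><span class="content-value">25 </span></div>
  <div><span class="content-label">Lives in </span><span class="content-value">New York, NY</span></div>
</div>
"""


def build_rows(markup):
    """Return a header row followed by one [name, age, location, url] row per card."""
    soup = BeautifulSoup(markup, "html.parser")
    data = [["Name", "Age", "Location", "URL"]]
    for card in soup.find_all("div", class_="card-summary"):
        name = card.find("div", class_="h4").get_text(strip=True)
        # Assumption: exactly two value spans per card, age first, then location.
        age, location = (
            span.get_text(strip=True)
            for span in card.find_all("span", class_="content-value")
        )
        data.append([name, age, location, card["data-detail"]])
    return data


def rid_of(url):
    """Extract the rid query parameter from a data-detail URL."""
    return parse_qs(urlparse(url).query)["rid"][0]


data = build_rows(html)
for row in data:
    print(row)
print([rid_of(row[3]) for row in data[1:]])
```

Unpacking into age, location raises a ValueError if a card has a different number of value spans, which makes a layout change visible instead of silently shifting columns.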