Python BeautifulSoup findAll not returning all elements?

Time: 2018-09-21 15:17:32

Tags: python html beautifulsoup

I am trying to extract data from this URL: https://99airdrops.com/page/1/

Here is the code I wrote:

import requests
from bs4 import BeautifulSoup

url_str = 'https://99airdrops.com/page/1/'

page = requests.get(url_str, headers={'User-Agent': 'Mozilla Firefox'})

# soup = BeautifulSoup(page.text, 'lxml')
soup = BeautifulSoup(page.text, 'html.parser')

# print(soup.prettify())

print(len(soup.findAll('div')))

print(soup.find('div', class_='title'))

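My problem is that the line print(len(soup.findAll('div'))) returns only 23, and the line print(soup.find('div', class_='title')) prints None. The find call does not locate any div with class_='title', even though the page has several of them. Those divs are nested deep inside the page, but deep nesting has never caused me problems before.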

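I tried both lxml and html.parser, but neither returned all the div elements. I also tried writing the HTML to a file, reading the file back, and running BeautifulSoup on that, but got the same result. Can someone tell me what the problem is here?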

I also tried the suggestion in Beautiful Soup - `findAll` not capturing all tags in SVG (`ElementTree` does) to update my lxml package, but I still have the same problem.

I also tried the solutions in BeautifulSoup doesn't find correctly parsed elements, with no luck.
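Since switching between lxml and html.parser, and re-parsing the HTML from a file, all gave the same result, the parser itself is probably not the problem. A quick check is whether the class name appears in the raw HTML at all. This is a minimal sketch using a hypothetical helper, `diagnose`, that reports a crude substring test next to the number of matching elements each installed parser finds:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def diagnose(html, tag="div", css_class="title"):
    """Hypothetical helper: compare the raw markup with what each parser finds."""
    # Crude substring test on the unparsed text. It can give false positives
    # ("title" also matches a <title> element), but if it is False, no parser
    # can find an element with that class.
    report = {"class_in_raw_html": css_class in html}
    for parser in ("html.parser", "lxml"):
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # lxml is an optional dependency; skip it if it is not installed.
            continue
        # class_ also matches elements with several classes, e.g. "post title".
        report[parser] = len(soup.find_all(tag, class_=css_class))
    return report
```

Calling `diagnose(page.text)` with the `page` object from the question would show whether the class is present in what requests actually downloaded. If the substring test is already False, upgrading lxml or changing parsers cannot help, because the elements are simply not in that HTML.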

1 Answer:

Answer 0 (score: 2)

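It looks like you can get all the data you need with a single request to a JSON file. The listings are most likely inserted into the page by JavaScript after it loads, which would explain why the HTML that requests downloads contains only 23 divs and no div with class title.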

>>> import requests
>>> r = requests.get('https://cdn.99airdrops.com/static/airdrops.json')
>>> data = r.json()
>>> len(data)
133
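Because `data` is an ordinary dict (keyed by a short identifier, with one record per airdrop), it can be processed with plain dictionary operations and no HTML parsing. A minimal sketch, assuming every record has the `name`, `currency`, `rating` and `expirationDate` fields seen in the sample record; `summarize` is a hypothetical helper, not part of any library:

```python
def summarize(data):
    """Hypothetical helper: (name, currency, rating, expirationDate) rows, best rated first.

    Assumes each value in `data` has the fields shown in the sample record.
    "rating" is stored as a string such as "7.30", so it is converted to
    float before sorting.
    """
    rows = [
        (item["name"], item["currency"], float(item["rating"]), item["expirationDate"])
        for item in data.values()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

For instance, `summarize(data)[:5]` would give the five highest-rated airdrops, provided no record is missing one of those fields.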

For example, here is one entry (note that `popitem()` also removes the entry it returns from `data`):

>>> import json; print(json.dumps(data.popitem(), indent=2))
[
  "pointium",
  {
    "unique": "pointium",
    "name": "Pointium",
    "currency": "PNT",
    "description": "Global Decentralized Platform for Point Management & Loyalty Program",
    "instructions": "<ol><li>Join Telegram <a href=\"https://t.me/pointium\" target=\"_blank\">@Pointium</a> and click \"Join Airdrop\" (+500 PNT) </li><li>Enter your e-mail (+200 PNT) </li><li><a href=\"https://twitter.com/POINTIUM_ICO\" target=\"_blank\">Follow Twitter</a> and submit your username (+500 PNT) </li><li>Confirm your details</li></ol>",
    "rating": "7.30",
    "addDate": "2018-04-20 06:23:03",
    "expirationDate": "2018-05-07",
    "startDate": "2018-04-07",
    "image": "https://cdn.99airdrops.com/static/pointium.jpeg",
    "joinLink": "https://www.pointium.org/airdrop",
    "sponsored": "0",
    "status": "0",
    "startDateFormatted": "7th of April",
    "expirationDateFormatted": "7th of May",
    "attributes": {
      "bitcointalk": "0",
      "category": "airdrop",
      "email": "1",
      "facebook": "0",
      "kyc": "0",
      "news": "https://twitter.com/POINTIUM_ICO",
      "opinion": "O parere personala este ca merge acest sistem foarte bine. Doar ca mai avem de lucrat la el sa fie bomba!",
      "other": "0",
      "phone": "0",
      "ratingConcept": "7",
      "ratingTeam": "5.5",
      "ratingWebsite": "7",
      "ratingWhitepaper": "8",
      "reddit": "0",
      "telegram": "1",
      "tokenGiven": "1200",
      "tokenPrice": "0.007",
      "tokenSupply": "1,600,000,000",
      "tokenType": "ERC20",
      "twitter": "1",
      "website": "www.pointium.org"
    }
  }
]
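The `instructions` field is itself a small HTML fragment, so this is where BeautifulSoup is still useful. A minimal sketch, assuming the field holds an `<ol>` of `<li>` steps as in the record above; `instruction_steps` and `instruction_links` are hypothetical helper names:

```python
from bs4 import BeautifulSoup


def instruction_steps(record):
    """Hypothetical helper: plain-text steps from a record's "instructions" HTML."""
    soup = BeautifulSoup(record.get("instructions", ""), "html.parser")
    # get_text(" ", strip=True) strips each text piece and joins them with a
    # space, so link text inside a step is kept in place.
    return [li.get_text(" ", strip=True) for li in soup.find_all("li")]


def instruction_links(record):
    """Hypothetical helper: every href found in the "instructions" HTML."""
    soup = BeautifulSoup(record.get("instructions", ""), "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Applied to the Pointium record above, `instruction_steps` should yield four steps, the first being `Join Telegram @Pointium and click "Join Airdrop" (+500 PNT)`.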