I am trying to extract data from this URL: https://99airdrops.com/page/1/
The code I wrote is as follows:
```python
import requests
from bs4 import BeautifulSoup

url_str = 'https://99airdrops.com/page/1/'
page = requests.get(url_str, headers={'User-Agent': 'Mozilla Firefox'})

# soup = BeautifulSoup(page.text, 'lxml')   # tried this parser as well
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup.prettify())

print(len(soup.findAll('div')))            # prints only 23
print(soup.find('div', class_='title'))    # prints None
```
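My problem is that `print(len(soup.findAll('div')))` prints only 23, and `print(soup.find('div', class_='title'))` prints `None`. `find` does not locate any div with `class_='title'`, even though the page has several of them. Those divs are deeply nested in the HTML, but nesting has never caused me problems before.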
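To tell whether this is a parser problem or whether the divs are simply absent from the downloaded HTML, one can count the tags in `page.text` with a parser that does not involve BeautifulSoup at all. A minimal sketch using only the standard library's `html.parser` module; `DivCounter` and `inspect_html` are hypothetical helper names, and `shell` is a made-up example of a page whose content is filled in by a script, not the real 99airdrops.com markup.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Count <div> tags and how many of them carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.divs = 0
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self.divs += 1
        # attrs is a list of (name, value) pairs; "class" may hold several names
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.matches += 1


def inspect_html(html, css_class="title"):
    """Return (number of divs, number of divs with css_class) in raw HTML."""
    counter = DivCounter(css_class)
    counter.feed(html)
    counter.close()
    return counter.divs, counter.matches


# Hypothetical page shell: the container is empty until a script running
# in the browser fills it in.
shell = """
<html><body>
  <div id="app"><div class="loading">Loading...</div></div>
  <script src="/static/app.js"></script>
</body></html>
"""
print(inspect_html(shell))  # (2, 0)
```

If `inspect_html(page.text)` also reports only a handful of divs and zero matches, the elements are not in the HTML that `requests` received, so switching parsers cannot help.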
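I tried both `lxml` and `html.parser`, but neither returns all of the div elements. I also tried writing the HTML to a file, reading the file back and running BeautifulSoup on it, but got the same result. Can someone tell me what the problem is?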
I also tried the suggestion in Beautiful Soup - `findAll` not capturing all tags in SVG (`ElementTree` does) and updated my lxml package, but I still get the same problem.
I also tried the solution in BeautifulSoup doesn't find correctly parsed elements, with no luck.
Answer (score: 2)
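The airdrop listings appear to be loaded by JavaScript from a separate JSON file after the page has loaded. `requests` does not execute JavaScript, so the HTML it downloads likely contains only the page skeleton, which would explain why so few divs are found. You can get all the data you need with a single request to that JSON file instead: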
```python
>>> import requests
>>> r = requests.get('https://cdn.99airdrops.com/static/airdrops.json')
>>> data = r.json()
>>> len(data)
133
```
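The decoded JSON is a dict keyed by each airdrop's slug (the `popitem()` call below returns one `(slug, record)` pair), so plain dict iteration replaces the HTML scraping. A minimal sketch, assuming every record has the `name`, `currency` and `rating` fields seen in the sample output; `summarize` is a hypothetical helper, and `data` here is a trimmed copy of that one record so the snippet runs offline. With the live response, pass `r.json()` instead.

```python
# Trimmed copy of one record from airdrops.json (a dict keyed by slug);
# only a few of the real fields are reproduced here.
data = {
    "pointium": {
        "name": "Pointium",
        "currency": "PNT",
        "rating": "7.30",
        "expirationDate": "2018-05-07",
    },
}


def summarize(data):
    """Return (slug, name, currency, rating) tuples, highest rating first."""
    rows = [
        # "rating" is stored as a string, so convert it before sorting
        (slug, item["name"], item["currency"], float(item["rating"]))
        for slug, item in data.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


for slug, name, currency, rating in summarize(data):
    print(f"{slug}: {name} ({currency}) rated {rating}")
```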
For example:
```python
>>> import json; print(json.dumps(data.popitem(), indent=2))
[
"pointium",
{
"unique": "pointium",
"name": "Pointium",
"currency": "PNT",
"description": "Global Decentralized Platform for Point Management & Loyalty Program",
"instructions": "<ol><li>Join Telegram <a href=\"https://t.me/pointium\" target=\"_blank\">@Pointium</a> and click \"Join Airdrop\" (+500 PNT) </li><li>Enter your e-mail (+200 PNT) </li><li><a href=\"https://twitter.com/POINTIUM_ICO\" target=\"_blank\">Follow Twitter</a> and submit your username (+500 PNT) </li><li>Confirm your details</li></ol>",
"rating": "7.30",
"addDate": "2018-04-20 06:23:03",
"expirationDate": "2018-05-07",
"startDate": "2018-04-07",
"image": "https://cdn.99airdrops.com/static/pointium.jpeg",
"joinLink": "https://www.pointium.org/airdrop",
"sponsored": "0",
"status": "0",
"startDateFormatted": "7th of April",
"expirationDateFormatted": "7th of May",
"attributes": {
"bitcointalk": "0",
"category": "airdrop",
"email": "1",
"facebook": "0",
"kyc": "0",
"news": "https://twitter.com/POINTIUM_ICO",
"opinion": "O parere personala este ca merge acest sistem foarte bine. Doar ca mai avem de lucrat la el sa fie bomba!",
"other": "0",
"phone": "0",
"ratingConcept": "7",
"ratingTeam": "5.5",
"ratingWebsite": "7",
"ratingWhitepaper": "8",
"reddit": "0",
"telegram": "1",
"tokenGiven": "1200",
"tokenPrice": "0.007",
"tokenSupply": "1,600,000,000",
"tokenType": "ERC20",
"twitter": "1",
"website": "www.pointium.org"
}
}
]
```
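In the record above every value is a string, including the rating, the dates and the `"0"`/`"1"` flags under `attributes`, and `instructions` is itself an HTML fragment. A minimal sketch of converting one record into Python types and pulling the links out of `instructions` with the standard library's `html.parser`; `LinkCollector`, `normalize` and `is_active` are hypothetical helpers, and both the meaning of the flags (`"1"` = required) and the inclusive start/expiration window are assumptions, not documented behaviour of the site.

```python
from datetime import date, datetime
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def normalize(record):
    """Convert the string-typed fields of one airdrop record into Python types."""
    collector = LinkCollector()
    collector.feed(record["instructions"])
    collector.close()
    attributes = record["attributes"]
    return {
        "name": record["name"],
        "currency": record["currency"],
        "rating": float(record["rating"]),
        "added": datetime.strptime(record["addDate"], "%Y-%m-%d %H:%M:%S"),
        "start": date.fromisoformat(record["startDate"]),
        "expires": date.fromisoformat(record["expirationDate"]),
        # Assumption: "1" means the step is required, "0" means it is not
        "needs_email": attributes["email"] == "1",
        "needs_telegram": attributes["telegram"] == "1",
        "instruction_links": collector.links,
    }


def is_active(record, today):
    """Assumption: active from startDate through expirationDate, inclusive."""
    info = normalize(record)
    return info["start"] <= today <= info["expires"]


# Trimmed copy of the "pointium" record shown above
record = {
    "name": "Pointium",
    "currency": "PNT",
    "instructions": (
        '<ol><li>Join Telegram <a href="https://t.me/pointium" target="_blank">'
        '@Pointium</a> and click "Join Airdrop" (+500 PNT) </li>'
        '<li>Enter your e-mail (+200 PNT) </li>'
        '<li><a href="https://twitter.com/POINTIUM_ICO" target="_blank">'
        'Follow Twitter</a> and submit your username (+500 PNT) </li>'
        '<li>Confirm your details</li></ol>'
    ),
    "rating": "7.30",
    "addDate": "2018-04-20 06:23:03",
    "expirationDate": "2018-05-07",
    "startDate": "2018-04-07",
    "attributes": {"email": "1", "telegram": "1"},
}

print(normalize(record)["instruction_links"])
# ['https://t.me/pointium', 'https://twitter.com/POINTIUM_ICO']
print(is_active(record, date(2018, 5, 1)))  # True
```

`date.fromisoformat` requires Python 3.7 or later; on older versions `datetime.strptime(value, "%Y-%m-%d").date()` does the same job.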