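Hello, I am trying to scrape CNN's primary results for this election season from the web so that I can do some machine learning with them. I am using Python 3.5, and after some research I saw that I could use lxml or BeautifulSoup together with requests to do this. After failing with BeautifulSoup (I tried using XPath, but it did not pick anything up), I tried lxml. On the Iowa page (and for every state so far), CNN breaks the results down by county and by each candidate's percentage of the vote. Looking at the page's HTML, I saw that each county name is stored in an h2 tag nested inside a div tag (both carry class attributes), and the same pattern repeats for every county. So I used a CSS selector (lxml's cssselect) to try to capture it, since the h2 always sits inside a div for each county. The relevant HTML looks like this: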
<div class="race-results__county-header race-results__county-name section-header__column" data-reactid=".0.4.3.0.0.0.0.$0.0.$0">
<h2 class="section-heading" data-reactid=".0.4.3.0.0.0.0.$0.0.$0.0">Adair</h2>
</div>
The code is as follows:
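from lxml import html
import requests

# Download the page's HTML as text.
page = requests.get('http://www.cnn.com/election/primaries/counties/ia/Rep').text
doc = html.fromstring(page)
# Select every h2 that is a descendant of a div (needs the cssselect package).
link = doc.cssselect("div h2")
print(link)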
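However, when I print link, I get absolutely nothing (just an empty list, []). Is this a problem with how the HTML is laid out, with the code, or with the parser? I am using JetBrains PyCharm, but I don't think that has anything to do with it. I am very new to this, so suggestions for any other approach would be much appreciated.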
Answer 0 (score: 0)
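The problem is that the page does not contain the results you expect, because they are probably rendered via JavaScript.
When I downloaded the content from the given URL, there was no <h2> element at all, but I did find a message asking the visitor to enable JavaScript to view the CNN 2016 Election Center.
You are not getting the data because it is not in the page that the server sends.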
Do not be misled by the fact that your browser shows you <h2> elements: JavaScript put them there after the page loaded.
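A quick way to confirm this is to count the <h2> tags in the text that requests actually receives. The sketch below uses only the standard library; TagCounter and count_tags are hypothetical helper names, and the two sample strings are made-up stand-ins for "what the browser shows" and "what the server sends", not copies of CNN's markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this hook.
        if tag == self.tag:
            self.count += 1


def count_tags(markup, tag):
    """Return the number of <tag> start tags found in markup."""
    parser = TagCounter(tag)
    parser.feed(markup)
    parser.close()
    return parser.count


# Illustrative stand-ins (assumptions, not real CNN responses).
rendered = '<div class="race-results__county-name"><h2 class="section-heading">Adair</h2></div>'
raw = '<html><body><noscript>Please enable JavaScript</noscript><script src="app.js"></script></body></html>'

print(count_tags(rendered, "h2"))  # the DOM after JavaScript has run
print(count_tags(raw, "h2"))       # the HTML as downloaded
```

Passing the page variable from the question to count_tags(page, "h2") would show whether the downloaded HTML contains any county headings at all.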
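Hint: check which JSON files the page loads. Most likely, some of them will provide ready-to-use data for your task. Using F12 in my web browser (and refreshing the page afterwards), I saw many JSON files, some of which provide data about the candidates.
For example, the URL http://data.cnn.com/ELECTION/2016primary/candidates/can1187.json returns the following (abbreviated):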
{
"candidateInfo": {
"id": 1187,
"fname": "Mike",
"lname": "Huckabee",
"party": "Rep",
"rd": "1",
"pd": "0",
"td": "1",
"d_nom": 1237,
"inrace": true,
"nominee": false,
"rd_k": "1460",
"td_k": 2472,
"dpct": 0,
"dpct_nom": 50,
"states": [
{
"state": "Alabama",
"code": "AL",
"electiondate": "20160301",
"primarytype": "primary",
"candidates": []
},
{
"state": "Alaska",
"code": "AK",
"electiondate": "20160301",
"primarytype": "caucus",
"candidates": []
},
{
"state": "Arizona",
"code": "AZ",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Arkansas",
"code": "AR",
"electiondate": "20160301",
"primarytype": "primary",
"candidates": []
},
{
"state": "Iowa",
"code": "IA",
"electiondate": "20160201",
"primarytype": "caucus",
"candidates": [
{
"id": 1187,
"rd": "1",
"pd": "0",
"td": "1",
"winner": false
}
]
},
{
"state": "Kansas",
"code": "KS",
"electiondate": "20160305",
"primarytype": "caucus",
"candidates": []
},
{
"state": "Kentucky",
"code": "KY",
"electiondate": "20160305",
"primarytype": "caucus",
"candidates": []
},
{
"state": "Louisiana",
"code": "LA",
"electiondate": "20160305",
"primarytype": "primary",
"candidates": []
},
{
"state": "Maine",
"code": "ME",
"electiondate": "20160305",
"primarytype": "caucus",
"candidates": []
},
{
"state": "Maryland",
"code": "MD",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Massachusetts",
"code": "MA",
"electiondate": "20160301",
"primarytype": "primary",
"candidates": []
},
{
"state": "Michigan",
"code": "MI",
"electiondate": "20160308",
"primarytype": "primary",
"candidates": []
},
{
"state": "Minnesota",
"code": "MN",
"electiondate": "20160301",
"primarytype": "caucus",
"candidates": []
},
{
"state": "Mississippi",
"code": "MS",
"electiondate": "20160308",
"primarytype": "primary",
"candidates": []
},
{
"state": "Missouri",
"code": "MO",
"electiondate": "20160315",
"primarytype": "primary",
"candidates": []
},
{
"state": "Montana",
"code": "MT",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Nebraska",
"code": "NE",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Nevada",
"code": "NV",
"electiondate": "20160223",
"primarytype": "caucus",
"candidates": []
},
{
"state": "New Hampshire",
"code": "NH",
"electiondate": "20160209",
"primarytype": "primary",
"candidates": []
},
{
"state": "New Jersey",
"code": "NJ",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "New Mexico",
"code": "NM",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "New York",
"code": "NY",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "North Carolina",
"code": "NC",
"electiondate": "20160315",
"primarytype": "primary",
"candidates": []
},
{
"state": "North Dakota",
"code": "ND",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Ohio",
"code": "OH",
"electiondate": "20160315",
"primarytype": "primary",
"candidates": []
},
{
"state": "Oklahoma",
"code": "OK",
"electiondate": "20160301",
"primarytype": "primary",
"candidates": []
},
{
"state": "Oregon",
"code": "OR",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Virgin Islands",
"code": "VI",
"electiondate": "",
"primarytype": "",
"candidates": []
},
{
"state": "Northern Marianas",
"code": "MP",
"electiondate": "",
"primarytype": "",
"candidates": []
}
],
"races": [
{
"status": "called",
"code": "AR",
"state": "Arkansas",
"polltype": "exit",
"primarytype": "primary",
"cresults": true,
"cmap": true,
"xpoll": true,
"electiondate": "20160301",
"pctsrep": 100,
"ts": 1457130949809,
"racerank": 6,
"winner": false,
"vpct": 1,
"pctDecimal": "1.2",
"inc": false,
"votes": 4703,
"cvotes": "4,703",
"rd": "0",
"pd": "0",
"sd": "0",
"td": "0",
"position": 13
},
{
"status": "called",
"code": "GA",
"state": "Georgia",
"polltype": "exit",
"primarytype": "primary",
"cresults": true,
"cmap": true,
"xpoll": true,
"electiondate": "20160301",
"pctsrep": 92,
"ts": 1457130978961,
"racerank": 8,
"winner": false,
"vpct": 0,
"pctDecimal": "0.2",
"inc": false,
"votes": 2615,
"cvotes": "2,615",
"rd": "0",
"pd": "0",
"sd": "0",
"td": "0",
"position": 13
},
{
"status": "called",
"code": "TN",
"state": "Tennessee",
"polltype": "exit",
"primarytype": "primary",
"cresults": true,
"cmap": true,
"xpoll": true,
"electiondate": "20160301",
"pctsrep": 100,
"ts": 1457131086792,
"racerank": 7,
"winner": false,
"vpct": 0,
"pctDecimal": "0.3",
"inc": false,
"votes": 2404,
"cvotes": "2,404",
"rd": "0",
"pd": "0",
"sd": "0",
"td": "0",
"position": 15
},
{
"status": "called",
"code": "IA",
"state": "Iowa",
"polltype": "entrance",
"primarytype": "caucus",
"cresults": true,
"cmap": true,
"xpoll": true,
"electiondate": "20160201",
"pctsrep": 99,
"ts": 1454997428611,
"racerank": 9,
"winner": false,
"vpct": 2,
"pctDecimal": "1.8",
"inc": false,
"votes": 3345,
"cvotes": "3,345",
"rd": "1",
"pd": "0",
"sd": "1",
"td": "1",
"position": 14
},
{
"status": "called",
"code": "AL",
"state": "Alabama",
"polltype": "exit",
"primarytype": "primary",
"cresults": true,
"cmap": true,
"xpoll": true,
"electiondate": "20160301",
"pctsrep": 100,
"ts": 1456958822650,
"racerank": 8,
"winner": false,
"vpct": 0,
"pctDecimal": "0.3",
"inc": false,
"votes": 2535,
"cvotes": "2,535",
"rd": "0",
"pd": "0",
"sd": "0",
"td": "0",
"position": 13
}
],
"lts": 1458233488340
}
}
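Once you have such a JSON document, no HTML parsing is needed. Here is a minimal sketch of pulling the per-state results out of a payload shaped like the one above; summarize_races is a hypothetical helper, and the sample is a hand-trimmed copy of two entries from the excerpt, so the real file may contain fields or variations not handled here.

```python
import json


def summarize_races(payload):
    """Return sorted (state code, votes, percent) rows from a candidate payload."""
    info = payload["candidateInfo"]
    rows = []
    for race in info.get("races", []):
        # "pctDecimal" is a string in the excerpt, so convert it to a float.
        rows.append((race["code"], race["votes"], float(race["pctDecimal"])))
    return sorted(rows)


# Trimmed sample based on the excerpt above (most fields omitted).
sample = json.loads("""
{
  "candidateInfo": {
    "id": 1187,
    "lname": "Huckabee",
    "races": [
      {"code": "IA", "state": "Iowa", "votes": 3345, "pctDecimal": "1.8"},
      {"code": "AR", "state": "Arkansas", "votes": 4703, "pctDecimal": "1.2"}
    ]
  }
}
""")

for code, votes, pct in summarize_races(sample):
    print(code, votes, pct)
```

With requests, the payload could be obtained with requests.get(url).json(), assuming the URL is still being served. Note that this particular file covers one candidate across states; the county-level numbers from the question presumably live in one of the other JSON files the page loads.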