I am trying to scrape data from elections.in. Three tables on the page share the same class. Here is the HTML from the site:
<h3 class="blmap">17th General (Lok Sabha) Election Results 2019 – State Wise</h3>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman & Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
<tr><td>Andhra Pradesh</td><td>Telugu Desam</td><td>3</td></tr>
<tr><td>Arunachal Pradesh</td><td>Bharatiya Janata Party</td><td>2</td></tr>
<tr><td>Assam</td><td>Bharatiya Janata Party</td><td>9</td></tr>
<tr><td>Assam</td><td>Indian National Congress</td><td>3</td></tr>
<tr><td>Assam</td><td>All India United Democratic Front</td><td>1</td></tr>
I was able to get the data, and it looks like this:
StatePartyNumber of Seats
Andaman & Nicobar IslandsIndian National Congress1
Andhra PradeshYuvajana Sramika Rythu Congress Party22
Andhra PradeshTelugu Desam3
Arunachal PradeshBharatiya Janata Party2
AssamBharatiya Janata Party9
AssamIndian National Congress3
AssamAll India United Democratic Front1
AssamIndependent1
BiharBharatiya Janata Party17
I want the output below:
State,Party,Number of Seats
Andaman & Nicobar Islands, Indian National Congress,1
Andhra Pradesh,Yuvajana Sramika Rythu Congress Party,22
or as a list.
The following line of code is what produces the run-together output above:
soup.find_all('table')[1].get_text()
My code is on GitHub.
Please suggest how to achieve this.
Thanks.
Answer 0 (score: 2)
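If you are trying to parse <table> tags, go with pandas .read_html(). It does most of the hard work for you and returns a list of dataframes. The table you want is the 3rd one (index position 2).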
import pandas as pd
url="http://www.elections.in/"
tables = pd.read_html(url)
Output:
print (tables[2].to_string())
State Party Number of Seats
0 Andaman & Nicobar Islands Indian National Congress 1
1 Andhra Pradesh Yuvajana Sramika Rythu Congress Party 22
2 Andhra Pradesh Telugu Desam 3
3 Arunachal Pradesh Bharatiya Janata Party 2
4 Assam Bharatiya Janata Party 9
5 Assam Indian National Congress 3
6 Assam All India United Democratic Front 1
7 Assam Independent 1
8 Bihar Bharatiya Janata Party 17
9 Bihar Janata Dal (United) 16
10 Bihar Lok Jan Shakti Party 6
11 Bihar Indian National Congress 1
12 Chandigarh Bharatiya Janata Party 1
13 Chhattisgarh Bharatiya Janata Party 9
14 Chhattisgarh Indian National Congress 2
15 Dadra & Nagar Haveli Independent 1
16 Daman & Diu Bharatiya Janata Party 1
17 Goa Bharatiya Janata Party 1
18 Goa Indian National Congress 1
19 Gujarat Bharatiya Janata Party 26
20 Haryana Bharatiya Janata Party 10
21 Himachal Pradesh Bharatiya Janata Party 4
22 Jammu & Kashmir Bharatiya Janata Party 3
23 Jammu & Kashmir Jammu & Kashmir National Conference 3
24 Jharkhand Bharatiya Janata Party 11
25 Jharkhand Ajsu Party 1
26 Jharkhand Indian National Congress 1
27 Jharkhand Jharkhand Mukti Morcha 1
28 Karnataka Bharatiya Janata Party 25
29 Karnataka Independent 1
30 Karnataka Indian National Congress 1
31 Karnataka Janata Dal (Secular) 1
32 Kerala Indian National Congress 15
33 Kerala Indian Union Muslim League 2
34 Kerala Communist Party Of India (Marxist) 1
35 Kerala Kerala Congress (M) 1
36 Kerala Revolutionary Socialist Party 1
37 Lakshadweep Nationalist Congress Party 1
38 Madhya Pradesh Bharatiya Janata Party 28
39 Madhya Pradesh Indian National Congress 1
40 Maharashtra Bharatiya Janata Party 23
41 Maharashtra Shivsena 18
42 Maharashtra Nationalist Congress Party 4
43 Maharashtra All India Majlis-E-Ittehadul Muslimeen 1
44 Maharashtra Independent 1
45 Maharashtra Indian National Congress 1
46 Manipur Bharatiya Janata Party 1
47 Manipur Naga Peoples Front 1
48 Meghalaya Indian National Congress 1
49 Meghalaya National People'S Party 1
50 Mizoram Mizo National Front 1
51 Nagaland Nationalist Democratic Progressive Party 1
52 NCT OF Delhi Bharatiya Janata Party 7
53 Odisha Biju Janata Dal 12
54 Odisha Bharatiya Janata Party 8
55 Odisha Indian National Congress 1
56 Puducherry Indian National Congress 1
57 Punjab Indian National Congress 8
58 Punjab Bharatiya Janata Party 2
59 Punjab Shiromani Akali Dal 2
60 Punjab Aam Aadmi Party 1
61 Rajasthan Bharatiya Janata Party 24
62 Rajasthan Rashtriya Loktantrik Party 1
63 Sikkim Sikkim Krantikari Morcha 1
64 Tamil Nadu Dravida Munnetra Kazhagam 23
65 Tamil Nadu Indian National Congress 8
66 Tamil Nadu Communist Party Of India 2
67 Tamil Nadu Communist Party Of India (Marxist) 2
68 Tamil Nadu All India Anna Dravida Munnetra Kazhagam 1
69 Tamil Nadu Indian Union Muslim League 1
70 Tamil Nadu Viduthalai Chiruthaigal Katchi 1
71 Telangana Telangana Rashtra Samithi 9
72 Telangana Bharatiya Janata Party 4
73 Telangana Indian National Congress 3
74 Telangana All India Majlis-E-Ittehadul Muslimeen 1
75 Tripura Bharatiya Janata Party 2
76 Uttar Pradesh Bharatiya Janata Party 62
77 Uttar Pradesh Bahujan Samaj Party 10
78 Uttar Pradesh Samajwadi Party 5
79 Uttar Pradesh Apna Dal (Soneylal) 2
80 Uttar Pradesh Indian National Congress 1
81 Uttarakhand Bharatiya Janata Party 5
82 West Bengal All India Trinamool Congress 22
83 West Bengal Bharatiya Janata Party 18
84 West Bengal Indian National Congress 2
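Since the question asks for comma-separated output or a list, here is a minimal sketch of how a dataframe like the one above could be converted. To stay runnable without a network connection it builds a small DataFrame from three of the rows shown above; in a real run `tables[2]` would take the place of `df`. The variable names `rows`, `df`, `csv_text` and `as_list` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the output above; a real run would use tables[2] instead.
rows = [
    ["Andaman & Nicobar Islands", "Indian National Congress", 1],
    ["Andhra Pradesh", "Yuvajana Sramika Rythu Congress Party", 22],
    ["Andhra Pradesh", "Telugu Desam", 3],
]
df = pd.DataFrame(rows, columns=["State", "Party", "Number of Seats"])

# index=False drops the 0, 1, 2, ... row labels that to_string() prints.
# With no path given, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

# The same data as a plain list of lists.
as_list = df.values.tolist()
print(as_list)
```

Passing a filename, for example `df.to_csv('election_results.csv', index=False)`, should write the same text to disk instead of returning it.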
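To do this with BeautifulSoup, you have to iterate over each row (the <tr> tags), then over each data cell (<td>) in that row, and append the cell text to a list, a dataframe, or whatever structure you want to store it in.
Something like this: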
import requests
from bs4 import BeautifulSoup

url = "http://www.elections.in/"
r = requests.get(url).content
htmlDoc = r.decode("utf-8")
soup = BeautifulSoup(htmlDoc, 'html.parser')

# The results table is the third <table> on the page (index 2)
table = soup.find_all('table')[2]
rows = table.find_all('tr')

headers = table.find_all('th')
headers = [each.text for each in headers]

# Collect the text of every <td> cell, one list per row
list_of_rows = []
for row in rows:
    data = row.find_all('td')
    if data != []:  # the header row has no <td> cells
        data = [each.text for each in data]
        list_of_rows.append(data)
Output:
print (headers)
['State', 'Party', 'Number of Seats']
print (list_of_rows)
[['Andaman & Nicobar Islands', 'Indian National Congress', '1'], ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'], ['Andhra Pradesh', 'Telugu Desam', '3'], ['Arunachal Pradesh', 'Bharatiya Janata Party', '2'], ['Assam', 'Bharatiya Janata Party', '9'], ['Assam', 'Indian National Congress', '3'], ['Assam', 'All India United Democratic Front', '1'], ['Assam', 'Independent', '1'], ['Bihar', 'Bharatiya Janata Party', '17'], ['Bihar', 'Janata Dal (United)', '16'], ['Bihar', 'Lok Jan Shakti Party', '6'], ['Bihar', 'Indian National Congress', '1'], ['Chandigarh', 'Bharatiya Janata Party', '1'], ['Chhattisgarh', 'Bharatiya Janata Party', '9'], ['Chhattisgarh', 'Indian National Congress', '2'], ['Dadra & Nagar Haveli', 'Independent', '1'], ['Daman & Diu', 'Bharatiya Janata Party', '1'], ['Goa', 'Bharatiya Janata Party', '1'], ['Goa', 'Indian National Congress', '1'], ['Gujarat', 'Bharatiya Janata Party', '26'], ['Haryana', 'Bharatiya Janata Party', '10'], ['Himachal Pradesh', 'Bharatiya Janata Party', '4'], ['Jammu & Kashmir', 'Bharatiya Janata Party', '3'], ['Jammu & Kashmir', 'Jammu & Kashmir National Conference', '3'], ['Jharkhand', 'Bharatiya Janata Party', '11'], ['Jharkhand', 'Ajsu Party', '1'], ['Jharkhand', 'Indian National Congress', '1'], ['Jharkhand', 'Jharkhand Mukti Morcha', '1'], ['Karnataka', 'Bharatiya Janata Party', '25'], ['Karnataka', 'Independent', '1'], ['Karnataka', 'Indian National Congress', '1'], ['Karnataka', 'Janata Dal (Secular)', '1'], ['Kerala', 'Indian National Congress', '15'], ['Kerala', 'Indian Union Muslim League', '2'], ['Kerala', 'Communist Party Of India (Marxist)', '1'], ['Kerala', 'Kerala Congress (M)', '1'], ['Kerala', 'Revolutionary Socialist Party', '1'], ['Lakshadweep', 'Nationalist Congress Party', '1'], ['Madhya Pradesh', 'Bharatiya Janata Party', '28'], ['Madhya Pradesh', 'Indian National Congress', '1'], ['Maharashtra', 'Bharatiya Janata Party', '23'], ['Maharashtra', 'Shivsena', '18'], ['Maharashtra', 
'Nationalist Congress Party', '4'], ['Maharashtra', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Maharashtra', 'Independent', '1'], ['Maharashtra', 'Indian National Congress', '1'], ['Manipur', 'Bharatiya Janata Party', '1'], ['Manipur', 'Naga Peoples Front', '1'], ['Meghalaya', 'Indian National Congress', '1'], ['Meghalaya', "National People'S Party", '1'], ['Mizoram', 'Mizo National Front', '1'], ['Nagaland', 'Nationalist Democratic Progressive Party', '1'], ['NCT OF Delhi', 'Bharatiya Janata Party', '7'], ['Odisha', 'Biju Janata Dal', '12'], ['Odisha', 'Bharatiya Janata Party', '8'], ['Odisha', 'Indian National Congress', '1'], ['Puducherry', 'Indian National Congress', '1'], ['Punjab', 'Indian National Congress', '8'], ['Punjab', 'Bharatiya Janata Party', '2'], ['Punjab', 'Shiromani Akali Dal', '2'], ['Punjab', 'Aam Aadmi Party', '1'], ['Rajasthan', 'Bharatiya Janata Party', '24'], ['Rajasthan', 'Rashtriya Loktantrik Party', '1'], ['Sikkim', 'Sikkim Krantikari Morcha', '1'], ['Tamil Nadu', 'Dravida Munnetra Kazhagam', '23'], ['Tamil Nadu', 'Indian National Congress', '8'], ['Tamil Nadu', 'Communist Party Of India', '2'], ['Tamil Nadu', 'Communist Party Of India (Marxist)', '2'], ['Tamil Nadu', 'All India Anna Dravida Munnetra Kazhagam', '1'], ['Tamil Nadu', 'Indian Union Muslim League', '1'], ['Tamil Nadu', 'Viduthalai Chiruthaigal Katchi', '1'], ['Telangana', 'Telangana Rashtra Samithi', '9'], ['Telangana', 'Bharatiya Janata Party', '4'], ['Telangana', 'Indian National Congress', '3'], ['Telangana', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Tripura', 'Bharatiya Janata Party', '2'], ['Uttar Pradesh', 'Bharatiya Janata Party', '62'], ['Uttar Pradesh', 'Bahujan Samaj Party', '10'], ['Uttar Pradesh', 'Samajwadi Party', '5'], ['Uttar Pradesh', 'Apna Dal (Soneylal)', '2'], ['Uttar Pradesh', 'Indian National Congress', '1'], ['Uttarakhand', 'Bharatiya Janata Party', '5'], ['West Bengal', 'All India Trinamool Congress', '22'], ['West Bengal', 'Bharatiya Janata Party', '18'], ['West Bengal', 'Indian National Congress', '2']]
But as I said, pandas does all of this for you with .read_html().
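Because the question mentions that three tables share the same class, picking the table by position (`[2]`) depends on the page layout staying the same. As a hedged alternative, the sketch below locates the table through the <h3> heading that precedes it in the question's HTML. The markup is a trimmed copy of that snippet with closing tags added, and the first "Other table" block is a made-up decoy included only to show that the right table is chosen; whether the live page still uses this heading text is an assumption.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, plus a hypothetical decoy
# table ("Other table") that shares the same class.
html = """
<h3 class="blmap">Other table</h3>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>A</th><th>B</th></tr></thead><tbody>
<tr><td>x</td><td>y</td></tr>
</tbody></table>
<h3 class="blmap">17th General (Lok Sabha) Election Results 2019 – State Wise</h3>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andhra Pradesh</td><td>Telugu Desam</td><td>3</td></tr>
<tr><td>Arunachal Pradesh</td><td>Bharatiya Janata Party</td><td>2</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the heading by a fragment of its text, then take the first table after it.
heading = soup.find("h3", string=lambda s: s and "State Wise" in s)
table = heading.find_next("table")

headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")  # skip the header row, which has only <th> cells
]
print(headers)
print(rows)
```

Calling `get_text()` on each cell, rather than once on the whole table, is what keeps the column values separate.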
Answer 1 (score: 1)
A slightly shorter BeautifulSoup solution:
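from bs4 import BeautifulSoup as soup
# content is assumed to hold the HTML of the table shown in the question
d = soup(content, 'html.parser')
headers = [i.text for i in d.find_all('th')]
# [1:] skips the header row, which has no <td> cells
data = [[i.text for i in b.find_all('td')] for b in d.find_all('tr')[1:]]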
Output:
['State', 'Party', 'Number of Seats']
[['Andaman & Nicobar Islands', 'Indian National Congress', '1'], ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'], ['Andhra Pradesh', 'Telugu Desam', '3'], ['Arunachal Pradesh', 'Bharatiya Janata Party', '2'], ['Assam', 'Bharatiya Janata Party', '9'], ['Assam', 'Indian National Congress', '3'], ['Assam', 'All India United Democratic Front', '1']]
To write to csv:
import csv
# newline='' lets the csv module control line endings (avoids blank lines on Windows)
with open('election_results.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerows([headers, *data])
Output:
State,Party,Number of Seats
Andaman & Nicobar Islands,Indian National Congress,1
Andhra Pradesh,Yuvajana Sramika Rythu Congress Party,22
Andhra Pradesh,Telugu Desam,3
Arunachal Pradesh,Bharatiya Janata Party,2
Assam,Bharatiya Janata Party,9
Assam,Indian National Congress,3
Assam,All India United Democratic Front,1