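I am trying to extract data for a project from https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population. I want to put the data for the top 20 cities into a pandas DataFrame with the following columns: Rank | City | Latitude | Longitude
That way, I can pull out the coordinates later in the code and compute the various parameters I need. This is what I have come up with so far, but it seems to fail: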
import requests
from bs4 import BeautifulSoup

rank = []
city = []
state = []
population_present = []
population_past = []
changepercent = []
info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
# bs.find('table') returns only the first table on the page
for row in bs.find('table').find_all('tr'):
    p = row.find_all('td')
    if len(p) > 0:
        rank.append(p[0].text)
        city.append(p[1].text)
        latitude.append(p[2].text.rstrip('\n'))  # latitude is never defined above
Answer 0 (score: 1)
You can do this with pandas. Try the following code.
import pandas as pd
import requests
from bs4 import BeautifulSoup
info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
table=bs.find_all('table',class_='wikitable')[1]
df=pd.read_html(str(table))[0]
#Get the first 20 records
df1=df.iloc[:20]
Rank=df1['2018rank'].values.tolist()
City=df1['City'].values.tolist()
#Get the location in list
locationlist=df1['Location'].values.tolist()
Latitude=[]
Longitude=[]
for val in locationlist:
    # The decimal-degree pair is the last "/"-separated part of the cell
    val1 = val.split("/")[-1]
    Latitude.append(val1.split()[0])
    Longitude.append(val1.split()[-1])
df2=pd.DataFrame({"Rank":Rank,"City":City,"Latitude":Latitude,"Longitude":Longitude})
print(df2)
Output:
    City              Latitude   Longitude   Rank
0   New York[d]       40.6635°N  73.9387°W   1
1   Los Angeles       34.0194°N  118.4108°W  2
2   Chicago           41.8376°N  87.6818°W   3
3   Houston[3]        29.7866°N  95.3909°W   4
4   Phoenix           33.5722°N  112.0901°W  5
5   Philadelphia[e]   40.0094°N  75.1333°W   6
6   San Antonio       29.4724°N  98.5251°W   7
7   San Diego         32.8153°N  117.1350°W  8
8   Dallas            32.7933°N  96.7665°W   9
9   San Jose          37.2967°N  121.8189°W  10
10  Austin            30.3039°N  97.7544°W   11
11  Jacksonville[f]   30.3369°N  81.6616°W   12
12  Fort Worth        32.7815°N  97.3467°W   13
13  Columbus          39.9852°N  82.9848°W   14
14  San Francisco[g]  37.7272°N  123.0322°W  15
15  Charlotte         35.2078°N  80.8310°W   16
16  Indianapolis[h]   39.7767°N  86.1459°W   17
17  Seattle           47.6205°N  122.3509°W  18
18  Denver[i]         39.7619°N  104.8811°W  19
19  Washington[j]     38.9041°N  77.0172°W   20
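The values in the output above are still strings ("40.6635°N"), and some city names carry footnote markers ("[d]"), so they probably need cleaning before any calculation. Below is a minimal sketch of one way to do that. The helper names to_signed_degrees and strip_footnote are hypothetical, and the sketch assumes every coordinate looks like the ones printed above (decimal degrees followed by N, S, E or W).

```python
import re

# Matches strings such as "40.6635°N" or "118.4108°W" (format assumed from the output above).
_COORD = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*°\s*([NSEW])\s*$")


def to_signed_degrees(text):
    """Convert '73.9387°W' to -73.9387; south and west become negative."""
    match = _COORD.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    value = float(match.group(1))
    return -value if match.group(2) in "SW" else value


def strip_footnote(name):
    """Remove bracketed footnote markers such as '[d]' or '[3]' from a city name."""
    return re.sub(r"\[[^\]]*\]", "", name).strip()


# Two rows copied from the output above, used only as sample input.
rows = [
    ("New York[d]", "40.6635°N", "73.9387°W"),
    ("Houston[3]", "29.7866°N", "95.3909°W"),
]
cleaned = [
    (strip_footnote(name), to_signed_degrees(lat), to_signed_degrees(lon))
    for name, lat, lon in rows
]
print(cleaned)
```

With a DataFrame like df2, the same helpers could be applied column by column, for example df2["Latitude"].map(to_signed_degrees).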
Answer 1 (score: 0)
You are accessing the wrong element on the page. To reach the table that holds the data you want, use the following:
import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
for tr in bs.findAll('table')[4].findAll('tr'):
    # Now take the data from this row that you want, and put it in a DataFrame
    cells = tr.findAll('td')
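One possible way to fill in that last step is sketched below. To keep it runnable without a network call, it parses a small hand-written HTML table (SAMPLE_HTML, a hypothetical stand-in for the Wikipedia page, whose real table has more columns in a different layout). The function name rows_to_records and the column positions are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the Wikipedia table.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>City</th><th>Location</th></tr>
  <tr><td>1</td><td>New York</td><td>40.6635°N 73.9387°W</td></tr>
  <tr><td>2</td><td>Los Angeles</td><td>34.0194°N 118.4108°W</td></tr>
</table>
"""


def rows_to_records(html, table_index=0):
    """Collect one dict per data row of the chosen table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    records = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 3:
            continue  # header rows hold <th> cells, so they yield no <td>
        latitude, longitude = cells[2].split()
        records.append({
            "Rank": int(cells[0]),
            "City": cells[1],
            "Latitude": latitude,
            "Longitude": longitude,
        })
    return records


records = rows_to_records(SAMPLE_HTML)
print(records)
```

Passing the resulting list of dicts to pandas.DataFrame(records) gives a DataFrame with one column per key. For the live page, the table index and the cell positions would need to be checked against the current markup.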