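I am looking at the coronavirus dataset from CSSEGISandData (github.com/CSSEGISandData/COVID-19.git). I am trying to create a Plotly choropleth map that shows the number of cases in each US county.
Here is a sample of the CSV data from CSSEGISandData. I have concatenated multiple days into one file: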
Province/State Country/Region Last Update Confirmed \
259 Chicago US 2020-01-24 17:00:00 1.0
3028 Orange, CA US 2020-02-01 19:53:00 1.0
2445 San Benito, CA US 2020-02-03 03:53:02 2.0
3181 San Antonio, TX US 2020-02-13 18:53:02 1.0
4762 Humboldt County, CA US 2020-02-21 05:13:09 1.0
Deaths Recovered Latitude Longitude file \
259 0.0 0.0 NaN NaN 01-24-2020.csv
3028 0.0 0.0 NaN NaN 02-01-2020.csv
2445 0.0 0.0 NaN NaN 02-24-2020.csv
3181 0.0 0.0 29.4241 -98.4936 03-04-2020.csv
4762 0.0 0.0 NaN NaN 02-27-2020.csv
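I want to adapt this example (https://plot.ly/python/mapbox-county-choropleth/) to use the counties from my DataFrame. To do that, I first need a FIPS code for each county.
I found a list of FIPS codes here (https://github.com/kjhealy/fips-codes):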
fips name state
0 0 UNITED STATES NaN
1 1000 ALABAMA NaN
2 1001 Autauga County AL
3 1003 Baldwin County AL
4 1005 Barbour County AL
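One detail about this table: county FIPS codes are five-digit strings, and reading them as integers drops the leading zero (Autauga County shows up as 1001 rather than 01001). Below is a minimal sketch of restoring the padding. It assumes the map's GeoJSON identifies counties by zero-padded strings, as the linked Plotly example appears to do. The inline DataFrame is only a stand-in for the rows shown above.

```python
import pandas as pd

# Stand-in for the first rows of the fips table shown above.
county = pd.DataFrame({
    "fips": [0, 1000, 1001, 1003, 1005],
    "name": ["UNITED STATES", "ALABAMA", "Autauga County",
             "Baldwin County", "Barbour County"],
    "state": [None, None, "AL", "AL", "AL"],
})

# Keep only county rows: in the rows shown, the national and
# state-level entries have no state abbreviation.
county = county[county["state"].notna()].copy()

# Convert the integer code to a zero-padded five-character string.
county["fips"] = county["fips"].astype(str).str.zfill(5)
print(county)
```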
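How can I create a new column in the pandas DataFrame that holds the correct FIPS code?
Here is my code for importing the COVID data and the FIPS codes: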
!git clone https://github.com/CSSEGISandData/COVID-19.git
#@title Import and Option to show print more data
import pandas as pd
import glob
#Get the coronavirus data for the US
path = r'/content/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports' # use your path
all_files = glob.glob(path + "/*.csv") #collect all files in one
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True, sort=False)  # one dataframe
filter_USA = frame['Country/Region'] == 'US'
USA = frame[filter_USA].copy()  # copy so a new column can be added later without warnings
print(USA.head())
#Get the county data for the US with fips
county_url= 'https://raw.githubusercontent.com/kjhealy/fips-codes/master/state_and_county_fips_master.csv'
county = pd.read_csv(county_url)
print ( county.head())
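The sample output above has a `file` column that the import loop does not create. Below is a minimal sketch of one way to add it while concatenating, assuming the column should hold the base name of each daily report. `load_reports` is a hypothetical helper name, not part of the original code.

```python
import glob
import os

import pandas as pd


def load_reports(folder):
    """Read every CSV in folder and tag each row with its source file name."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(filename, index_col=None, header=0)
        df["file"] = os.path.basename(filename)  # e.g. "01-24-2020.csv"
        frames.append(df)
    # One DataFrame; columns missing from some files are filled with NaN.
    return pd.concat(frames, axis=0, ignore_index=True, sort=False)
```

It could be called with the same `path` as above, for example `frame = load_reports(path)`.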
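Now I need to match the US Province/State names against the county names and assign the corresponding FIPS value.
# something like (pseudocode):
# for each value in USA['Province/State'], find the matching county.name
#     USA['fips'] = fips value of the matching county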
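One way to express this pseudocode is as a merge on a shared key. This is a minimal sketch, assuming the Province/State values have the form "<county name>, <state abbreviation>". The `key` column is a hypothetical helper column, and the two small DataFrames are illustrative stand-ins for the real data. Rows with no match end up with NaN in `fips`.

```python
import pandas as pd

# Illustrative stand-ins for the real USA and county DataFrames.
USA = pd.DataFrame({
    "Province/State": ["Humboldt County, CA", "San Diego County, CA", "Chicago"],
    "Confirmed": [1.0, 2.0, 1.0],
})
county = pd.DataFrame({
    "fips": [6023, 6073, 17031],
    "name": ["Humboldt County", "San Diego County", "Cook County"],
    "state": ["CA", "CA", "IL"],
})

# Build a lookup key in the same "Name, ST" form that Province/State uses.
county["key"] = county["name"] + ", " + county["state"]

# A left merge keeps every COVID row; unmatched names get NaN in fips.
USA = USA.merge(
    county[["key", "fips"]],
    how="left",
    left_on="Province/State",
    right_on="key",
).drop(columns="key")
print(USA)
```

An exact-match merge like this only works for names that already agree with the fips table, so entries such as "Chicago" stay unmatched until the names are cleaned up.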
Here is a longer excerpt from the DataFrame, showing the different problems with the names:
3027 Los Angeles, CA
2369 Santa Clara, CA
2003 San Benito, CA
2310 Madison, WI
2470 Seattle, WA
6175 Chicago, IL
2237 San Diego County, CA
2805 San Diego County, CA
1765 San Antonio, TX
3657 Humboldt County, CA
737 Santa Clara, CA
3629 San Diego County, CA
2468 Sacramento County, CA
1543 Ashland, NE
1549 Travis, CA
1560 Lackland, TX
420 Lackland, TX (From Diamond Princess)
410 Travis, CA (From Diamond Princess)
404 Omaha, NE (From Diamond Princess)
6436 Omaha, NE (From Diamond Princess)
4289 Lackland, TX (From Diamond Princess)
5047 Travis, CA (From Diamond Princess)
2421 Unassigned Location (From Diamond Princess)
1769 Tempe, AZ
303 Unassigned Location (From Diamond Princess)
4981 Unassigned Location (From Diamond Princess)
5015 Sacramento County, CA
4208 Unassigned Location (From Diamond Princess)
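The variations in this list could be handled by normalizing each name before matching. This is a rough sketch under several assumptions. It strips a trailing parenthetical, appends "County" when it is missing, and returns None for anything without a state abbreviation. `normalize_place` and `CITY_TO_COUNTY` are hypothetical names, and the override table is deliberately incomplete.

```python
import re

# Hypothetical manual overrides for entries that name a city rather than
# a county. Extend as needed; only two well-known cases are listed here.
CITY_TO_COUNTY = {
    "Chicago": "Cook County, IL",
    "Chicago, IL": "Cook County, IL",
    "Seattle, WA": "King County, WA",
}


def normalize_place(raw):
    """Return a 'Name County, ST' key, or None when no county can be derived."""
    if not isinstance(raw, str):
        return None  # missing values (NaN) cannot be matched
    # Drop a trailing parenthetical such as "(From Diamond Princess)".
    place = re.sub(r"\s*\(.*\)\s*$", "", raw).strip()
    place = CITY_TO_COUNTY.get(place, place)
    if "," not in place:
        return None  # e.g. "Unassigned Location"
    name, state = (part.strip() for part in place.rsplit(",", 1))
    if not name.endswith("County"):
        name += " County"
    return f"{name}, {state}"


for raw in ["Orange, CA", "San Diego County, CA",
            "Unassigned Location (From Diamond Princess)"]:
    print(repr(raw), "->", repr(normalize_place(raw)))
```

The result could be stored with `USA['Province/State'].map(normalize_place)` and then matched against a key of the same form built from the fips table. Places that are not counties and have no override (for example "Lackland, TX") produce a key that matches nothing, so they would still need a manual entry or be left without a FIPS code.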