I want to fill missing values in a DataFrame by matching/mapping them based on another column. For example,
City           State  Country
Chicago        IL     United States
Boston         MA     United States
San Diego
Los Angeles    CA     United States
San Francisco
Sacramento
Vancouver      BC     Canada
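So if I want to fill the empty State and Country cells of these three cities with the same values as Los Angeles, what should I do?
Below is my code, but I am completely stuck.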
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']
Any help is much appreciated.
Answer 0 (score: 3)
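The attempt above cannot work as written: df['City'] == CA_cities compares element by element instead of testing membership in the list (that is what isin does), and 'State' = 'CA' and 'Country' = 'United States' is not valid Python inside an indexer. A minimal sketch of the same loc idea, assuming the blank cells are NaN and that it is acceptable to overwrite State and Country for every listed city (Los Angeles already holds the same values):

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank cells are assumed to be NaN.
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# isin builds a boolean row mask; a list of column labels lets a single
# loc assignment set both columns, one value per column for every matched row.
df.loc[df['City'].isin(CA_cities), ['State', 'Country']] = ['CA', 'United States']
print(df)
```

This overwrites existing values in the matched rows; the answers below only fill the cells that are actually missing.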
You can use groupby with a mask created by isin, and then replace the NaNs by forward filling followed by backward filling:
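import pandas as pd

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
# group_keys=False keeps the original flat index (with the default, pandas 2.x prepends the group key)
df = df.groupby(df['City'].isin(CA_cities), group_keys=False).apply(lambda x: x.ffill().bfill())
print(df)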
City State Country
0 Chicago IL United States
1 Boston MA United States
2 San Diego CA United States
3 Los Angeles CA United States
4 San Francisco CA United States
5 Sacramento CA United States
6 Vancouver BC Canada
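For a self-contained run of the same groupby idea, here is a sketch that rebuilds the question's frame (blanks assumed to be NaN) and uses transform on just the State and Country columns instead of apply. This is a variation on the answer above, not the answerer's exact code; transform returns values aligned to the original index, so the row order is preserved without relying on group_keys:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank cells are assumed to be NaN.
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
mask = df['City'].isin(CA_cities)  # True for CA rows, False for all others

# Inside each mask group, forward-fill then back-fill each column, so the
# missing CA rows copy the values held by Los Angeles.
cols = ['State', 'Country']
df[cols] = df.groupby(mask)[cols].transform(lambda s: s.ffill().bfill())
print(df)
```

Note that all non-CA rows share the False group, so a missing State in, say, a Boston row would be filled from a neighbouring non-CA city; the more general dictionary-based solution in this answer avoids that.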
A more general solution is to define groups of cities, e.g. in a dictionary, swap the keys with the values, and map the City column:
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 NaN NaN
2 Boston MA United States
3 San Diego NaN NaN
4 Los Angeles CA United States
5 San Francisco NaN NaN
6 Sacramento NaN NaN
7 Vancouver BC Canada
cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
          'IL': ['Chicago', 'Chicago1']}
# invert the dict: city -> state
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}
# cities not in d keep their own name as the group key
df = df.groupby(df['City'].map(d).fillna(df['City']), group_keys=False).apply(lambda x: x.ffill().bfill())
# slower alternative
# df = df.groupby(df['City'].replace(d), group_keys=False).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 IL United States
2 Boston MA United States
3 San Diego CA United States
4 Los Angeles CA United States
5 San Francisco CA United States
6 Sacramento CA United States
7 Vancouver BC Canada
Detail:
print (df['City'].map(d).fillna(df['City']))
0 IL
1 IL
2 Boston
3 CA
4 CA
5 CA
6 CA
7 Vancouver
Name: City, dtype: object
print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA',
'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
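As an alternative to grouping and filling whole rows, the inverted dictionary can fill State directly, and Country can then be copied from the first known value within each State group. This is a sketch, not the answerer's method, and it assumes at least one row per state already carries its Country (here Chicago and Los Angeles do):

```python
import numpy as np
import pandas as pd

# Sample frame from this answer (Chicago1 is the answerer's extra test row).
df = pd.DataFrame({
    'City': ['Chicago', 'Chicago1', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', np.nan, 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', np.nan, 'United States', np.nan,
                'United States', np.nan, np.nan, 'Canada'],
})

cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
          'IL': ['Chicago', 'Chicago1']}
# Invert state -> [cities] into city -> state.
d = {city: state for state, names in cities.items() for city in names}

# 1) State comes straight from the lookup, only where it is missing.
df['State'] = df['State'].fillna(df['City'].map(d))
# 2) Country is filled with the first non-missing Country of the same State.
df['Country'] = df['Country'].fillna(
    df.groupby('State')['Country'].transform('first'))
print(df)
```

Because State is looked up rather than propagated, this does not depend on row order or on a known neighbour inside the group.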
Answer 1 (score: 3)
Or just split the frame into two parts and use fillna:
import pandas as pd

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
s = df.loc[df.City.isin(CA_cities), :]   # CA rows
t = df.loc[~df.City.isin(CA_cities), :]  # all other rows
pd.concat([s.fillna({'State': 'CA', 'Country': 'United States'}), t])
Out[1023]:
City State Country
2 San Diego CA United States
3 Los Angeles CA United States
4 San Francisco CA United States
5 Sacramento CA United States
0 Chicago IL United States
1 Boston MA United States
6 Vancouver BC Canada
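pd.concat puts the CA rows first, as the output shows; appending .sort_index() to the concat restores the original order. Alternatively, here is a sketch that fills the same cells in place through a boolean mask, so no split and re-join is needed (blanks assumed to be NaN):

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank cells are assumed to be NaN.
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
mask = df['City'].isin(CA_cities)
cols = ['State', 'Country']

# Fill only the missing cells of the CA rows; loc aligns the filled frame
# back onto those rows, so every row keeps its original position.
df.loc[mask, cols] = df.loc[mask, cols].fillna({'State': 'CA', 'Country': 'United States'})
print(df)
```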