I'm trying to assign state names to a list of college names:
import pandas as pd
import numpy as np

df = pd.DataFrame({'College': pd.Series(['University of Michigan', 'University of Florida', 'Iowa State'])})
State = ['Michigan', 'Iowa']
df['State'] = np.where(df['College'].str.contains('|'.join(State)),
                       'state', '--')
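When a college matches, I'd like the 'state' value to be replaced with the actual name of the state that matched. Example: University of Michigan -> Michigan (instead of 'state'). Eventually the State list will hold all 50 states, so I can't write 50 separate np.where statements, one per state name.
Thanks for your help.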
Answer 0 (score: 3)
You can use str.extract here instead of np.where:
In [290]: df['State'] = df['College'].str.extract('({})'.format('|'.join(State)), expand=True)
In [291]: df
Out[291]:
                  College     State
0  University of Michigan  Michigan
1   University of Florida       NaN
2              Iowa State      Iowa
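A self-contained version of the same idea, as a minimal sketch: it passes expand=False so that str.extract returns a Series for the single capture group (rather than a one-column DataFrame), and then uses fillna('--') to restore the question's placeholder for colleges with no match. The '--' fill is carried over from the question as an assumption; it is not part of this answer.

```python
import pandas as pd

df = pd.DataFrame({'College': ['University of Michigan', 'University of Florida', 'Iowa State']})
State = ['Michigan', 'Iowa']

# One capture group around the alternation, here '(Michigan|Iowa)'
pattern = '({})'.format('|'.join(State))

# expand=False returns a Series for a single group; rows with no match become NaN
df['State'] = df['College'].str.extract(pattern, expand=False)

# Optional: keep the question's '--' placeholder for colleges with no matching state
df['State'] = df['State'].fillna('--')
print(df)
```

Skipping fillna leaves NaN in unmatched rows, which is what the output above shows.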
Answer 1 (score: 1)
# All 50 states plus a few non-state entries (DC, US territories, 'National').
# The commas matter: without them Python concatenates adjacent string literals
# into one long string, and the regex built below would match nothing useful.
States = [
    'Washington', 'Wisconsin', 'West Virginia', 'Florida', 'Wyoming',
    'New Hampshire', 'New Jersey', 'New Mexico', 'National', 'North Carolina',
    'North Dakota', 'Nebraska', 'New York', 'Rhode Island', 'Nevada', 'Guam',
    'Colorado', 'California', 'Georgia', 'Connecticut', 'Oklahoma', 'Ohio', 'Kansas',
    'South Carolina', 'Kentucky', 'Oregon', 'South Dakota', 'Delaware',
    'District of Columbia', 'Hawaii', 'Puerto Rico', 'Texas', 'Louisiana',
    'Tennessee', 'Pennsylvania', 'Virginia', 'Virgin Islands', 'Alaska', 'Alabama',
    'American Samoa', 'Arkansas', 'Vermont', 'Illinois', 'Indiana', 'Iowa',
    'Arizona', 'Idaho', 'Maine', 'Maryland', 'Massachusetts', 'Utah', 'Missouri',
    'Minnesota', 'Michigan', 'Montana', 'Northern Mariana Islands', 'Mississippi',
]
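# One alternation of all names: 'Washington|Wisconsin|West Virginia|...'
state_str = '|'.join(States)
# The named group (?P<State>...) makes extract() return a column called 'State'.
# update() then overwrites df['State'] only where a name matched (NaN values are skipped),
# so df must already have a 'State' column, e.g. the '--' placeholder from the question.
df.update(df.College.str.extract(r'(?P<State>{})'.format(state_str), expand=True))
df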
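If the name list ever contains entries where one is a prefix of another, or names with regex metacharacters, a slightly more defensive pattern may help. The sketch below is an assumed variant, not part of the original answer: it escapes each name with re.escape, sorts longer names first so that at a given position a longer name is tried before a shorter prefix of it, and wraps the alternation in \b word boundaries. build_state_pattern is a hypothetical helper, the college names are made-up sample data, and it assigns the column directly instead of using update().

```python
import re
import pandas as pd

def build_state_pattern(names):
    """Hypothetical helper: one named group matching any of `names` as whole words."""
    # Longest first, so a longer name is tried before a shorter prefix at the same position
    ordered = sorted(names, key=len, reverse=True)
    # re.escape guards against regex metacharacters such as '.' or '(' in the names
    alternation = '|'.join(re.escape(n) for n in ordered)
    return r'\b(?P<State>{})\b'.format(alternation)

# Made-up sample data for illustration
States = ['Virginia', 'West Virginia', 'Kansas', 'Arkansas', 'Iowa']
df = pd.DataFrame({'College': ['West Virginia University', 'University of Virginia',
                               'University of Arkansas', 'Kansas State University',
                               'Iowa State', 'Rice University']})

pattern = build_state_pattern(States)
# expand=False gives a Series; unmatched colleges get the '--' placeholder
df['State'] = df['College'].str.extract(pattern, expand=False).fillna('--')
print(df)
```

The \b boundaries also stop a name from matching inside a longer word. For example, 'Kansas' cannot match inside 'Arkansas', although the leftmost match already wins in that particular case.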