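I am trying to replace the data in the 'location' column with data from a dictionary I created. Each value in the 'location' column contains one of the dictionary keys as a substring (case-insensitive). I can't get either of my approaches to work, so any help would be appreciated.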
import pandas as pd

incoming_df = pd.DataFrame({'First_Name': ['John', 'Chris', 'renzo', 'Laura', 'Stan', 'Russ', 'Lip', 'Hick', 'Donald'],
                            'Last_Name': ['stanford', 'lee', 'Olivares', 'Johnson', 'Stanley', 'Russaford', 'Lipper', 'Hero', 'Lipsey'],
                            'location': ['Grant Elementary', 'Code Academy', 'Queen Prep', 'Waves College', 'duke Prep', 'california Academy', 'SF College Prep', 'San Ramon Prep', 'San Jose High']})
df = pd.DataFrame({'FirstN': [],
                   'LastN': [],
                   'Place': []})
# reindex based on the incoming data
df = df.reindex(incoming_df.index)
# copy the data over to the new dataframe
df['LastN'] = incoming_df.loc[:, incoming_df.columns.str.contains('Last', case=False)]
df['FirstN'] = incoming_df.loc[:, incoming_df.columns.str.contains('First', case=False)]
df['Place'] = incoming_df.loc[:, incoming_df.columns.str.contains('School|Work|Site|Location', case=False)]
places = {'Grant': 'DEF Grant Elementary',
          'Code': 'DEF Code Academy',
          'Queen': 'DEF Queen Preparatory High School',
          'Waves': 'DEF Waves College Prep',
          'Duke': 'DEF Duke Preparatory Institute',
          'California': 'DEF California Academy',
          'SF College': 'DEF San Francisco College',
          'San Ramon': 'DEF San Ramon Prep',
          'San Jose': 'DEF San Jose High School'}
# replace the values in Place with the dictionary values (results in NaN values in the 'Place' column)
pat = r'({})'.format('|'.join(places.keys()))
extracted = df.Place.str.extract(pat, expand=False).dropna()
df['Place'] = extracted.apply(lambda x: places[x])
# also tried this method, but it did not work either
df['Place'] = df['Place'].replace(places)
# original df
FirstN LastN Place
0 John stanford Grant Elementary
1 Chris lee Code Academy
2 renzo Olivares Queen Prep
3 Laura Johnson Waves College
4 Stan Stanley duke Prep
5 Russ Russaford california Academy
6 Lip Lipper SF College Prep
7 Hick Hero San Ramon Prep
8 Donald Lipsey San Jose High
# target df
FirstN LastN Place
0 John Stanford DEF Grant Elementary
1 Chris Lee DEF Code Academy
2 Renzo Olivares DEF Queen Preparatory High School
3 Laura Johnson DEF Waves College Prep
4 Stan Stanley DEF Duke Preparatory Institute
5 Russ Russaford DEF California Academy
6 Lip Lipper DEF San Francisco College
7 Hick Hero DEF San Ramon Prep
8 Donald Lipsey DEF San Jose High School
Answer 0 (score: 1)
Use a list comprehension, with next to short-circuit and avoid wasted iterations.
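# dic is the lookup dictionary; iterate over the rows so the result lines up with df.
# A row matches when its Place value is a substring of a key (case-insensitive).
df.assign(Place=[next((v for k, v in dic.items() if p.lower() in k.lower()), None) for p in df.Place])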
Place User
0 Heights College arenzo
1 Queens University brenzo
2 York Academy crenzo
3 Danes Institute drenzo
4 Duke University erenzo
Answer 1 (score: 0)
Use apply and loc:
for key, value in dic.items():
    df.loc[df['Place'].apply(lambda x: x in key.lower()), 'Place'] = value
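A minimal runnable sketch of the `apply` plus `loc` approach, under the assumption that a key should match when it appears inside the Place value regardless of case (as in the question's data). The sample rows and the `original` variable are illustrative choices, not part of the answer.

```python
import pandas as pd

# Subset of the question's dictionary; 'Other' is a made-up non-matching row.
dic = {'Grant': 'DEF Grant Elementary',
       'Duke': 'DEF Duke Preparatory Institute'}
df = pd.DataFrame({'Place': ['Grant Elementary', 'duke Prep', 'Other']})

# Build the masks from an untouched copy, so a value that was already
# replaced cannot be matched again by a later key.
original = df['Place'].copy()

for key, value in dic.items():
    # Boolean mask: True where the key occurs in the original Place value.
    mask = original.apply(lambda x: key.lower() in x.lower())
    df.loc[mask, 'Place'] = value
```

Rows that match no key keep their original value, because `loc` only assigns where the mask is True.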
Answer 2 (score: 0)
This is challenging because the strings in 'Place' do not match the dictionary exactly. Some naive workarounds:
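1) You can take advantage of index mapping by reformatting your dictionary as:
# keys must match the df index labels (0 to 4 here)
dic = {0: 'Heights College',
       1: 'Queens University',
       2: 'York Academy',
       3: 'Danes Institute',
       4: 'Duke University'}
Then map from your dictionary onto the df index:
df['Place'] = df.index.to_series().map(dic)
2) Alternatively, if your User column is unique, you can do the same as above but edit dic to map from user to place, and then apply a similar map on the User column, which looks each user up in the dictionary and returns the place.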
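Option 2 might look like the sketch below. It assumes the User column is unique; the dictionary name `user_to_place`, the placeholder Place values, and the user 'zrenzo' are hypothetical, while the other users and target places are taken from the output shown in the first answer.

```python
import pandas as pd

# Placeholder Place values; 'zrenzo' is a made-up user with no dictionary entry.
df = pd.DataFrame({'Place': ['heights', 'queens', 'elsewhere'],
                   'User': ['arenzo', 'brenzo', 'zrenzo']})

# Dictionary keyed by user instead of by place name.
user_to_place = {'arenzo': 'Heights College',
                 'brenzo': 'Queens University'}

# Look each user up in the dictionary; users without an entry give NaN,
# which fillna replaces with the original Place value.
df['Place'] = df['User'].map(user_to_place).fillna(df['Place'])
```

This sidesteps string matching entirely, at the cost of maintaining a dictionary entry per user.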
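Answer 3 (score: 0)
I solved my problem using this loop:
import numpy as np

for k, v in dic.items():
    # where Place contains the key (ignoring case), use the dictionary value; otherwise keep Place
    df['Place'] = np.where(df['Place'].str.contains(k, case=False), v, df['Place'])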
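For comparison, the `str.extract` attempt from the question can also be made to work without an explicit loop. The pattern there is case-sensitive, so rows such as 'duke Prep' are not matched and become NaN. The sketch below is one possible repair, not the asker's final method: it passes `flags=re.IGNORECASE` and looks the extracted text up in a lower-cased copy of the dictionary. The names `lower_places` and `keys` are introduced here for illustration.

```python
import re

import pandas as pd

places = {'Grant': 'DEF Grant Elementary',
          'Code': 'DEF Code Academy',
          'Queen': 'DEF Queen Preparatory High School',
          'Waves': 'DEF Waves College Prep',
          'Duke': 'DEF Duke Preparatory Institute',
          'California': 'DEF California Academy',
          'SF College': 'DEF San Francisco College',
          'San Ramon': 'DEF San Ramon Prep',
          'San Jose': 'DEF San Jose High School'}

df = pd.DataFrame({'Place': ['Grant Elementary', 'Code Academy', 'Queen Prep',
                             'Waves College', 'duke Prep', 'california Academy',
                             'SF College Prep', 'San Ramon Prep', 'San Jose High']})

# Longer keys first, so that at the same starting position the longer key is
# tried before a shorter one; re.escape guards keys with regex metacharacters.
keys = sorted(places, key=len, reverse=True)
pat = '({})'.format('|'.join(re.escape(k) for k in keys))

# Lower-cased lookup table, because the extracted text keeps the casing of the data.
lower_places = {k.lower(): v for k, v in places.items()}

extracted = df['Place'].str.extract(pat, flags=re.IGNORECASE, expand=False)

# Rows without a match stay NaN after map, so fall back to the original value.
df['Place'] = extracted.str.lower().map(lower_places).fillna(df['Place'])
```

Because the whole column is processed in one pass, each row is matched against the original text only, so an already replaced value is never matched a second time.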