Replace the entire string if it contains a substring matching a dictionary key in a pandas DataFrame

Date: 2018-08-09 20:48:40

Tags: python python-3.x pandas

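I am trying to replace the data in the 'location' column with data from a dictionary I created. Each value in the 'location' column contains one of the dictionary keys as a substring (case-insensitive). I have not been able to get either of my approaches to work, so any help would be appreciated.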

import pandas as pd

incoming_df = pd.DataFrame({'First_Name' : ['John', 'Chris', 'renzo', 'Laura', 'Stan', 'Russ', 'Lip', 'Hick', 'Donald'],
                            'Last_Name' : ['stanford', 'lee', 'Olivares', 'Johnson', 'Stanley', 'Russaford', 'Lipper', 'Hero', 'Lipsey'],
                            'location' : ['Grant Elementary', 'Code Academy', 'Queen Prep', 'Waves College', 'duke Prep', 'california Academy', 'SF College Prep', 'San Ramon Prep', 'San Jose High']})

df = pd.DataFrame({'FirstN': [],
                        'LastN':[],
                        'Place': []})

# reindex based on the data given
df = df.reindex(incoming_df.index)

# copy data over to new dataframe
df['LastN'] = incoming_df.loc[:, incoming_df.columns.str.contains('Last', case=False)]
df['FirstN'] = incoming_df.loc[:, incoming_df.columns.str.contains('First', case=False)]
df['Place'] = incoming_df.loc[:, incoming_df.columns.str.contains('School|Work|Site|Location', case=False)]

places = { 'Grant' : 'DEF Grant Elementary',
                    'Code' : 'DEF Code Academy',
                    'Queen' : 'DEF Queen Preparatory High School',
                    'Waves' : 'DEF Waves College Prep',
                    'Duke' : 'DEF Duke Preparatory Institute',
                    'California' : 'DEF California Academy',
                    'SF College' : 'DEF San Francisco College',
                    'San Ramon' : 'DEF San Ramon Prep',
                    'San Jose' : 'DEF San Jose High School' }

# replace values in Place with the dictionary values (results in NaN values inside the 'Place' column)
pat = r'({})'.format('|'.join(places.keys()))
extracted = df.Place.str.extract(pat, expand=False).dropna()
df['Place'] = extracted.apply(lambda x: places[x])

# Also tried this method but did not work
df['Place'] = df['Place'].replace(places)

# original df
    FirstN   LastN      Place
0   John    stanford    Grant Elementary
1   Chris   lee         Code Academy
2   renzo   Olivares    Queen Prep
3   Laura   Johnson     Waves College
4   Stan    Stanley     duke Prep
5   Russ    Russaford   california Academy
6   Lip     Lipper      SF College Prep
7   Hick    Hero        San Ramon Prep
8   Donald  Lipsey      San Jose High

# target df
    FirstN   LastN      Place
0   John    Stanford    DEF Grant Elementary
1   Chris   Lee         DEF Code Academy
2   Renzo   Olivares    DEF Queen Preparatory High School
3   Laura   Johnson     DEF Waves College Prep
4   Stan    Stanley     DEF Duke Preparatory Institute
5   Russ    Russaford   DEF California Academy
6   Lip     Lipper      DEF San Francisco College
7   Hick    Hero        DEF San Ramon Prep
8   Donald  Lipsey      DEF San Jose High School

4 Answers:

Answer 0 (score: 1)

Use a list comprehension, with next to short-circuit and avoid wasted iterations.

df.assign(Place=[next((v for i in df.Place if i in k.lower()), None) for k,v in dic.items()])

               Place    User
0    Heights College  arenzo
1  Queens University  brenzo
2       York Academy  crenzo
3    Danes Institute  drenzo
4    Duke University  erenzo
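
The snippet above was written against different sample data than the question now shows. As a rough sketch only, the same list-comprehension-plus-next idea could be adapted to the question's data, where each dictionary key is a case-insensitive substring of Place. The 'Unknown School' row is a hypothetical addition to show the fallback, and falling back to the original string is an assumption, not something the answer specifies.

```python
import pandas as pd

# A few rows from the question, plus one hypothetical row with no matching key.
df = pd.DataFrame({'Place': ['Grant Elementary', 'duke Prep',
                             'SF College Prep', 'Unknown School']})
places = {'Grant': 'DEF Grant Elementary',
          'Duke': 'DEF Duke Preparatory Institute',
          'SF College': 'DEF San Francisco College'}

# For each row, take the value of the first key found inside Place
# (compared in lowercase). next() stops at the first hit; the original
# string is the default when no key matches.
df = df.assign(Place=[
    next((v for k, v in places.items() if k.lower() in place.lower()), place)
    for place in df['Place']
])
print(df)
```

Iterating over the rows in the outer loop keeps the result the same length as the DataFrame, whatever the size or order of the dictionary.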

Answer 1 (score: 0)

Using apply and loc:

for key, value in dic.items():
    df.loc[df['Place'].apply(lambda x: x in key.lower()), 'Place'] = value
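
This loop tests whether the Place string is contained in the key. For the data in the question the containment runs the other way (the key is inside Place), so a possible adaptation, offered as a sketch under that assumption, is shown below. The name original is introduced here for illustration; it holds a lowercase snapshot of the column so that values already replaced cannot match a later key.

```python
import pandas as pd

# A few rows and keys taken from the question.
df = pd.DataFrame({'Place': ['Code Academy', 'california Academy', 'San Jose High']})
places = {'Code': 'DEF Code Academy',
          'California': 'DEF California Academy',
          'San Jose': 'DEF San Jose High School'}

# Lowercase snapshot of the untouched column, used for all matching.
original = df['Place'].str.lower()

for key, value in places.items():
    # Boolean mask of rows whose original text contains the key.
    mask = original.apply(lambda x: key.lower() in x)
    df.loc[mask, 'Place'] = value

print(df)
```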

Answer 2 (score: 0)

This is challenging given that the strings in 'Place' do not match exactly. A few naive workarounds:

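1) You can take advantage of an index mapping by reformatting your dictionary as:

dic = {'1' : 'Heights College',
  '2' : 'Queens University',
  '3' : 'York Academy',
  '4' : 'Danes Institute',
  '5' : 'Duke University'}

Then map from your dictionary onto the df index. The keys must equal the index labels exactly, in both value and type; string keys such as '1' will not match an integer index, and unmatched rows become NaN:

df['Place'] = df.index.to_series().map(dic)

2) Alternatively, if your User column is unique, you can replicate the above: edit dic so that it maps users to places, then apply a similar map on that column, looking up each user in the dictionary and returning the place.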
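
Both workarounds can be sketched roughly as follows, using the users and places from the output shown in Answer 0. The names by_index, by_user and Place_by_user are hypothetical, and the integer keys assume the default 0-based index.

```python
import pandas as pd

df = pd.DataFrame({'User': ['arenzo', 'brenzo', 'crenzo']})

# Workaround 1: dictionary keyed by index label. The keys are integers
# so that they match the default RangeIndex labels 0, 1, 2.
by_index = {0: 'Heights College', 1: 'Queens University', 2: 'York Academy'}
df['Place'] = df.index.to_series().map(by_index)

# Workaround 2: dictionary keyed by user, usable when User is unique.
by_user = {'arenzo': 'Heights College',
           'brenzo': 'Queens University',
           'crenzo': 'York Academy'}
df['Place_by_user'] = df['User'].map(by_user)

print(df)
```

Neither variant looks at the text in Place, so both depend on knowing in advance which row or user belongs to which place.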

Answer 3 (score: 0)

This loop solved my problem:

import numpy as np

for k, v in dic.items():
    df['Place'] = np.where(df['Place'].str.contains(k, case=False), v, df['Place'])
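
For comparison, here is a loop-free sketch along the lines of the extract-based attempt in the question. That attempt built a case-sensitive pattern, which cannot match rows such as 'duke Prep' or 'california Academy'. The version below assumes it is acceptable to keep the original string when no key matches; lookup is a name introduced here for illustration.

```python
import re
import pandas as pd

# A few rows and keys taken from the question.
df = pd.DataFrame({'Place': ['Queen Prep', 'duke Prep',
                             'california Academy', 'San Ramon Prep']})
places = {'Queen': 'DEF Queen Preparatory High School',
          'Duke': 'DEF Duke Preparatory Institute',
          'California': 'DEF California Academy',
          'San Ramon': 'DEF San Ramon Prep'}

# Lowercase the keys so extracted text can be looked up regardless of case.
lookup = {k.lower(): v for k, v in places.items()}

# Escape the keys in case they contain regex metacharacters.
pat = '({})'.format('|'.join(re.escape(k) for k in places))

# Extract the matching key case-insensitively, normalise it to lowercase,
# map it to the full name, and keep the original value where nothing matched.
extracted = df['Place'].str.extract(pat, flags=re.IGNORECASE, expand=False)
df['Place'] = extracted.str.lower().map(lookup).fillna(df['Place'])

print(df)
```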