I have the following column in a data frame:
Q2
1 4
1 3
3 4 11
1 4 6 15 16
I want to replace multiple values within a cell (where present): 1 with Facebook, 2 with Instagram, and so on.
I split the values as follows:
# A list, not a bare string: iterating over 'Q2' would yield 'Q' and '2'
columns_to_split = ['Q2']
for c in columns_to_split:
    df[c] = df[c].str.split(' ')
Output:
DSOKF31 [1, 4]
DSOVH39 [1, 3]
DSOVH05 [3, 4, 16]
DSOVH23 [1, 4, 6, 15, 16]
Name: Q2, dtype: object
But when I try to replace the multiple values using a dictionary, as follows:
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2)
I get the same output:
DSOKF31 [1, 4]
DSOVH39 [1, 3]
DSOVH05 [3, 4, 16]
DSOVH23 [1, 4, 6, 15, 16]
Name: Q2, dtype: object
How can I replace multiple values within a single cell in this case?
Answer 0 (score: 3)
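There isn't much structure here, since the number of items varies from row to row. Still, after splitting the strings, you can apply a function that maps each list to the dictionary values, as in the session below.
Edit: Jezrael's excellent comment points out that missing values need to be accounted for. Note that the snippet shown here indexes the dict directly, so it assumes every cell is a string and every code is a key of social_media_2; a NaN cell or an unknown code would raise an error.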
In [36]: df = pd.DataFrame({'Q2': ['1 4', '1 3', '1 2 3']})
In [37]: df.Q2.str.split(' ').apply(lambda l: [social_media_2[e] for e in l])
Out[37]:
0 [Facebook, Messenger (Google hangout, Tagg, Wh...
1 [Facebook, Twitter]
2 [Facebook, Instagram, Twitter]
Name: Q2, dtype: object
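As a minimal sketch of how missing values could be tolerated with the same apply-based approach: the helper name map_codes, the shortened dict, and the choice to return an empty list for a missing cell and to keep unknown codes unchanged are all assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Shortened, illustrative mapping (the full dict is in the question).
social_media_2 = {'1': 'Facebook', '2': 'Instagram', '3': 'Twitter'}

def map_codes(cell, mapping=social_media_2):
    """Hypothetical helper: turn '1 3' into ['Facebook', 'Twitter']."""
    if not isinstance(cell, str):
        # Missing cell (None / NaN): return an empty list (an assumption).
        return []
    # dict.get keeps an unknown code as-is instead of raising KeyError.
    return [mapping.get(code, code) for code in cell.split()]

df = pd.DataFrame({'Q2': ['1 3', None, '2 99']})
result = df['Q2'].apply(map_codes)
print(result.tolist())
```

Whether an unknown code should be kept, dropped, or flagged depends on the survey data; swapping the default passed to mapping.get changes that behaviour.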
Answer 1 (score: 3)
Here is another solution:
In [45]: df
Out[45]:
Q2
0 1 4
1 1 3
2 3 4 16
3 1 4 6 15 16
In [47]: (df.Q2.str.split(expand=True)
....: .stack()
....: .map(social_media_2)
....: .unstack()
....: .apply(lambda x: x.dropna().values.tolist(), axis=1)
....: )
Out[47]:
0 [Facebook, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)]
1 [Facebook, Twitter]
2 [Twitter, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO), Gmail]
3 [Facebook, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO), Imo, Yahoo, Gmail]
dtype: object
Explanation:
In [50]: df.Q2.str.split(expand=True).stack().map(social_media_2)
Out[50]:
0 0 Facebook
1 Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
1 0 Facebook
1 Twitter
2 0 Twitter
1 Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
2 Gmail
3 0 Facebook
1 Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
2 Imo
3 Yahoo
4 Gmail
dtype: object
In [51]: df.Q2.str.split(expand=True).stack().map(social_media_2).unstack()
Out[51]:
0 1 2 3 4
0 Facebook Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO) None None None
1 Facebook Twitter None None None
2 Twitter Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO) Gmail None None
3 Facebook Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO) Imo Yahoo Gmail
Timing for a 40K-row DF:
In [86]: big = pd.concat([df] * 10**4, ignore_index=True)
In [87]: big.shape
Out[87]: (40000, 1)
In [88]: %%timeit
....: (big.Q2.str.split(expand=True)
....: .stack()
....: .map(social_media_2)
....: .unstack()
....: .apply(lambda x: x.dropna().values.tolist(), axis=1)
....: )
....:
1 loop, best of 3: 19.6 s per loop
In [89]: %timeit big.Q2.str.split(' ').apply(lambda l: [social_media_2[e] for e in l])
10 loops, best of 3: 72.6 ms per loop
Conclusion: Ami's solution is approx. 270 times faster!
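On newer pandas versions (0.25 and later, where Series.explode is available), the same split, map, and regroup idea can probably be written without the wide intermediate frame. The following is only a sketch under that assumption; the dict is shortened for illustration and no timing is claimed for it.

```python
import pandas as pd

# Shortened, illustrative mapping (labels abbreviated for readability).
social_media_2 = {'1': 'Facebook', '3': 'Twitter', '4': 'Messenger', '16': 'Gmail'}
df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 16']})

labels = (df['Q2'].str.split()   # '1 4' -> ['1', '4']
          .explode()             # one code per row, original index repeated
          .map(social_media_2)   # code -> label (unknown codes become NaN)
          .groupby(level=0)      # regroup by the original row index
          .agg(list))            # collect the labels back into lists
print(labels.tolist())
```

Because explode repeats the original index, grouping on level 0 restores one list per input row, in the original order of the codes.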
Answer 2 (score: 3)
If you don't need a list as output, simply add regex=True to replace:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
Q2
0 1 4
1 1 3
2 3 4 11
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Facebook Messenger (Google hangout, Tagg, What...
1 Facebook Twitter
2 Twitter Messenger (Google hangout, Tagg, Whats...
If you need lists, use one of the other solutions.
Edit by comment:
You can replace the whitespace with ; first, and then it works nicely:
df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
Q2
0 1 4
1 1 3
2 3 4 11
df['Q2'] = df['Q2'].str.replace(' ',';')
print (df)
Q2
0 1;4
1 1;3
2 3;4;11
social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Facebook;Messenger (Google hangout, Tagg, What...
1 Facebook;Twitter
2 Twitter;Messenger (Google hangout, Tagg, Whats...
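EDIT1:
You can change it slightly by appending ; to the keys of the dict and then replacing with the dict. Note the last row of the output, though: 11 comes out as 1Fa rather than KL, because the pattern 1; also matches the tail of 11;.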
df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})
print (df)
Q2
0 1 2
1 1 3
2 3 2 11
df['Q2'] = df['Q2'].str.replace(' ',';;') + ';'
print (df)
Q2
0 1;;2;
1 1;;3;
2 3;;2;;11;
social_media_2 = {'1':'Fa', '2':'I', '3':'T', '11':'KL'}
#add ; to keys in dict
social_media_2 = dict((key + ';', value) for (key, value) in social_media_2.items())
print (social_media_2)
{'1;': 'Fa', '2;': 'I', '3;': 'T', '11;': 'KL'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
Q2
0 Fa;I
1 Fa;T
2 T;I;1Fa
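One way to sidestep the multi-digit problem is to match each whole run of digits once and look it up in the dict, instead of applying the keys as separate patterns. This is a sketch rather than part of the original answer: it relies on Series.str.replace accepting a callable replacement when regex=True, and the helper name lookup is made up for illustration.

```python
import pandas as pd

social_media_2 = {'1': 'Fa', '2': 'I', '3': 'T', '11': 'KL'}
df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})

def lookup(match):
    """Hypothetical helper: map one whole code, keep unknown codes as-is."""
    code = match.group(0)
    return social_media_2.get(code, code)

df['Q2'] = (df['Q2']
            .str.replace(' ', ';', regex=False)        # separator first
            .str.replace(r'\d+', lookup, regex=True))  # then whole codes
print(df['Q2'].tolist())
```

Since \d+ consumes the full number, 11 is looked up as '11' and never as two separate 1s. The separator is swapped before the lookup so that spaces inside the replacement labels are left alone.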