Replacing multiple values per cell in Pandas

Date: 2016-09-18 09:44:28

Tags: list python-2.7 pandas dictionary replace

I have the following column in a DataFrame:

Q2
1 4
1 3
3 4 11 
1 4 6 15 16

I want to replace the multiple values in each cell, where present: 1 with Facebook, 2 with Instagram, and so on.

I split the values as follows:

columns_to_split = ['Q2']  # a list: iterating over the plain string 'Q2' would yield 'Q' and '2'

for c in columns_to_split:
    df[c] = df[c].str.split(' ')

Output:

code                             
DSOKF31                          [1, 4]
DSOVH39                          [1, 3]
DSOVH05                          [3, 4, 16]
DSOVH23                          [1, 4, 6, 15, 16]
Name: Q2, dtype: object

But when I try to replace the multiple values with a dictionary, like this:

social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}

df['Q2'] = df['Q2'].replace(social_media_2)

I get the same output:

code                             
DSOKF31                          [1, 4]
DSOVH39                          [1, 3]
DSOVH05                          [3, 4, 16]
DSOVH23                          [1, 4, 6, 15, 16]
Name: Q2, dtype: object

How can I replace multiple values within a single cell in this case?
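A note on why that call returns the data unchanged: without regex=True, Series.replace only replaces a cell whose whole value equals a dictionary key, so neither the original string '1 4' nor the split list ['1', '4'] matches the key '1'. A minimal sketch with a shortened, illustrative dictionary (mapping is a hypothetical stand-in for social_media_2):

```python
import pandas as pd

# Shortened, hypothetical stand-in for social_media_2
mapping = {'1': 'Facebook', '4': 'Messenger'}

s = pd.Series(['1', '1 4'])

# Only the cell that is exactly '1' matches a key; '1 4' is left untouched
result = s.replace(mapping)
print(result.tolist())
```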

3 answers:

Answer 0 (score: 3)

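Since the number of items varies from row to row, there isn't much structure to exploit. Still, after splitting the strings, you can apply a function that maps each list to the dictionary values: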

Edit: following Jezrael's excellent comment, here is a version that accounts for missing values:

In [36]: df = pd.DataFrame({'Q2': ['1 4', '1 3', '1 2 3']})

In [37]: df.Q2.str.split(' ').apply(lambda l: [social_media_2[e] for e in l if e in social_media_2])
Out[37]: 
0    [Facebook, Messenger (Google hangout, Tagg, Wh...
1                                  [Facebook, Twitter]
2                       [Facebook, Instagram, Twitter]
Name: Q2, dtype: object
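A self-contained sketch of the same apply approach. It assumes unknown codes should be kept as-is rather than dropped or raise KeyError; the dict.get(e, e) fallback is a choice made here, not part of the original answer, and the dictionary is shortened for brevity:

```python
import pandas as pd

# Shortened version of social_media_2, for illustration only
social_media_2 = {'1': 'Facebook', '2': 'Instagram', '3': 'Twitter', '16': 'Gmail'}

df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 99 16']})

# Split each cell on whitespace, then map every code through the dict;
# dict.get(e, e) keeps a code that has no entry (here '99') unchanged
df['Q2_names'] = df['Q2'].str.split().apply(
    lambda codes: [social_media_2.get(e, e) for e in codes]
)
print(df['Q2_names'].tolist())
```

Swapping the comprehension for [social_media_2[e] for e in codes if e in social_media_2] drops unknown codes instead of keeping them.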

Answer 1 (score: 3)

Here is another solution:

In [45]: df
Out[45]:
            Q2
0          1 4
1          1 3
2       3 4 16
3  1 4 6 15 16

In [47]: (df.Q2.str.split(expand=True)
   ....:    .stack()
   ....:    .map(social_media_2)
   ....:    .unstack()
   ....:    .apply(lambda x: x.dropna().values.tolist(), axis=1)
   ....: )
Out[47]:
0                       [Facebook, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)]
1                                                                              [Facebook, Twitter]
2                 [Twitter, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO), Gmail]
3    [Facebook, Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO), Imo, Yahoo, Gmail]
dtype: object
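On pandas 0.25 or later, Series.explode gives a shorter route to the same long format that stack() produces here. A sketch, not part of the original answer, that explodes the split lists, maps each code, drops unmapped codes and regroups by the original row index (dictionary shortened for brevity):

```python
import pandas as pd

# Shortened version of social_media_2, for illustration only
social_media_2 = {'1': 'Facebook', '3': 'Twitter', '4': 'Messenger', '6': 'Imo',
                  '15': 'Yahoo', '16': 'Gmail'}

df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 16', '1 4 6 15 16']})

result = (df['Q2'].str.split()          # '1 4' -> ['1', '4']
          .explode()                    # one row per code, original index repeated
          .map(social_media_2)          # code -> name (NaN if the code is unknown)
          .dropna()                     # drop unknown codes, like dropna() above
          .groupby(level=0).agg(list))  # collect the names back into one list per row
print(result.tolist())
```

One difference from the stack/unstack version: a row whose codes are all unknown disappears from the result instead of becoming an empty list.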

Explanation:

In [50]: df.Q2.str.split(expand=True).stack().map(social_media_2)
Out[50]:
0  0                                                          Facebook
   1    Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
1  0                                                          Facebook
   1                                                           Twitter
2  0                                                           Twitter
   1    Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
   2                                                             Gmail
3  0                                                          Facebook
   1    Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)
   2                                                               Imo
   3                                                             Yahoo
   4                                                             Gmail
dtype: object

In [51]: df.Q2.str.split(expand=True).stack().map(social_media_2).unstack()
Out[51]:
          0                                                               1      2      3      4
0  Facebook  Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)   None   None   None
1  Facebook                                                         Twitter   None   None   None
2   Twitter  Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)  Gmail   None   None
3  Facebook  Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)    Imo  Yahoo  Gmail

Timing on a 40K-row DataFrame:

In [86]: big = pd.concat([df] * 10**4, ignore_index=True)

In [87]: big.shape
Out[87]: (40000, 1)

In [88]: %%timeit
   ....: (big.Q2.str.split(expand=True)
   ....:     .stack()
   ....:     .map(social_media_2)
   ....:     .unstack()
   ....:     .apply(lambda x: x.dropna().values.tolist(), axis=1)
   ....: )
   ....:
1 loop, best of 3: 19.6 s per loop

In [89]: %timeit big.Q2.str.split(' ').apply(lambda l: [social_media_2[e] for e in l])
10 loops, best of 3: 72.6 ms per loop

Conclusion: Ami's solution is approx. 270 times faster!

Answer 2 (score: 3)

If you don't need lists as output, just add regex=True to replace:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
       Q2
0     1 4
1     1 3
2  3 4 11

social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)

                                                  Q2
0  Facebook Messenger (Google hangout, Tagg, What...
1                                   Facebook Twitter
2  Twitter Messenger (Google hangout, Tagg, Whats...

If you need lists, use one of the other solutions.

Edit based on a comment:

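You can first replace the spaces with ; (some names, such as the Messenger entry, contain spaces themselves), and then it works nicely: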

df = pd.DataFrame({'Q2': ['1 4', '1 3', '3 4 11']})
print (df)
       Q2
0     1 4
1     1 3
2  3 4 11

df['Q2'] = df['Q2'].str.replace(' ',';')
print (df)
       Q2
0     1;4
1     1;3
2  3;4;11

social_media_2 = {'1':'Facebook', '2':'Instagram', '3':'Twitter', '4':'Messenger (Google hangout, Tagg, WhatsAPP, MSG, Facetime, IMO)', '5':'SnapChat', '6':'Imo', '7':'Badoo', '8':'Viber', '9':'Twoo', '10':'Linkedin', '11':'Flickr', '12':'Meetup', '13':'Tumblr', '14':'Pinterest', '15':'Yahoo', '16':'Gmail', '17':'Hotmail', '18':'M-Pesa', '19':'M-Shwari', '20':'KCB-Mpesa', '21':'Equitel', '22':'MobiKash', '23':'Airtel money', '24':'Orange Money', '25':'Mobile Bankig Accounts', '26':'Other specify'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
                                                  Q2
0  Facebook;Messenger (Google hangout, Tagg, What...
1                                   Facebook;Twitter
2  Twitter;Messenger (Google hangout, Tagg, Whats...

EDIT1:

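It can be changed slightly: replace each space with ;; and append a trailing ;, add ; to every key of the dict, and then replace: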

df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})
print (df)
       Q2
0     1 2
1     1 3
2  3 2 11

df['Q2'] = df['Q2'].str.replace(' ',';;') + ';'
print (df)
          Q2
0      1;;2;
1      1;;3;
2  3;;2;;11;

social_media_2 = {'1':'Fa', '2':'I', '3':'T', '11':'KL'}
#add ; to keys in dict
social_media_2 = dict((key + ';', value) for (key, value) in social_media_2.items())
print (social_media_2)
{'1;': 'Fa', '2;': 'I', '3;': 'T', '11;': 'KL'}
df['Q2'] = df['Q2'].replace(social_media_2, regex=True)
print (df)
        Q2
0     Fa;I
1     Fa;T
2  T;I;1Fa
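In the last row above, 11 comes out as 1Fa instead of KL: the pattern 1; also matches the tail of 11;, and it is applied before 11;. One way around this, sketched here as an alternative rather than the author's method, is a single str.replace pass whose pattern \d+ captures each whole number and whose replacement is a function (the hypothetical helper lookup) that looks that number up in the dict:

```python
import pandas as pd

social_media_2 = {'1': 'Fa', '2': 'I', '3': 'T', '11': 'KL'}

df = pd.DataFrame({'Q2': ['1 2', '1 3', '3 2 11']})


def lookup(match):
    # match.group(0) is one complete number such as '11';
    # fall back to the digits themselves if the dict has no entry
    code = match.group(0)
    return social_media_2.get(code, code)


# Switch to ; as the separator first, because some names in the full dict contain spaces
df['Q2'] = df['Q2'].str.replace(' ', ';', regex=False)
# \d+ matches each whole number, so '11' is never treated as '1' followed by '1'
df['Q2'] = df['Q2'].str.replace(r'\d+', lookup, regex=True)
print(df)
```

Because every number is replaced exactly once, the order of the dict keys no longer matters, and df['Q2'].str.split(';') turns the result back into lists if needed.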