Replace null values with 0 in every other row under a special condition

Date: 2017-06-25 01:27:29

Tags: python pandas

Here is a subset of the dataset:

A  B    C         D         R        sentence              ADR1         ADR2     
112 135 21  EffexorXR.21    1    lack of good feeling.     good        feeling
113 135 21  EffexorXR.21    1                               1
115 136 21  EffexorXR.21    2    Feel disconnected        disconnected   feel    
116 136 21  EffexorXR.21    2                                             0
118 142 22  EffexorXR.22    1    Weight gain               gain         
119 142 22  EffexorXR.22    1                                1             

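In the ADR1 and ADR2 columns, each word should have a 1 or 0 in the row below it. If that value is missing, I need to replace it with "0". This is the desired output: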

A  B    C         D         R        sentence              ADR1         ADR2     
112 135 21  EffexorXR.21    1    lack of good feeling.     good        feeling
113 135 21  EffexorXR.21    1                               1             0
115 136 21  EffexorXR.21    2    Feel disconnected        disconnected   feel    
116 136 21  EffexorXR.21    2                                 0            0
118 142 22  EffexorXR.22    1    Weight gain               gain         
119 142 22  EffexorXR.22    1                                1    

I tried

df['ADR1'].fillna(0, inplace=True)
df['ADR2'].fillna(0, inplace=True)

but this code produced the following df, which is not what I want:

 A  B    C         D         R        sentence              ADR1         ADR2     
112 135 21  EffexorXR.21    1    lack of good feeling.     good        feeling
    113 135 21  EffexorXR.21    1                               1        0
    115 136 21  EffexorXR.21    2    Feel disconnected        disconnected   feel                                                                 0
    116 136 21  EffexorXR.21    2                                             0
    118 142 22  EffexorXR.22    1    Weight gain               gain           0
    119 142 22  EffexorXR.22    1                                1            0 

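2 answers:

Answer 0 (score: 3)

You can use reshape to process the data two rows at a time, so that each word row is paired with the row below it. Something like:

Code: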

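for col in ['ADR1', 'ADR2']:
    # View the column as (word, flag) pairs: one pair per two rows
    data = np.reshape(df[col].values, (-1, 2))
    # Fill only where the word is present but its flag is missing
    need_fill = np.logical_and(data[:, 0] != '', data[:, 1] == '')
    data[need_fill, 1] = 0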
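To see what the reshape is doing, here is a minimal sketch on a plain NumPy array. The array `col` is a hypothetical stand-in for one ADR column after `fillna('')`; it is not taken from the DataFrame itself. Reshaping a contiguous array returns a view, which is why writing into the pairs also changes the original array.

```python
import numpy as np

# Hypothetical stand-in for one ADR column after fillna('')
col = np.array(['good', '1', 'disconnected', '', 'gain', '1'], dtype=object)

# Each row of `pairs` is (word, flag); this is a view onto `col`
pairs = col.reshape(-1, 2)

# True where the word is present but the flag below it is missing
need_fill = (pairs[:, 0] != '') & (pairs[:, 1] == '')

# Writing through the view updates `col` as well
pairs[need_fill, 1] = 0

print(pairs)
print(col)
```

Whether `df[col].values` is likewise a writable view of the DataFrame depends on the pandas version and settings, so the in-place trick in the answer should be checked against the pandas release in use.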

Test code:

import pandas as pd
from io import StringIO
import numpy as np

df = pd.read_fwf(StringIO(u"""
    A   B   C   D             R  sentence              ADR1         ADR2     
    112 135 21  EffexorXR.21  1  lack of good feeling  good         feeling
    113 135 21  EffexorXR.21  1                        1
    115 136 21  EffexorXR.21  2  Feel disconnected     disconnected feel    
    116 136 21  EffexorXR.21  2                                     0
    118 142 22  EffexorXR.22  1  Weight gain           gain         
    119 142 22  EffexorXR.22  1                        1"""),
                 header=1).fillna('')

print(df)
for col in ['ADR1', 'ADR2']:
    data = np.reshape(df[col].values, (-1, 2))
    need_fill = np.logical_and(data[:, 0] != '', data[:, 1] == '')
    data[need_fill, 1] = 0
print(df)

Results:

     A    B   C             D  R              sentence          ADR1     ADR2
0  112  135  21  EffexorXR.21  1  lack of good feeling          good  feeling
1  113  135  21  EffexorXR.21  1                                   1         
2  115  136  21  EffexorXR.21  2     Feel disconnected  disconnected     feel
3  116  136  21  EffexorXR.21  2                                            0
4  118  142  22  EffexorXR.22  1           Weight gain          gain         
5  119  142  22  EffexorXR.22  1                                   1         

     A    B   C             D  R              sentence          ADR1     ADR2
0  112  135  21  EffexorXR.21  1  lack of good feeling          good  feeling
1  113  135  21  EffexorXR.21  1                                   1        0
2  115  136  21  EffexorXR.21  2     Feel disconnected  disconnected     feel
3  116  136  21  EffexorXR.21  2                                   0        0
4  118  142  22  EffexorXR.22  1           Weight gain          gain         
5  119  142  22  EffexorXR.22  1                                   1         
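The code above modifies the DataFrame through the array returned by `.values`. As a hedged alternative, the sketch below copies the column, applies the same pairwise rule, and writes the result back explicitly, so it does not depend on that array being a writable view. The helper name `fill_missing_flags` and the two-column sample frame are assumptions made for illustration, and the sketch assumes an even number of rows with missing values already replaced by empty strings.

```python
import pandas as pd

def fill_missing_flags(df, cols):
    """Return a copy of df where an empty flag under a non-empty word becomes 0."""
    out = df.copy()
    for col in cols:
        # Explicit copy, so nothing is written through a shared or read-only buffer
        data = out[col].to_numpy(dtype=object, copy=True).reshape(-1, 2)
        need_fill = (data[:, 0] != '') & (data[:, 1] == '')
        data[need_fill, 1] = 0
        # Write the flattened pairs back into the column
        out[col] = data.ravel()
    return out

# Hypothetical sample holding only the two ADR columns from the question
df = pd.DataFrame({
    'ADR1': ['good', '1', 'disconnected', '', 'gain', '1'],
    'ADR2': ['feeling', '', 'feel', '0', '', ''],
})

result = fill_missing_flags(df, ['ADR1', 'ADR2'])
print(result)
```

Returning a new frame leaves the input untouched, which makes it easier to compare the before and after tables.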

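Answer 1 (score: 1)

Another approach is to iterate over the DataFrame rows for each column: take the rows two at a time, and if the first value is not empty while the next value is empty, update that next value to 0.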

col_list = ['ADR1', 'ADR2']  # columns to check
for column in col_list:
    # Step by 2: each word row (i) is paired with the flag row below it (i + 1)
    for i in range(0, df.shape[0] - 1, 2):
        first_val = df.loc[i, column]
        next_val = df.loc[i + 1, column]
        # If the word is present but its flag is empty, set the flag to 0
        if first_val != '' and next_val == '':
            df.loc[i + 1, column] = 0
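The same rule can also be written without an explicit row loop. The sketch below is not part of the original answer; it uses `Series.shift` to look at the row above and a parity mask to restrict the update to flag rows. It assumes the rows strictly alternate word, flag, word, flag, that missing values are empty strings, and that the columns have object dtype (forced here with `dtype=object`). The sample frame is a hypothetical reduction of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample holding only the two ADR columns from the question
df = pd.DataFrame({
    'ADR1': ['good', '1', 'disconnected', '', 'gain', '1'],
    'ADR2': ['feeling', '', 'feel', '0', '', ''],
}, dtype=object)

# Flag rows are at odd positions (1, 3, 5, ...)
is_flag_row = np.arange(len(df)) % 2 == 1

for col in ['ADR1', 'ADR2']:
    # Word in the row above is present
    word_present = df[col].shift(1, fill_value='') != ''
    # Current value is missing
    flag_missing = df[col] == ''
    df.loc[word_present & flag_missing & is_flag_row, col] = 0

print(df)
```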