Here is a subset of the dataset:
A    B    C   D             R  sentence               ADR1          ADR2
112  135  21  EffexorXR.21  1  lack of good feeling.  good          feeling
113  135  21  EffexorXR.21  1                         1
115  136  21  EffexorXR.21  2  Feel disconnected      disconnected  feel
116  136  21  EffexorXR.21  2                         0
118  142  22  EffexorXR.22  1  Weight gain            gain
119  142  22  EffexorXR.22  1                         1
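In the ADR1 and ADR2 columns, every word should have a 1 or 0 in the row that follows it. Where that value is missing, I need to replace it with 0. This is the output I want: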
A    B    C   D             R  sentence               ADR1          ADR2
112  135  21  EffexorXR.21  1  lack of good feeling.  good          feeling
113  135  21  EffexorXR.21  1                         1             0
115  136  21  EffexorXR.21  2  Feel disconnected      disconnected  feel
116  136  21  EffexorXR.21  2                         0             0
118  142  22  EffexorXR.22  1  Weight gain            gain
119  142  22  EffexorXR.22  1                         1
I tried
df['ADR1'].fillna(0, inplace=True)
df['ADR2'].fillna(0, inplace=True)
but this code produced the following df, which is not what I want:
A B C D R sentence ADR1 ADR2
112 135 21 EffexorXR.21 1 lack of good feeling. good feeling
113 135 21 EffexorXR.21 1 1 0
115 136 21 EffexorXR.21 2 Feel disconnected disconnected feel 0
116 136 21 EffexorXR.21 2 0
118 142 22 EffexorXR.22 1 Weight gain gain 0
119 142 22 EffexorXR.22 1 1 0
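A minimal sketch of why the attempt above is too broad, using a hypothetical cut-down frame with only the two ADR columns and NaN in the empty cells: fillna decides cell by cell and never looks at the row above, so it cannot tell a missing flag apart from a cell that should stay empty.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down frame: only the ADR columns from the question,
# NaN marks an empty cell; dtype=object keeps words and integer flags together
df = pd.DataFrame({
    'ADR1': ['good', 1, 'disconnected', 0, 'gain', 1],
    'ADR2': ['feeling', np.nan, 'feel', np.nan, np.nan, np.nan],
}, dtype=object)

# fillna replaces every NaN, including row 5, even though
# the word row above it (row 4) has no ADR2 entry at all
filled = df.fillna(0)
print(filled['ADR2'].tolist())  # ['feeling', 0, 'feel', 0, 0, 0]
```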
Answer 0 (score: 3)
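You can use reshape to process the data two rows at a time (this assumes the rows always come in word row / flag row pairs). Something like: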
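for col in ['ADR1', 'ADR2']:
    # view the column as (word row, flag row) pairs; needs an even row count
    data = np.reshape(df[col].to_numpy(copy=True), (-1, 2))
    # a flag is missing when the word row is filled but the flag row is empty
    need_fill = (data[:, 0] != '') & (data[:, 1] == '')
    data[need_fill, 1] = 0
    # write the pairs back as a single column
    df[col] = data.ravel()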
import numpy as np
import pandas as pd

# the question's sample data; '' marks an empty cell
df = pd.DataFrame({
    'A': [112, 113, 115, 116, 118, 119],
    'B': [135, 135, 136, 136, 142, 142],
    'C': [21, 21, 21, 21, 22, 22],
    'D': ['EffexorXR.21'] * 4 + ['EffexorXR.22'] * 2,
    'R': [1, 1, 2, 2, 1, 1],
    'sentence': ['lack of good feeling', '', 'Feel disconnected', '',
                 'Weight gain', ''],
    'ADR1': ['good', 1, 'disconnected', 0, 'gain', 1],
    'ADR2': ['feeling', '', 'feel', '', '', ''],
})
print(df)

for col in ['ADR1', 'ADR2']:
    data = np.reshape(df[col].to_numpy(copy=True), (-1, 2))
    need_fill = (data[:, 0] != '') & (data[:, 1] == '')
    data[need_fill, 1] = 0
    df[col] = data.ravel()
print(df)
     A    B   C             D  R              sentence          ADR1     ADR2
0  112  135  21  EffexorXR.21  1  lack of good feeling          good  feeling
1  113  135  21  EffexorXR.21  1                                   1
2  115  136  21  EffexorXR.21  2     Feel disconnected  disconnected     feel
3  116  136  21  EffexorXR.21  2                                   0
4  118  142  22  EffexorXR.22  1           Weight gain          gain
5  119  142  22  EffexorXR.22  1                                   1

     A    B   C             D  R              sentence          ADR1     ADR2
0  112  135  21  EffexorXR.21  1  lack of good feeling          good  feeling
1  113  135  21  EffexorXR.21  1                                   1        0
2  115  136  21  EffexorXR.21  2     Feel disconnected  disconnected     feel
3  116  136  21  EffexorXR.21  2                                   0        0
4  118  142  22  EffexorXR.22  1           Weight gain          gain
5  119  142  22  EffexorXR.22  1                                   1
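To see what the reshape step does on its own, here is a minimal sketch on the ADR2 values from the question, assuming the empty cells have already been turned into '' (as the answer does with fillna('')). The names adr2, pairs and need_fill are illustrative.

```python
import numpy as np

# ADR2 column from the question, '' for empty cells
adr2 = np.array(['feeling', '', 'feel', '', '', ''], dtype=object)

# reshape(-1, 2) groups consecutive rows into (word row, flag row) pairs;
# it only works when the length is even
pairs = adr2.reshape(-1, 2)

# a flag is missing when the word row is filled but the flag row is empty
need_fill = (pairs[:, 0] != '') & (pairs[:, 1] == '')
print(need_fill)                # [ True  True False]

pairs[need_fill, 1] = 0
print(pairs.ravel().tolist())   # ['feeling', 0, 'feel', 0, '', '']
```

Because reshape on a contiguous NumPy array returns a view, the assignment also changes adr2. With a pandas column that link is not guaranteed in recent pandas versions (for example under copy-on-write), so assigning the result back with df[col] = ... is the safer pattern.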
Answer 1 (score: 1)
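Another way you could try is to iterate over the rows of the dataframe for each column: check whether the current value is non-empty and the next value is empty, and if so, update the next value to 0: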
col_list = ['ADR1', 'ADR2']  # columns to check
for column in col_list:  # for each column, go through the rows
    # step by 2, since each iteration checks the current row and the next one
    # (assumes the default 0..n-1 integer index)
    for i in range(0, df.shape[0] - 1, 2):
        first_val = df.loc[i, column]
        next_val = df.loc[i + 1, column]
        # if the current value is not empty but the next one is, fill it in
        if first_val != '' and next_val == '':
            df.loc[i + 1, column] = 0  # update the value
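The same pairing rule can also be written without the explicit row loop. This is a swapped-in variant using Series.shift, not taken from either answer, and only a sketch. It assumes word rows sit at even positions and flag rows at odd positions. The cut-down frame (only the ADR columns, dtype=object, '' for empty cells) and the names is_flag_row, word_above and need_fill are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with just the ADR columns; '' marks an empty cell.
# dtype=object lets strings and integer flags share a column.
df = pd.DataFrame({
    'ADR1': ['good', 1, 'disconnected', 0, 'gain', 1],
    'ADR2': ['feeling', '', 'feel', '', '', ''],
}, dtype=object)

# rows alternate word row / flag row, so flag rows sit at odd positions
is_flag_row = np.arange(len(df)) % 2 == 1

for column in ['ADR1', 'ADR2']:
    word_above = df[column].shift(1)  # value in the row above (NaN for row 0)
    need_fill = (word_above != '') & (df[column] == '') & is_flag_row
    df.loc[need_fill, column] = 0

print(df['ADR2'].tolist())  # ['feeling', 0, 'feel', 0, '', '']
```

Comparing against the shifted column replaces the per-row df.loc lookups with whole-column operations, so pandas does the work instead of a Python loop.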