Pandas - generate values if column is larger and not null

Time: 2018-04-28 21:26:11

Tags: python pandas boolean nan

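I have the following columns in my dataframe:

W1 W2   W3    W4   L1  L2   L3    L4
0  6    6     3    6   7    3     6
7  NaN  NaN   NaN  6   NaN  NaN   NaN

I want to add four columns to this dataframe, SET1 .. SET4, where SETi is:

  • 1.0 if Wi > Li and neither is NaN
  • 0.0 if Wi < Li and neither is NaN
  • NaN if Wi or Li is NaN

For the example above, the output should be:

SET1 SET2 SET3 SET4
0.0  0.0  1.0  0.0
1.0  NaN  NaN  NaN

I have code that applies the first two bullets, but I am struggling to handle NaN correctly.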
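The difficulty can be reproduced with a minimal sketch. The frame below is rebuilt from the example values, and the name `naive` is only illustrative. A plain comparison evaluates to False whenever either side is NaN, so casting it to float yields 0.0 where NaN is wanted.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the values in the question
df = pd.DataFrame({
    'W1': [0, 7], 'W2': [6, np.nan], 'W3': [6, np.nan], 'W4': [3, np.nan],
    'L1': [6, 6], 'L2': [7, np.nan], 'L3': [3, np.nan], 'L4': [6, np.nan],
})

# Comparisons involving NaN are False, so the missing row
# silently becomes 0.0 instead of staying NaN.
naive = (df['W2'] > df['L2']).astype(float)
print(naive.tolist())  # [0.0, 0.0]
```

The second value should be NaN, which is why an explicit null check is needed on top of the comparison.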

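5 answers:

Answer 0 (score: 4)

You need startswith to pick out the W and L columns; then divide the W values by the L values and build the df you need from the ratio. A ratio greater than 1 means Wi > Li, and NaN propagates through the division.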

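# If the frame holds the string 'Nan', convert it first:
# df = df.replace('Nan', np.nan)
# df = df.astype(float)
w = df.loc[:, df.columns.str.startswith('W')].values
l = df.loc[:, df.columns.str.startswith('L')].values
new_df = pd.DataFrame(w / l)

# Where the ratio is not NaN, store 1 if W/L > 1 and 0 otherwise
new_df[new_df.notnull()] = new_df.gt(1).astype(int)
new_df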
Out[239]: 
     0    1    2    3
0  0.0  0.0  1.0  0.0
1  1.0  NaN  NaN  NaN
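One caveat: the ratio test assumes every L value is strictly positive. A zero L produces inf or NaN, and a negative L flips the inequality. A minimal sketch of a variant that compares directly and then masks the missing entries follows; it is a swapped-in technique rather than the code above, and the sample values and the names `w`, `l`, and `out` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative data with a negative L value (not from the question)
df = pd.DataFrame({
    'W1': [0, 7], 'W2': [6, np.nan],
    'L1': [6, -6], 'L2': [7, np.nan],
})

w = df.loc[:, df.columns.str.startswith('W')].to_numpy(dtype=float)
l = df.loc[:, df.columns.str.startswith('L')].to_numpy(dtype=float)

# Direct comparison, then keep the result only where both sides are present;
# DataFrame.where replaces the remaining cells with NaN.
out = pd.DataFrame((w > l).astype(float))
out = out.where(~(np.isnan(w) | np.isnan(l)))
print(out)
```

Here 7 > -6 gives 1.0, whereas 7 / -6 is below 1 and the ratio test would report 0.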

Answer 1 (score: 3)

The following solution also works when the W* and L* columns are in a different order (for example ['W1','W3','W4','W2'] and ['L2','L1','L4','L3']):

Demo:

In [135]: df = df[['W1','W3','W4','W2','L2','L1','L4','L3']]

In [136]: df
Out[136]:
   W1   W3   W4   W2   L2  L1   L4   L3
0   0  6.0  3.0  6.0  7.0   6  6.0  3.0
1   7  NaN  NaN  NaN  NaN   6  NaN  NaN

In [137]: res = (df.filter(regex=r'^W\d+')
     ...:          .gt(df.filter(regex=r'^L\d+')
     ...:                .rename(columns=lambda c: c.replace('L','W')))
     ...:          .astype(float))
     ...:
     ...: mask = (df.filter(regex=r'^W\d+').notna() &
     ...:         df.filter(regex=r'^L\d+')
     ...:           .rename(columns=lambda c: c.replace('L','W')).notna())
     ...:
     ...: df = df.join(res[mask].rename(columns=lambda c: c.replace('W','SET')))
     ...:

In [138]: df
Out[138]:
   W1   W3   W4   W2   L2  L1   L4   L3  SET1  SET2  SET3  SET4
0   0  6.0  3.0  6.0  7.0   6  6.0  3.0   0.0   0.0   1.0   0.0
1   7  NaN  NaN  NaN  NaN   6  NaN  NaN   1.0   NaN   NaN   NaN

Answer 2 (score: 3)

One way is to use numpy:

import numpy as np
import pandas as pd

df = pd.DataFrame({'W1': [0, 7], 'W2': [6, np.nan], 'W3': [6, np.nan], 'W4': [3, np.nan],
                   'L1': [6, 6], 'L2': [7, np.nan], 'L3': [3, np.nan], 'L4': [6, np.nan]})

# split into 2 arrays
df_L = df.loc[:, df.columns.str.startswith('L')].values
df_W = df.loc[:, df.columns.str.startswith('W')].values

# apply comparison logic
A = (df_W > df_L).astype(float)

# apply nan logic
A[np.logical_or(np.isnan(df_L), np.isnan(df_W))] = np.nan

# create dataframe
res = pd.DataFrame(A, columns=['SET'+str(i) for i in range(1, A.shape[1]+1)])

print(res)

   SET1  SET2  SET3  SET4
0   0.0   0.0   1.0   0.0
1   1.0   NaN   NaN   NaN

Answer 3 (score: 2)

There is also numpy.select. It gives priority to the first condition that is met, so putting the null check first is enough for the logic to work.

import numpy as np

for i in range(1, 5):
    w, l = df['W' + str(i)], df['L' + str(i)]
    # the null check comes first so it wins over the comparisons
    df['SET' + str(i)] = np.select([w.isnull() | l.isnull(), w > l, w < l],
                                   [np.nan, 1, 0])

   W1   W2   W3   W4  L1   L2   L3   L4  SET1 SET2 SET3 SET4
0   0    6    6    3   6    7    3    6  0.0  0.0  1.0  0.0
1   7  NaN  NaN  NaN   6  NaN  NaN  NaN  1.0  NaN  NaN  NaN
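When no condition matches, np.select falls back to its `default` argument, which is 0. The question does not say what should happen when Wi equals Li, so with the loop above a tie yields 0.0. A minimal sketch, assuming ties should be NaN instead (an assumption not stated in the question; `set_flag` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def set_flag(w, l):
    """Return 1.0 where w > l, 0.0 where w < l, NaN otherwise."""
    conditions = [w.isna() | l.isna(), w > l, w < l]
    choices = [np.nan, 1.0, 0.0]
    # default=np.nan covers the tie case, where no condition holds
    return np.select(conditions, choices, default=np.nan)

# Illustrative values: smaller, larger, tie, missing
w = pd.Series([0, 7, 5, np.nan])
l = pd.Series([6, 6, 5, 2])
result = set_flag(w, l)
print(result)  # 0.0, 1.0, NaN (tie), NaN (missing)
```

Leaving `default` at 0 reproduces the behavior of the original loop.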

Answer 4 (score: 1)

Split the columns into a MultiIndex:

n = df.set_axis(
    pd.MultiIndex.from_tuples(df.columns.map(tuple)),
    axis=1
)

n

   L                 W               
   1    2    3    4  1    2    3    4
0  6  7.0  3.0  6.0  0  6.0  6.0  3.0
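1  6  NaN  NaN  NaN  7  NaN  NaN  NaN

Subtract the two halves, flag the positive differences, and restore NaN where either side was missing:

d = n.W - n.L
d = d.gt(0).astype(int).mask(d.isna()).add_prefix('SET')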

pd.concat([df, d], axis=1)

   L1   L2   L3   L4  W1   W2   W3   W4  SET1  SET2  SET3  SET4
0   6  7.0  3.0  6.0   0  6.0  6.0  3.0     0   0.0   1.0   0.0
1   6  NaN  NaN  NaN   7  NaN  NaN  NaN     1   NaN   NaN   NaN

A slightly more robust way to generate the MultiIndex, which keeps multi-character suffixes together:

n = df.set_axis(
    pd.MultiIndex.from_tuples([(a, ''.join(b)) for a, *b in df.columns]),
    axis=1
)
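The difference between the two constructions shows up once a suffix has more than one digit. A small sketch in plain Python, using a hypothetical column name 'W10' that does not appear in the question:

```python
name = 'W10'  # hypothetical column name with a two-digit suffix

# tuple() splits every character, producing three levels
split_all = tuple(name)
print(split_all)        # ('W', '1', '0')

# star-unpacking keeps the first character as the prefix and
# re-joins the remainder, producing two levels
a, *b = name
prefix_suffix = (a, ''.join(b))
print(prefix_suffix)    # ('W', '10')
```

With single-digit names such as 'W1' both forms give ('W', '1'), so either works for the example frame.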