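My dataframe has the following columns:
W1  W2   W3   W4   L1  L2   L3   L4
0   6    6    3    6   7    3    6
7   Nan  Nan  Nan  6   Nan  Nan  Nan
I want to add four columns, SET1 .. SET4, to this dataframe, where each SETi is 1 if Wi is greater than Li, 0 if it is smaller, and NaN if either value is missing. In the example above, the output should be:
SET1  SET2  SET3  SET4
0.0   0.0   1.0   0.0
1.0   Nan   Nan   Nan
I have code that applies the first two rules, but I'm struggling to handle NaN correctly.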
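For anyone who wants to reproduce the problem, here is a minimal sketch that builds the example frame. It assumes the missing entries arrive as the literal string 'Nan' (as the table suggests); the name `raw` is only illustrative.

```python
import pandas as pd

# Example data from the question; the string 'Nan' stands in for missing values
# (an assumption based on how the table is printed).
raw = pd.DataFrame({
    'W1': [0, 7], 'W2': [6, 'Nan'], 'W3': [6, 'Nan'], 'W4': [3, 'Nan'],
    'L1': [6, 6], 'L2': [7, 'Nan'], 'L3': [3, 'Nan'], 'L4': [6, 'Nan'],
})

# Coerce every column to numbers; anything unparseable becomes a real NaN.
df = raw.apply(pd.to_numeric, errors='coerce').astype(float)
print(df)
```

After this step every column is float64, so comparisons and `isna()` checks behave as expected.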
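Answer 0 (score: 4)
You need startswith; then simply divide the values and build the df you need.
import numpy as np
import pandas as pd
# If NaN is stored as the string 'Nan', convert to real float NaN first:
#df=df.replace('Nan',np.nan)
#df=df.astype(float)
# Divide W* by L* positionally; a ratio above 1 means W > L
new_df=pd.DataFrame((df.loc[:,df.columns.str.startswith('W')].values/df.loc[:,df.columns.str.startswith('L')].values))
# Replace the non-null ratios with 1/0 flags and leave NaN untouched
new_df[new_df.notnull()]=new_df.gt(1).astype(int)
new_df
Out[239]:
     0    1    2    3
0  0.0  0.0  1.0  0.0
1  1.0  NaN  NaN  NaN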
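One caveat with the ratio test: it only matches "W > L" when the L values are positive. The sketch below uses made-up data (not from the question) with a negative pair and a 0/0 pair to show where the ratio and a direct comparison disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a negative pair, a zero pair, and an ordinary pair.
demo = pd.DataFrame({'W1': [-1.0, 0.0, 7.0], 'L1': [-2.0, 0.0, 6.0]})

w = demo[['W1']].to_numpy()
l = demo[['L1']].to_numpy()

# Ratio test; the 0/0 warning is silenced for the demonstration.
with np.errstate(divide='ignore', invalid='ignore'):
    ratio = pd.DataFrame(w / l)
by_ratio = ratio.gt(1).astype(float).where(ratio.notna())

# Direct comparison, masked only where an input is actually missing.
by_compare = pd.DataFrame(w > l).astype(float).where(~(np.isnan(w) | np.isnan(l)))

print(by_ratio[0].tolist())    # [0.0, nan, 1.0]
print(by_compare[0].tolist())  # [1.0, 0.0, 1.0]
```

For the data in the question (non-negative values, no zero in L) both approaches give the same flags.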
Answer 1 (score: 3)
The following solution also works when the W* and L* columns are in different orders (for example ['W1','W3','W4','W2'] and ['L2','L1','L4','L3']).
Demo:
In [135]: df = df[['W1','W3','W4','W2','L2','L1','L4','L3']]

In [136]: df
Out[136]:
   W1   W3   W4   W2   L2  L1   L4   L3
0   0  6.0  3.0  6.0  7.0   6  6.0  3.0
1   7  NaN  NaN  NaN  NaN   6  NaN  NaN

In [137]: res = (df.filter(regex=r'^W\d+')
     ...:          .gt(df.filter(regex=r'^L\d+')
     ...:                .rename(columns=lambda c: c.replace('L','W')))
     ...:          .astype(float))
     ...:
     ...: mask = (df.filter(regex=r'^W\d+').notna() &
     ...:         df.filter(regex=r'^L\d+')
     ...:           .rename(columns=lambda c: c.replace('L','W')).notna())
     ...:
     ...: df = df.join(res[mask].rename(columns=lambda c: c.replace('W','SET')))
     ...:

In [138]: df
Out[138]:
   W1   W3   W4   W2   L2  L1   L4   L3  SET1  SET2  SET3  SET4
0   0  6.0  3.0  6.0  7.0   6  6.0  3.0   0.0   0.0   1.0   0.0
1   7  NaN  NaN  NaN  NaN   6  NaN  NaN   1.0   NaN   NaN   NaN
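The reason this works with shuffled columns is that `.gt()` and `&` align on column labels once the L* columns are renamed to W*. Here is a small self-contained sketch contrasting that with a purely positional comparison on raw arrays; the names `by_label` and `by_position` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'W1': [0, 7], 'W3': [6, np.nan], 'W4': [3, np.nan], 'W2': [6, np.nan],
                   'L2': [7, np.nan], 'L1': [6, 6], 'L4': [6, np.nan], 'L3': [3, np.nan]})

w = df.filter(regex=r'^W\d+')
l = df.filter(regex=r'^L\d+').rename(columns=lambda c: c.replace('L', 'W'))

# Label-aligned: W1 is compared with the column that used to be L1, wherever it sits.
by_label = w.gt(l).astype(float).where(w.notna() & l.notna())

# Positional: raw arrays pair W1 with L2, W3 with L1, W4 with L4, W2 with L3 here.
by_position = pd.DataFrame(w.to_numpy() > l.to_numpy(), columns=w.columns)

print(by_label.loc[0, 'W3'], by_position.loc[0, 'W3'])  # 1.0 False
```

So the label-based version gets W3 > L3 right for the first row, while the positional one silently compares W3 with L1.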
Answer 2 (score: 3)
One way is to use numpy:
import numpy as np
import pandas as pd
df = pd.DataFrame({'W1': [0, 7], 'W2': [6, np.nan], 'W3': [6, np.nan], 'W4': [3, np.nan],
                   'L1': [6, 6], 'L2': [7, np.nan], 'L3': [3, np.nan], 'L4': [6, np.nan]})
# split into 2 arrays
df_L = df.loc[:, df.columns.str.startswith('L')].values
df_W = df.loc[:, df.columns.str.startswith('W')].values
# apply comparison logic
A = (df_W > df_L).astype(float)
# apply nan logic
A[np.logical_or(np.isnan(df_L), np.isnan(df_W))] = np.nan
# create dataframe
res = pd.DataFrame(A, columns=['SET'+str(i) for i in range(1, A.shape[1]+1)])
print(res)
   SET1  SET2  SET3  SET4
0   0.0   0.0   1.0   0.0
1   1.0   NaN   NaN   NaN
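Since the numpy version pairs columns by position, it may help to order them by their numeric suffix first. The following is only a sketch: `columns_by_suffix` is a hypothetical helper, and it assumes every column name is a prefix followed by an integer.

```python
import numpy as np
import pandas as pd

def columns_by_suffix(frame, prefix):
    """Return the columns starting with `prefix`, ordered by their integer suffix."""
    cols = [c for c in frame.columns if c.startswith(prefix)]
    return sorted(cols, key=lambda c: int(c[len(prefix):]))

# Deliberately shuffled columns to show the ordering step.
df = pd.DataFrame({'W2': [6, np.nan], 'W1': [0, 7], 'L1': [6, 6], 'L2': [7, np.nan]})

w_cols = columns_by_suffix(df, 'W')
l_cols = columns_by_suffix(df, 'L')

df_W = df[w_cols].to_numpy(dtype=float)
df_L = df[l_cols].to_numpy(dtype=float)

# Same comparison and NaN logic as above, on the reordered arrays.
A = (df_W > df_L).astype(float)
A[np.isnan(df_W) | np.isnan(df_L)] = np.nan

# Reuse df's index so the result can be joined back with df.join(res).
res = pd.DataFrame(A, columns=['SET' + c[1:] for c in w_cols], index=df.index)
print(res)
```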
Answer 3 (score: 2)
There is also numpy.select. It gives priority to the first condition that matches, so putting the null check first is enough to make the logic work.
import numpy as np
for i in range(1,5):
    # conditions are tested in order: null check, then W > L, then W < L
    df['SET'+str(i)] = np.select(((df['W'+str(i)].isnull() | df['L'+str(i)].isnull()),
                                  df['W'+str(i)] > df['L'+str(i)], df['W'+str(i)] < df['L'+str(i)]),
                                 [np.nan, 1, 0])
   W1   W2   W3   W4  L1   L2   L3   L4  SET1  SET2  SET3  SET4
0   0    6    6    3   6    7    3    6   0.0   0.0   1.0   0.0
1   7  NaN  NaN  NaN   6  NaN  NaN  NaN   1.0   NaN   NaN   NaN
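To see the "first match wins" behaviour in isolation, here is a small sketch on plain arrays (the values are made up). Note that a tie (W equal to L) matches none of the three conditions and falls through to `default`, which is 0 unless you set it.

```python
import numpy as np

w = np.array([0.0, 7.0, np.nan, 5.0])
l = np.array([6.0, 6.0, 3.0, 5.0])

# Order matters: the null check must come before the comparisons.
conditions = [np.isnan(w) | np.isnan(l), w > l, w < l]
choices = [np.nan, 1.0, 0.0]

# The last pair (5.0 vs 5.0) matches no condition and gets the default.
flags = np.select(conditions, choices, default=0.0)
print(flags)
```

If ties should be treated differently, changing `default` is the only edit needed.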
Answer 4 (score: 1)
Split the columns into a MultiIndex:
n = df.set_axis(
    pd.MultiIndex.from_tuples(df.columns.map(tuple)),
    axis=1
)
n
   L              W
   1    2    3    4  1    2    3    4
0  6  7.0  3.0  6.0  0  6.0  6.0  3.0
1  6  NaN  NaN  NaN  7  NaN  NaN  NaN
d = n.W - n.L
d = d.gt(0).astype(int).mask(d.isna()).add_prefix('SET')
pd.concat([df, d], axis=1)
   L1   L2   L3   L4  W1   W2   W3   W4  SET1  SET2  SET3  SET4
0   6  7.0  3.0  6.0   0  6.0  6.0  3.0     0   0.0   1.0   0.0
1   6  NaN  NaN  NaN   7  NaN  NaN  NaN     1   NaN   NaN   NaN
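A more general way to build the MultiIndex, keeping the first character as the top level and the rest of the name as the second level:
n = df.set_axis(
    pd.MultiIndex.from_tuples([(a, ''.join(b)) for a, *b in df.columns]),
    axis=1
)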
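The difference between the two ways of building the tuples only shows up when a suffix has more than one character. A quick sketch with hypothetical column names:

```python
import pandas as pd

cols = ['W1', 'W10', 'L1', 'L10']   # hypothetical names with a two-digit suffix

# tuple() splits every character, so 'W10' becomes a 3-tuple.
by_char = [tuple(c) for c in cols]
print(by_char)
# [('W', '1'), ('W', '1', '0'), ('L', '1'), ('L', '1', '0')]

# a, *b keeps the first character and re-joins the remainder.
pairs = [(a, ''.join(b)) for a, *b in cols]
print(pairs)
# [('W', '1'), ('W', '10'), ('L', '1'), ('L', '10')]

mi = pd.MultiIndex.from_tuples(pairs)
print(mi.nlevels)
# 2
```

This still assumes the prefix is exactly one character; longer prefixes would need a different split.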