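The question title may not describe the problem accurately, because it is hard to summarize; it is much easier to show. I am trying to create new columns based on existing columns in a df. The values appear at intermittent index points, and they are always labelled by, or associated with, one of the other columns.
Input: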
import pandas as pd
import numpy as np
k = 5
N = 10
df = pd.DataFrame({ 'Frame' : range(1, N + 1 ,1),
'A_X' : np.random.randint(k, k + 100 , size=N),
'A_Y' : np.random.randint(k, k + 100 , size=N),
'B_X' : np.random.randint(k, k + 100 , size=N),
'B_Y' : np.random.randint(k, k + 100 , size=N),
'C_X' : np.random.randint(k, k + 100 , size=N),
'C_Y' : np.random.randint(k, k + 100 , size=N),
'D_X' : np.random.randint(k, k + 100 , size=N),
'D_Y' : np.random.randint(k, k + 100 , size=N),
'E_X' : np.random.randint(k, k + 100 , size=N),
'E_Y' : np.random.randint(k, k + 100 , size=N),
'Events' : ['nan','A','nan','C','D','A','nan','nan','C','C']})
This gives:
A_X A_Y B_X B_Y C_X C_Y D_X D_Y E_X E_Y Events Frame
0 95 61 76 47 22 38 54 19 64 13 nan 1
1 82 87 87 24 59 31 55 16 101 78 A 2
2 10 25 66 28 70 78 75 19 23 90 nan 3
3 55 64 15 11 46 87 65 51 10 92 C 4
4 53 103 10 65 103 86 24 49 33 34 D 5
5 12 44 89 14 28 26 17 55 64 76 A 6
6 69 24 73 12 84 71 71 76 5 18 nan 7
7 40 35 73 40 78 31 51 33 77 98 nan 8
8 65 69 83 33 20 90 64 12 19 84 C 9
9 24 70 18 96 65 67 73 42 49 78 C 10
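The first 10 columns are XY data. I want to select the appropriate XY values to create new columns, and the 'Events' column determines which ones to pick. Its values always correspond to one of the other columns. For example, the event in the second row is 'A', so I want the X and Y values from the A columns (A_X, A_Y) at that same index. The next value in Events is 'C', so I want (C_X, C_Y) from the fourth row, and so on.
So the output would be: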
A_X A_Y B_X B_Y C_X C_Y D_X D_Y E_X E_Y Events Frame X Y
0 95 61 76 47 22 38 54 19 64 13 nan 1 nan nan
1 82 87 87 24 59 31 55 16 101 78 A 2 82 87
2 10 25 66 28 70 78 75 19 23 90 nan 3 nan nan
3 55 64 15 11 46 87 65 51 10 92 C 4 46 87
4 53 103 10 65 103 86 24 49 33 34 D 5 24 49
5 12 44 89 14 28 26 17 55 64 76 A 6 12 44
6 69 24 73 12 84 71 71 76 5 18 nan 7 nan nan
7 40 35 73 40 78 31 51 33 77 98 nan 8 nan nan
8 65 69 83 33 20 90 64 12 19 84 C 9 20 90
9 24 70 18 96 65 67 73 42 49 78 C 10 65 67
I tried writing something like this:
df['X'] = np.where(df['Events'] == ['A'])
df['Y'] = np.where(df['Events'] == ['A'])
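and then repeating it for each column letter, but this won't work because the labels differ. I thought about merging the X and Y columns together and labelling them ['A', 'B', 'C', 'D', 'E'].
But I am still missing the next step: I am not getting the values back from the df.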
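As a side note on the attempt above: called with a single argument, np.where returns the integer positions where the condition is true, not values taken from another column. The sketch below contrasts that with the three-argument form for a single letter. The three-row frame and the names is_a and positions are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's layout (one letter only)
df = pd.DataFrame({
    'A_X': [1, 2, 3],
    'A_Y': [4, 5, 6],
    'Events': ['nan', 'A', 'A'],
})

# Compare against the string 'A', not the list ['A']
is_a = df['Events'] == 'A'

# One-argument form: a tuple of index arrays, i.e. positions only
positions = np.where(is_a)[0]

# Three-argument form: take A_X / A_Y where the event is 'A', NaN elsewhere
df['X'] = np.where(is_a, df['A_X'], np.nan)
df['Y'] = np.where(is_a, df['A_Y'], np.nan)
```

This only handles one letter; covering all letters still needs a loop over the event values or a per-row lookup.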
Answer 0 (score: 1)
I don't know whether this can be fully vectorized, but you can do it by iterating over the rows:
result = pd.DataFrame(None, index=df.index, columns=['X', 'Y'])
for row in df.itertuples():
    x, y = f'{row.Events}_X', f'{row.Events}_Y'
    if row.Events == 'nan':
        result.loc[row.Index, ['X', 'Y']] = [np.nan, np.nan]
    else:
        result.loc[row.Index, ['X', 'Y']] = row._asdict()[x], row._asdict()[y]
Alternatively, loop over the unique event values and use np.where:
result = pd.DataFrame(None, index=df.index, columns=['X', 'Y'])
for value in df['Events'].unique():
    if value == 'nan':
        continue
    x, y = f'{value}_X', f'{value}_Y'
    result[['X', 'Y']] = np.where(df[['Events']] == value, df[[x, y]], result)
     X    Y
0  NaN  NaN
1   51   22
2  NaN  NaN
3   11   77
4  104   88
5   29   70
6  NaN  NaN
7  NaN  NaN
8   42   13
9   36   70
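For what it's worth, the lookup can also be done without a Python-level loop over rows by indexing the underlying NumPy array with row and column positions. The following is only a sketch on a small made-up frame: the three-row df and the names coords, valid, rows and cols are illustrative and not from the question. It assumes every non-'nan' event has matching _X and _Y columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's layout
df = pd.DataFrame({
    'A_X': [1, 2, 3], 'A_Y': [4, 5, 6],
    'C_X': [7, 8, 9], 'C_Y': [10, 11, 12],
    'Events': ['nan', 'A', 'C'],
})

# Numeric coordinate columns only, as a plain 2-D array
coords = df.filter(regex=r'^[A-Z]_[XY]$')
values = coords.to_numpy()

# Rows whose event names a column pair
valid = (df['Events'] != 'nan').to_numpy()
rows = np.flatnonzero(valid)

for axis in ('X', 'Y'):
    # Column position of e.g. 'A_X' for each valid row.
    # Caution: get_indexer returns -1 for unknown labels, which would
    # silently select the last column, hence the assumption stated above.
    labels = df.loc[valid, 'Events'] + '_' + axis
    cols = coords.columns.get_indexer(labels)

    out = np.full(len(df), np.nan)
    out[rows] = values[rows, cols]  # pairwise (row, column) lookup
    df[axis] = out
```

The new columns come out as floats because NaN marks the rows without an event.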
Answer 1 (score: 1)
Here is an alternative solution using pd.DataFrame.apply:
df['X'] = df.apply(lambda row: row.get(row['Events']+'_X'), axis=1)
df['Y'] = df.apply(lambda row: row.get(row['Events']+'_Y'), axis=1)
Result:
A_X A_Y B_X B_Y C_X C_Y D_X D_Y E_X E_Y Events Frame X Y
0 95 53 59 32 97 71 35 15 80 78 nan 1 NaN NaN
1 94 63 37 92 87 90 97 25 62 14 A 2 94.0 63.0
2 69 83 49 10 59 59 18 98 13 70 nan 3 NaN NaN
3 82 67 91 61 73 90 39 84 7 42 C 4 73.0 90.0
4 59 88 17 65 93 65 63 89 70 49 D 5 63.0 89.0
5 11 79 41 61 75 46 28 101 18 38 A 6 11.0 79.0
6 70 80 103 53 97 42 51 100 82 80 nan 7 NaN NaN
7 5 18 62 92 85 22 10 40 64 67 nan 8 NaN NaN
8 75 91 75 44 7 69 81 102 78 41 C 9 7.0 69.0
9 37 20 54 53 44 51 20 27 7 86 C 10 44.0 51.0
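The two apply calls above each walk every row. A possible variation is to return both values from a single pass and expand them into two columns. This is a sketch on a small made-up frame: the three-row df and the helper name pick_xy are hypothetical and not part of the answer. It relies on Series.get returning None for a missing label such as 'nan_X'.

```python
import pandas as pd

# Hypothetical miniature of the question's layout
df = pd.DataFrame({
    'A_X': [1, 2, 3], 'A_Y': [4, 5, 6],
    'C_X': [7, 8, 9], 'C_Y': [10, 11, 12],
    'Events': ['nan', 'A', 'C'],
})

def pick_xy(row):
    # row.get(...) yields None when no such column exists (e.g. 'nan_X')
    prefix = row['Events']
    return [row.get(prefix + '_X'), row.get(prefix + '_Y')]

# result_type='expand' turns each returned list into separate columns
xy = df.apply(pick_xy, axis=1, result_type='expand')
xy.columns = ['X', 'Y']
df = df.join(xy)
```

This is still a row-wise apply, so it is unlikely to be faster than a vectorized lookup on large frames; it mainly avoids iterating twice.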