Question

I have a dataframe with 5 columns, and I want to update one column based on the other 4. The dataframe looks like this:

from       via      to       x       y 
 3          2       13      in       out
 3          2       15      in       out
 3          2       21      in       out
13          2       3             
15          2       13     
21          2       13
1          12        2 
1          12        2
1          12        22
2          12        1      in
2          12        22     in      out
22         12        2

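The idea is to fill column x based on the values in the other four columns, in the following order. First I check whether a row has values in both x and y. If it does, I take its (from, via) pair and compare it with the (to, via) pair of every row. Where the pairs are equal, I assign the y value of the (from, via) row to the x column of the rows with the matching (to, via). In this example, (from = 3, via = 2) has both x and y values, so I take (from = 3, via = 2), compare it with (to, via), and then assign (y = out) to x in all rows that have (to = 3, via = 2).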
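Read literally, the rule above can be sketched as a lookup keyed on (from, via). This is only one possible reading, not a confirmed solution: it assumes the blank cells are NaN, it keeps the first y when several source rows share a (from, via) pair, and the names `sources`, `lookup`, `matched` and `new_x` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank cells are assumed to be NaN.
nan = np.nan
df = pd.DataFrame({
    "from": [3, 3, 3, 13, 15, 21, 1, 1, 1, 2, 2, 22],
    "via":  [2, 2, 2, 2, 2, 2, 12, 12, 12, 12, 12, 12],
    "to":   [13, 15, 21, 3, 13, 13, 2, 2, 22, 1, 22, 2],
    "x":    ["in", "in", "in", nan, nan, nan, nan, nan, nan, "in", "in", nan],
    "y":    ["out", "out", "out", nan, nan, nan, nan, nan, nan, nan, "out", nan],
})

# Source rows: both x and y are filled in.
sources = df[df["x"].notna() & df["y"].notna()]

# One y value per (from, via) pair (assumption: keep the first one seen).
lookup = sources.drop_duplicates(["from", "via"])[["from", "via", "y"]]

# For every row, look up its (to, via) pair among the source (from, via) pairs.
# A left merge keeps one output row per input row, in the original order.
matched = df[["to", "via"]].merge(
    lookup, how="left", left_on=["to", "via"], right_on=["from", "via"]
)
new_x = pd.Series(matched["y"].to_numpy(), index=df.index)

# Overwrite x only where a match was found; keep the old x elsewhere.
df["x"] = new_x.fillna(df["x"])
print(df)
```

With the sample data, this reading fills x only for rows whose (to, via) is (3, 2) or (2, 12), so it may not match every row of the expected table.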

The final result should look like this:

from       via      to       x       y 
 3          2       13      in       out
 3          2       15      in       out
 3          2       21      in      
13          2       3       out      
15          2       13      out
21          2       13
1          12        2      out 
1          12        2      out
1          12        22     out
2          12        1      in
2          12        22     in      out
22         12        2      out

How can I do this in a pandas dataframe?

Answer 1

I could not reproduce exactly the same result, but I used the algorithm as described:

# rows that have both x and y are the source of the new values
has_xy = df['x'].notna() & df['y'].notna()

# identify the rows where a change will occur; keep their original index
# (origix) and the new value (the unsuffixed 'y' coming from the right side)
tmp = df.assign(origix=df.index).merge(df[has_xy],
                                       left_on=['from', 'via'], right_on=['to', 'via'],
                                       suffixes=('_x', '')).set_index('origix')

# apply the changes to the dataframe:
df.loc[tmp.index, 'x'] = tmp['y']

It gives:
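As a rough check, the snippet can be run on the question's sample data. The sketch below is not part of the original answer: it rebuilds the frame by hand and assumes the blank cells are NaN. Under that assumption, the rows at index 3, 4, 5 and 11 receive 'out' in x.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank cells are assumed to be NaN.
nan = np.nan
df = pd.DataFrame({
    "from": [3, 3, 3, 13, 15, 21, 1, 1, 1, 2, 2, 22],
    "via":  [2, 2, 2, 2, 2, 2, 12, 12, 12, 12, 12, 12],
    "to":   [13, 15, 21, 3, 13, 13, 2, 2, 22, 1, 22, 2],
    "x":    ["in", "in", "in", nan, nan, nan, nan, nan, nan, "in", "in", nan],
    "y":    ["out", "out", "out", nan, nan, nan, nan, nan, nan, nan, "out", nan],
})

# Same merge as in the answer: a row's (from, via) is matched against the
# (to, via) of rows that have both x and y.
tmp = df.assign(origix=df.index).merge(
    df[df["x"].notna() & df["y"].notna()],
    left_on=["from", "via"], right_on=["to", "via"],
    suffixes=("_x", ""),
).set_index("origix")

# Write the matched y values into x at the original row positions.
df.loc[tmp.index, "x"] = tmp["y"]
print(df)
```

Note that this matches in the opposite direction from the literal wording of the question (a row's from against the source's to), which may be one reason the output differs from the expected table.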
