pandas: new column from other columns

Time: 2018-05-24 03:20:42

Tags: python pandas indexing apply

I have a df filled with XY coordinates from different subjects. I want to create new columns that take the specified XY coordinates from these subjects. This happens whenever the name of a subject appears in the 'Person' column: the row then gets the XY coordinates of that subject at that index.

import pandas as pd
import numpy as np
import random

AA = 10, 20

k = 5
N = 10

df = pd.DataFrame({
    'John Doe_X' : np.random.uniform(k, k + 100 , size=N),
    'John Doe_Y' : np.random.uniform(k, k + 100 , size=N),
    'Kevin Lee_X' : np.random.uniform(k, k + 100 , size=N),
    'Kevin Lee_Y' : np.random.uniform(k, k + 100 , size=N),
    'Liam Smith_X' : np.random.uniform(k, k + -100 , size=N),
    'Liam Smith_Y' : np.random.uniform(k, k + 100 , size=N),
    'Event' : ['AA', 'nan', 'BB', 'nan', 'nan', 'CC', 'nan','CC', 'DD','nan'],
    'Person' : ['nan','nan','John Doe','John Doe','nan','Kevin Lee','nan','Liam Smith','John Doe','John Doe']})

df['X'] = df.apply(lambda row: row.get(row['Person']+'_X') if pd.notnull(row['Person']) else np.nan, axis=1)
df['Y'] = df.apply(lambda row: row.get(row['Person']+'_Y') if pd.notnull(row['Person']) else np.nan, axis=1)

Output:

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   75.047164   19.281168    28.064313    87.184248    -76.148559   
1   nan   50.642782   68.308319    46.088057    64.132263    -83.109383   
2    BB    9.965115   77.950894    48.864693     8.613132      0.106708   
3   nan   44.726136   58.751520    69.904076    40.818433    -87.656064   
4   nan  101.501119   99.156872   101.976300    93.539749    -57.026015   
5    CC   87.778446   65.814911     7.302116    40.577156    -28.703879   
6   nan   99.682139   91.715231    88.029451    82.309191    -66.444582   
7    CC   38.248267   38.648960    76.065297    67.322639    -34.754868   
8    DD   69.429353   61.252800    83.024358    58.038962    -62.001353   
9   nan    9.522023   73.009883    41.873986     8.677565    -20.389939   

   Liam Smith_Y      Person          X          Y  
0     18.420494         nan        NaN        NaN  
1     33.206289         nan        NaN        NaN  
2     73.833204    John Doe   9.965115  77.950894  
3     39.652071    John Doe  44.726136  58.751520  
4     88.176561         nan        NaN        NaN  
5     53.776995   Kevin Lee   7.302116  40.577156  
6     95.025923         nan        NaN        NaN  
7     26.851864  Liam Smith -34.754868  26.851864  
8    102.771046    John Doe  69.429353  61.252800  
9     28.633231    John Doe   9.522023  73.009883

I now want to use AA (10, 20) to refine the new ['X', 'Y'] columns. Specifically, when the value 'AA' is in the 'Event' column, I want to return the coordinates of AA. In addition, I would like to keep the same coordinates until the next coordinates appear.

So the output would look like this:

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   75.047164   19.281168    28.064313    87.184248    -76.148559   
1   nan   50.642782   68.308319    46.088057    64.132263    -83.109383   
2    BB    9.965115   77.950894    48.864693     8.613132      0.106708   
3   nan   44.726136   58.751520    69.904076    40.818433    -87.656064   
4   nan  101.501119   99.156872   101.976300    93.539749    -57.026015   
5    CC   87.778446   65.814911     7.302116    40.577156    -28.703879   
6   nan   99.682139   91.715231    88.029451    82.309191    -66.444582   
7    CC   38.248267   38.648960    76.065297    67.322639    -34.754868   
8    DD   69.429353   61.252800    83.024358    58.038962    -62.001353   
9   nan    9.522023   73.009883    41.873986     8.677565    -20.389939   

   Liam Smith_Y      Person          X          Y  
0     18.420494         nan         10         20  
1     33.206289         nan         10         20  
2     73.833204    John Doe   9.965115  77.950894  
3     39.652071    John Doe  44.726136  58.751520  
4     88.176561         nan        NaN        NaN  
5     53.776995   Kevin Lee   7.302116  40.577156  
6     95.025923         nan        NaN        NaN  
7     26.851864  Liam Smith -34.754868  26.851864  
8    102.771046    John Doe  69.429353  61.252800  
9     28.633231    John Doe   9.522023  73.009883

I tried writing something like this:

for value in df['Event']:
    if value == 'AA' :
        df['X', 'Y'] = AA

But I get a ValueError:

ValueError: Length of values does not match length of index

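2 answers:

Answer 0 (score: 0)

Your code has some bugs (Person is mixed up with something else). I assume this is a paste error.

However, your problem can be solved easily by building a mask and assigning the tuple AA to the subset of df selected by that mask, using df.loc:
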
m = df['Event'] == 'AA'
df.loc[m, ['X','Y']] = AA
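
The mask above only fills the rows where Event is 'AA'. The question also asks for those coordinates to persist until the next coordinates appear. Below is a minimal sketch of one way that could be done. It assumes a small hand-made frame rather than the asker's random data, and it assumes (going by the expected output in the question) that only the AA coordinates are carried forward, while other gaps stay NaN. The names `is_aa`, `has_xy`, `block`, and `carry` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

AA = (10, 20)

# Hand-made stand-in for the asker's frame (illustrative data only).
df = pd.DataFrame({
    'Event': ['AA', 'nan', 'BB', 'nan', 'nan', 'AA', 'nan'],
    'X': [np.nan, np.nan, 1.0, 2.0, np.nan, np.nan, np.nan],
    'Y': [np.nan, np.nan, 5.0, 6.0, np.nan, np.nan, np.nan],
})

# Step 1: same mask assignment as in the answer.
is_aa = df['Event'] == 'AA'
df.loc[is_aa, ['X', 'Y']] = AA

# Step 2: a new block starts at every row that already has coordinates.
has_xy = df['X'].notna()
block = has_xy.cumsum()

# Step 3: a row belongs to an AA block if the first row of its block is an AA row.
carry = is_aa.groupby(block).transform('first').astype(bool)
df.loc[carry, ['X', 'Y']] = AA

print(df)
```

If every gap should instead inherit the previous row's coordinates regardless of the event, calling `ffill()` on the two columns would likely be the simpler choice.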

Answer 1 (score: 0)

If you want to iterate over the rows, you can try:

# iterate through rows
for index, row in df.iterrows():
    # check Event value for the row
    if row['Event'] == 'AA' :
        # update dataframe
        df.loc[index,('X', 'Y')] = AA

print(df)

Result:

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   12.603084   81.636376    25.997186    76.733337    -17.683132   
1   nan  104.652839  104.064767    56.762357    83.599629    -34.714117   
2    BB   69.724434   33.324135    98.452840    57.407782     -8.479175   
3   nan   16.361719   51.290716    41.929234    46.494053    -81.882100   
4   nan   30.874579   34.683986    95.434111    80.343098    -62.448286   
5    CC   77.619875   70.164773     7.385376    40.142712    -55.590472   
6   nan   31.214066   54.081010    36.249414    34.218611    -21.754019   
7    CC   91.487647   28.307019    71.235864    48.915612    -37.196812   
8    DD   45.036216   61.655465    50.231592    29.511502     -4.583804   
9   nan   95.249002   25.649100    31.959114    10.234085    -93.106746   
X   NaN         NaN         NaN          NaN          NaN           NaN   

   Liam Smith_Y      Person          X           Y  
0     86.267909         nan  10.000000   20.000000  
1     43.090388         nan        NaN         NaN  
2     56.330139    John Doe  69.724434   33.324135  
3     65.648633    John Doe  16.361719   51.290716  
4     16.349304         nan        NaN         NaN  
5      5.528887   Kevin Lee   7.385376   40.142712  
6     75.717007         nan        NaN         NaN  
7    100.925457  Liam Smith -37.196812  100.925457  
8     87.256541    John Doe  45.036216   61.655465  
9     35.361163    John Doe  95.249002   25.649100  
X           NaN         NaN        NaN         NaN
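
For context on the ValueError in the question: `df['X', 'Y'] = AA` does not address two columns of one row. pandas treats the tuple `('X', 'Y')` as a single column key and tries to fill that whole column with the two values in AA, which cannot fit a frame with ten rows. The sketch below reproduces this on a made-up three-row frame (not the asker's data) and contrasts it with the row-wise `df.loc` assignment used in this answer. The variable `error_message` is only there to show what was raised.

```python
import numpy as np
import pandas as pd

AA = (10, 20)

# Made-up frame for illustration only.
df = pd.DataFrame({
    'Event': ['AA', 'nan', 'BB'],
    'X': [np.nan, np.nan, 1.0],
    'Y': [np.nan, np.nan, 5.0],
})

# Whole-column assignment under the single key ('X', 'Y'):
# two values cannot fill three rows, so pandas raises ValueError.
try:
    df['X', 'Y'] = AA
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Row-wise assignment: one row label, two column labels, two values.
for index, row in df.iterrows():
    if row['Event'] == 'AA':
        df.loc[index, ['X', 'Y']] = AA

print(df)
```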