How do I do groupby + arrange + mutate (ifelse) in Python, the way I would in R?

Asked: 2019-02-05 15:38:50

Tags: python pandas numpy

I have the following dataset:

ID      Date     Flag      Price     Flag_Amt     Factor
1      1/1/10     NA        20          NA          NA
1      1/2/10     3         20.2        1.05        .5
1      1/3/10     NA        19.2        NA          NA
2      1/1/10     5         12          6.50        1.3
2      1/2/10     NA        12.6        NA          NA
2      1/2/10     NA        13          NA          NA 
3      1/1/10     NA        100         NA          NA
3      1/2/10     5         88          16.7        .88
3      1/3/10     NA        90          NA          NA

and the following R dplyr code:

df = df %>% group_by(ID) %>% arrange(Date) %>% mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))

This produces the following result:

ID      Date     Flag      Price     Flag_Amt     Factor    New_Factor
1      1/1/10     NA        20          NA          NA         NA 
1      1/2/10     3         20.2        10.1        .5         .5
1      1/3/10     NA        19.2        NA          NA         NA        
2      1/1/10     5         12          6.50        1.3        1.85
2      1/2/10     NA        12.6        NA          NA         NA
2      1/2/10     NA        13          NA          NA         NA
3      1/1/10     NA        100         NA          NA         NA        
3      1/2/10     5         88          16.7        .88        5.27
3      1/3/10     NA        90          NA          NA         NA

However, I am having trouble replicating it in Python with pandas.

Here is the code I tried and the error I received:

df['New_Factor'] = df.groupby(['ID']).apply(lambda x: (x.Price/x.Flag_Amt) if x.Flag == 5 else (x.Factor))

Error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there another way to do this, perhaps by using .transform() together with np.where()?

Thanks for your help.

1 answer:

Answer 0: (score: 0)

Your R code should actually produce the following result:

r$> library(tibble)
r$> library(dplyr)
r$> df = tribble( 
            ~ID,  ~Date,   ~Flag,  ~Price,  ~Flag_Amt, ~Factor, 
        1,     '1/1/10', NA,      20,       NA,         NA, 
        1,     '1/2/10', 3,       20.2,     1.05,       .5, 
        1,     '1/3/10', NA,      19.2,     NA,         NA, 
        2,     '1/1/10', 5,       12,       6.50,       1.3, 
        2,     '1/2/10', NA,      12.6,     NA,         NA, 
        2,     '1/2/10', NA,      13,       NA,         NA , 
        3,     '1/1/10', NA,      100,      NA,         NA, 
        3,     '1/2/10', 5,       88,       16.7,       .88, 
        3,     '1/3/10', NA,      90,       NA,         NA
    )                                                  
r$> df
# A tibble: 9 x 6
     ID Date    Flag Price Flag_Amt Factor
  <dbl> <chr>  <dbl> <dbl>    <dbl>  <dbl>
1     1 1/1/10    NA  20      NA     NA   
2     1 1/2/10     3  20.2     1.05   0.5 
3     1 1/3/10    NA  19.2    NA     NA   
4     2 1/1/10     5  12       6.5    1.3 
5     2 1/2/10    NA  12.6    NA     NA   
6     2 1/2/10    NA  13      NA     NA   
7     3 1/1/10    NA 100      NA     NA   
8     3 1/2/10     5  88      16.7    0.88
9     3 1/3/10    NA  90      NA     NA   

r$> df %>% group_by(ID) %>% 
      arrange(Date) %>% 
      mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))
# A tibble: 9 x 7
# Groups:   ID [3]
     ID Date    Flag Price Flag_Amt Factor New_Factor
  <dbl> <chr>  <dbl> <dbl>    <dbl>  <dbl>      <dbl>
1     1 1/1/10    NA  20      NA     NA        NA    
2     2 1/1/10     5  12       6.5    1.3       0.542
3     3 1/1/10    NA 100      NA     NA        NA    
4     1 1/2/10     3  20.2     1.05   0.5       0.5  
5     2 1/2/10    NA  12.6    NA     NA        NA    
6     2 1/2/10    NA  13      NA     NA        NA    
7     3 1/2/10     5  88      16.7    0.88      0.190
8     1 1/3/10    NA  19.2    NA     NA        NA    
9     3 1/3/10    NA  90      NA     NA        NA  
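
Note that these values differ from the expected table in the question. The New_Factor values posted there (1.85 and 5.27) appear to correspond to Price/Flag_Amt, whereas the R code divides Flag_Amt by Price. A quick arithmetic check in plain Python, using the two Flag == 5 rows from the data (the variable names here are only illustrative):

```python
# (Price, Flag_Amt) for the two rows where Flag == 5, copied from the dataset
rows = [(12.0, 6.5), (88.0, 16.7)]

# First value: Flag_Amt / Price, which is what the R code computes.
# Second value: Price / Flag_Amt, which matches the table in the question.
ratios = [(round(flag_amt / price, 3), round(price / flag_amt, 2))
          for price, flag_amt in rows]

for as_coded, as_posted in ratios:
    print(as_coded, as_posted)
```

So the difference most likely comes from the direction of the division, not from the grouping or the sorting.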

Here is how it looks with the Python package datar, without having to dig into the pandas API:

>>> from datar.all import (
...     f, NA, tribble, c, rep,
...     group_by, arrange, mutate, if_else
... )
>>> 
>>> df = tribble(
...     f.ID,  f.Date,   f.Flag,  f.Price,  f.Flag_Amt, f.Factor,
...     1,     '1/1/10', NA,      20,       NA,         NA,
...     1,     '1/2/10', 3,       20.2,     1.05,       .5,
...     1,     '1/3/10', NA,      19.2,     NA,         NA,
...     2,     '1/1/10', 5,       12,       6.50,       1.3,
...     2,     '1/2/10', NA,      12.6,     NA,         NA,
...     2,     '1/2/10', NA,      13,       NA,         NA ,
...     3,     '1/1/10', NA,      100,      NA,         NA,
...     3,     '1/2/10', 5,       88,       16.7,       .88,
...     3,     '1/3/10', NA,      90,       NA,         NA,
... )
>>> df
   ID    Date  Flag  Price  Flag_Amt  Factor
0   1  1/1/10   NaN   20.0       NaN     NaN
1   1  1/2/10   3.0   20.2      1.05    0.50
2   1  1/3/10   NaN   19.2       NaN     NaN
3   2  1/1/10   5.0   12.0      6.50    1.30
4   2  1/2/10   NaN   12.6       NaN     NaN
5   2  1/2/10   NaN   13.0       NaN     NaN
6   3  1/1/10   NaN  100.0       NaN     NaN
7   3  1/2/10   5.0   88.0     16.70    0.88
8   3  1/3/10   NaN   90.0       NaN     NaN
>>> df = (
...     df >> 
...         group_by(f.ID) >> 
...         arrange(f.Date) >> 
...         mutate(New_Factor = if_else(f.Flag == 5, (f.Flag_Amt/f.Price), f.Factor))
... )
>>> df
   ID    Date  Flag  Price  Flag_Amt  Factor New_Factor
0   1  1/1/10   NaN   20.0       NaN     NaN        NaN
1   2  1/1/10   5.0   12.0      6.50    1.30   0.541667
2   3  1/1/10   NaN  100.0       NaN     NaN        NaN
3   1  1/2/10   3.0   20.2      1.05    0.50        0.5
4   2  1/2/10   NaN   12.6       NaN     NaN        NaN
5   2  1/2/10   NaN   13.0       NaN     NaN        NaN
6   3  1/2/10   5.0   88.0     16.70    0.88   0.189773
7   1  1/3/10   NaN   19.2       NaN     NaN        NaN
8   3  1/3/10   NaN   90.0       NaN     NaN        NaN
[Groups: ['ID'] (n=3)]

I am the author of the package. If you have any questions, feel free to submit an issue.
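
For comparison, here is a minimal sketch of the same thing in plain pandas with np.where, as the question suggested. The original attempt fails because, inside groupby(...).apply, x.Flag == 5 is a whole Series, and a Python if needs a single True/False. Since the calculation is row-wise, no groupby is needed at all. The assumptions here: Date is sorted as a string (as in the R tibble, where it is <chr>), a stable sort is used so tied dates keep their original order, and the name out is only illustrative. One difference to be aware of: R's ifelse returns NA when the condition is NA, while np.where treats NaN == 5 as False and falls through to Factor, so the last line masks those rows explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Date": ["1/1/10", "1/2/10", "1/3/10", "1/1/10", "1/2/10",
             "1/2/10", "1/1/10", "1/2/10", "1/3/10"],
    "Flag": [np.nan, 3, np.nan, 5, np.nan, np.nan, np.nan, 5, np.nan],
    "Price": [20, 20.2, 19.2, 12, 12.6, 13, 100, 88, 90],
    "Flag_Amt": [np.nan, 1.05, np.nan, 6.5, np.nan, np.nan, np.nan, 16.7, np.nan],
    "Factor": [np.nan, 0.5, np.nan, 1.3, np.nan, np.nan, np.nan, 0.88, np.nan],
})

# arrange(Date): mergesort is stable, so rows with equal dates keep their order
out = df.sort_values("Date", kind="mergesort").reset_index(drop=True)

# mutate(New_Factor = ifelse(...)): vectorized, element-wise choice
out["New_Factor"] = np.where(
    out["Flag"] == 5,
    out["Flag_Amt"] / out["Price"],
    out["Factor"],
)

# Mimic R's ifelse: a missing Flag gives a missing result
out["New_Factor"] = out["New_Factor"].where(out["Flag"].notna())

print(out)
```

With this data the last masking step does not change anything, because Factor is already missing wherever Flag is missing, but it keeps the behavior closer to R on other inputs.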