I have the following dataset:
ID Date Flag Price Flag_Amt Factor
1 1/1/10 NA 20 NA NA
1 1/2/10 3 20.2 1.05 .5
1 1/3/10 NA 19.2 NA NA
2 1/1/10 5 12 6.50 1.3
2 1/2/10 NA 12.6 NA NA
2 1/2/10 NA 13 NA NA
3 1/1/10 NA 100 NA NA
3 1/2/10 5 88 16.7 .88
3 1/3/10 NA 90 NA NA
and I have the following R dplyr code:
df = df %>% group_by(ID) %>% arrange(Date) %>% mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))
This produces the following result:
ID Date Flag Price Flag_Amt Factor New_Factor
1 1/1/10 NA 20 NA NA NA
1 1/2/10 3 20.2 10.1 .5 .5
1 1/3/10 NA 19.2 NA NA NA
2 1/1/10 5 12 6.50 1.3 1.85
2 1/2/10 NA 12.6 NA NA NA
2 1/2/10 NA 13 NA NA NA
3 1/1/10 NA 100 NA NA NA
3 1/2/10 5 88 16.7 .88 5.27
3 1/3/10 NA 90 NA NA NA
However, I am having trouble replicating this in Python pandas.
Here is some of the code I tried, along with the error I received:
df['New_Factor'] = df.groupby(['ID']).apply(lambda x: (x.Price/x.Flag_Amt) if x.Flag == 5 else (x.Factor)))
Error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there another way to do this, perhaps using .transform() together with np.where()?
Thanks for your help.
Answer 0 (score: 0)
The result of your R code should actually look like this:
r$> library(tibble)
r$> library(dplyr)
r$> df = tribble(
~ID, ~Date, ~Flag, ~Price, ~Flag_Amt, ~Factor,
1, '1/1/10', NA, 20, NA, NA,
1, '1/2/10', 3, 20.2, 1.05, .5,
1, '1/3/10', NA, 19.2, NA, NA,
2, '1/1/10', 5, 12, 6.50, 1.3,
2, '1/2/10', NA, 12.6, NA, NA,
2, '1/2/10', NA, 13, NA, NA ,
3, '1/1/10', NA, 100, NA, NA,
3, '1/2/10', 5, 88, 16.7, .88,
3, '1/3/10', NA, 90, NA, NA
)
r$> df
# A tibble: 9 x 6
ID Date Flag Price Flag_Amt Factor
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 1/1/10 NA 20 NA NA
2 1 1/2/10 3 20.2 1.05 0.5
3 1 1/3/10 NA 19.2 NA NA
4 2 1/1/10 5 12 6.5 1.3
5 2 1/2/10 NA 12.6 NA NA
6 2 1/2/10 NA 13 NA NA
7 3 1/1/10 NA 100 NA NA
8 3 1/2/10 5 88 16.7 0.88
9 3 1/3/10 NA 90 NA NA
r$> df %>% group_by(ID) %>%
arrange(Date) %>%
mutate(New_Factor = ifelse(Flag == 5, (Flag_Amt/Price), Factor))
# A tibble: 9 x 7
# Groups: ID [3]
ID Date Flag Price Flag_Amt Factor New_Factor
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1/1/10 NA 20 NA NA NA
2 2 1/1/10 5 12 6.5 1.3 0.542
3 3 1/1/10 NA 100 NA NA NA
4 1 1/2/10 3 20.2 1.05 0.5 0.5
5 2 1/2/10 NA 12.6 NA NA NA
6 2 1/2/10 NA 13 NA NA NA
7 3 1/2/10 5 88 16.7 0.88 0.190
8 1 1/3/10 NA 19.2 NA NA NA
9 3 1/3/10 NA 90 NA NA NA
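The values 1.85 and 5.27 shown in the question appear to come from Price/Flag_Amt, which is the division used in the pandas attempt, whereas the R code divides Flag_Amt by Price. A quick check of that arithmetic, using the two rows where Flag is 5 (the helper name `both_ratios` is made up for this illustration):

```python
def both_ratios(price, flag_amt):
    """Return (Flag_Amt / Price, Price / Flag_Amt) for a single row."""
    return flag_amt / price, price / flag_amt


# Rows with Flag == 5 in the data above: (ID, Price, Flag_Amt)
flagged_rows = [(2, 12.0, 6.50), (3, 88.0, 16.7)]

for row_id, price, flag_amt in flagged_rows:
    r_formula, inverted = both_ratios(price, flag_amt)
    # r_formula is what the R code computes; inverted is the pandas attempt
    print(row_id, round(r_formula, 3), round(inverted, 2))
```

For ID 2 this gives roughly 0.542 versus 1.85, and for ID 3 roughly 0.19 versus 5.27, which would explain the mismatch between the two tables.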
Here is what it looks like with the Python package datar, without having to dig into the pandas API:
>>> from datar.all import (
...     f, NA, tribble,
...     group_by, arrange, mutate, if_else
... )
>>>
>>> df = tribble(
... f.ID, f.Date, f.Flag, f.Price, f.Flag_Amt, f.Factor,
... 1, '1/1/10', NA, 20, NA, NA,
... 1, '1/2/10', 3, 20.2, 1.05, .5,
... 1, '1/3/10', NA, 19.2, NA, NA,
... 2, '1/1/10', 5, 12, 6.50, 1.3,
... 2, '1/2/10', NA, 12.6, NA, NA,
... 2, '1/2/10', NA, 13, NA, NA ,
... 3, '1/1/10', NA, 100, NA, NA,
... 3, '1/2/10', 5, 88, 16.7, .88,
... 3, '1/3/10', NA, 90, NA, NA,
... )
>>> df
ID Date Flag Price Flag_Amt Factor
0 1 1/1/10 NaN 20.0 NaN NaN
1 1 1/2/10 3.0 20.2 1.05 0.50
2 1 1/3/10 NaN 19.2 NaN NaN
3 2 1/1/10 5.0 12.0 6.50 1.30
4 2 1/2/10 NaN 12.6 NaN NaN
5 2 1/2/10 NaN 13.0 NaN NaN
6 3 1/1/10 NaN 100.0 NaN NaN
7 3 1/2/10 5.0 88.0 16.70 0.88
8 3 1/3/10 NaN 90.0 NaN NaN
>>> df = (
... df >>
... group_by(f.ID) >>
... arrange(f.Date) >>
... mutate(New_Factor = if_else(f.Flag == 5, (f.Flag_Amt/f.Price), f.Factor))
... )
>>> df
ID Date Flag Price Flag_Amt Factor New_Factor
0 1 1/1/10 NaN 20.0 NaN NaN NaN
1 2 1/1/10 5.0 12.0 6.50 1.30 0.541667
2 3 1/1/10 NaN 100.0 NaN NaN NaN
3 1 1/2/10 3.0 20.2 1.05 0.50 0.5
4 2 1/2/10 NaN 12.6 NaN NaN NaN
5 2 1/2/10 NaN 13.0 NaN NaN NaN
6 3 1/2/10 5.0 88.0 16.70 0.88 0.189773
7 1 1/3/10 NaN 19.2 NaN NaN NaN
8 3 1/3/10 NaN 90.0 NaN NaN NaN
[Groups: ['ID'] (n=3)]
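For comparison, a plain pandas version along the lines of the np.where() idea from the question might look like the sketch below. It is not part of datar, and it assumes the same column names as above. The error in the question comes from `if x.Flag == 5`: inside apply(), `x.Flag` is a whole Series, and a Series cannot be used as a single true/false value. np.where() evaluates the condition element by element instead. Since the formula only uses values from the same row, no groupby is needed for it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Date": ["1/1/10", "1/2/10", "1/3/10", "1/1/10", "1/2/10",
             "1/2/10", "1/1/10", "1/2/10", "1/3/10"],
    "Flag": [np.nan, 3, np.nan, 5, np.nan, np.nan, np.nan, 5, np.nan],
    "Price": [20, 20.2, 19.2, 12, 12.6, 13, 100, 88, 90],
    "Flag_Amt": [np.nan, 1.05, np.nan, 6.50, np.nan, np.nan, np.nan, 16.7, np.nan],
    "Factor": [np.nan, 0.5, np.nan, 1.3, np.nan, np.nan, np.nan, 0.88, np.nan],
})

# Mirror arrange(Date): Date is a string column here, so this is a
# lexicographic sort, as in the R example. A stable sort keeps ties in order.
df = df.sort_values("Date", kind="stable").reset_index(drop=True)

# Element-wise equivalent of ifelse(Flag == 5, Flag_Amt / Price, Factor).
# NaN == 5 is False, so those rows fall through to Factor (which is NaN).
df["New_Factor"] = np.where(
    df["Flag"] == 5,
    df["Flag_Amt"] / df["Price"],
    df["Factor"],
)

print(df)
```

If Date should sort chronologically rather than as text, it could be converted with pd.to_datetime() first; the sketch keeps it as a string to match the output above.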
I am the author of the datar package. Feel free to submit an issue if you have any questions.