I have a data frame like this:
Type Species Letter Number Batch
a X Al 1 H
a Y Si 6 H
b Z Mn 3 R
b Q Qp 9 R
c L Tw 10 R
c S Rl 5 R
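I have already used group_by(Type). I want to write a statement that looks at the Batch column and, if it is R, looks at the Number column, finds the smaller of the two numbers in the group, and then sets both Letter and Number to NA for that row. I'm not even sure this is possible, but this is the end result I'm after: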
Type Species Letter Number Batch
a X Al 1 H
a Y Si 6 H
b Z NA NA R
b Q Qp 9 R
c L Tw 10 R
c S NA NA R
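For anyone who wants to try this out, here is a minimal sketch that rebuilds the sample data and the desired result in base R. The column types (character columns, numeric Number) and the object names df and expected are assumptions; the post only shows the printed tables.

```r
# Sample data from the question (column types are assumed)
df <- data.frame(
  Type    = c("a", "a", "b", "b", "c", "c"),
  Species = c("X", "Y", "Z", "Q", "L", "S"),
  Letter  = c("Al", "Si", "Mn", "Qp", "Tw", "Rl"),
  Number  = c(1, 6, 3, 9, 10, 5),
  Batch   = c("H", "H", "R", "R", "R", "R"),
  stringsAsFactors = FALSE
)

# Desired result: within each Type whose Batch is "R",
# the row holding the smaller Number loses Letter and Number
expected <- df
expected[c(3, 6), c("Letter", "Number")] <- NA
```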
Answer 0 (score: 4)
Another idea, using replace() and which.min():
df %>%
group_by(Type) %>%
mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number))
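Basically you can read it like this: take df, group it by Type, then, if Batch == "R", replace the minimum Number in the group with NA; otherwise return the original Number value.
Which gives: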
#Source: local data frame [6 x 5]
#Groups: Type
#
# Type Species Letter Number Batch
#1 a X Al 1 H
#2 a Y Si 6 H
#3 b Z Mn NA R
#4 b Q Qp 9 R
#5 c L Tw 10 R
#6 c S Rl NA R
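Note that this output only blanks Number, while the question also asks for Letter to become NA. A minimal sketch of one way to extend the same which.min() idea to both columns follows. The helper column is_min_r is a hypothetical name, the sample data is rebuilt with assumed column types, and the code assumes every group has at least one non-NA Number.

```r
library(dplyr)

# Sample data from the question (column types are assumed)
df <- data.frame(
  Type    = c("a", "a", "b", "b", "c", "c"),
  Species = c("X", "Y", "Z", "Q", "L", "S"),
  Letter  = c("Al", "Si", "Mn", "Qp", "Tw", "Rl"),
  Number  = c(1, 6, 3, 9, 10, 5),
  Batch   = c("H", "H", "R", "R", "R", "R"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Type) %>%
  mutate(
    # TRUE only for the first row holding the group's minimum Number,
    # and only when that row belongs to batch "R"
    is_min_r = Batch == "R" & row_number() == which.min(Number),
    Letter   = replace(Letter, is_min_r, NA),
    Number   = replace(Number, is_min_r, NA)
  ) %>%
  ungroup() %>%
  select(-is_min_r)
```

The flag is computed once and reused, so Letter and Number are guaranteed to be blanked on the same row.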
Benchmark
df2 <- df[rep(row.names(df), 10e5),]
library(microbenchmark)
mbm <- microbenchmark(
Gregor = df2 %>%
group_by(Type) %>%
mutate(make_na = Batch == "R" & Number == min(Number),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na),
Steven = df2 %>%
group_by(Type) %>%
mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number)),
times = 10, unit = "relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Gregor 1.863925 2.230475 2.065081 2.220267 2.004923 1.919964 10 b
# Steven 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10 a
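When reading these timings, keep in mind that the two expressions are not strictly equivalent: the Gregor version also updates Letter, and the two differ when a group has tied minima. Number == min(Number) flags every tied row, while which.min() returns only the position of the first minimum. A small base R sketch with made-up values (number and batch are illustrative vectors, not part of the original data):

```r
# One hypothetical group with a tied minimum
number <- c(3, 9, 3)
batch  <- c("R", "R", "R")

# Comparison against min(): flags every row that ties for the minimum
flag_all_ties <- batch == "R" & number == min(number)

# which.min(): flags only the first position holding the minimum
flag_first <- batch == "R" & seq_along(number) == which.min(number)

print(flag_all_ties)
print(flag_first)
```

Which behavior is wanted depends on the data; the sample in the question has no ties, so both give the same rows there.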
Answer 1 (score: 3)
Simple, just map your words to code. I create an intermediate column to flag the rows that need to be NA-ified, then I remove it.
your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), 1, 0),
Number = ifelse(make_na == 1, NA, Number),
Letter = ifelse(make_na == 1, NA, Letter)) %>%
select(-make_na)
We can simplify that a bit:
# same code as above, using TRUE/FALSE instead of 1/0
your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), TRUE, FALSE),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)
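And even a bit more, getting rid of the first ifelse() entirely. Whenever you have ifelse(..., TRUE, FALSE), the ifelse() is unnecessary: it returns the same thing as its first argument.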
# make_na column is created directly as a logical column
your_grouped_df %>%
mutate(make_na = Batch == "R" & Number == min(Number),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)
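The claim that ifelse(cond, TRUE, FALSE) just returns cond can be checked directly. A minimal base R sketch, using made-up vectors (number, batch, and the threshold 5 are illustrative only) and including an NA to show that missing values pass through the same way:

```r
# Illustrative values, not the original data
number <- c(3, 9, NA)
batch  <- c("R", "H", "R")

# A logical condition, including one NA result
cond <- batch == "R" & number < 5

# Wrapping it in ifelse(..., TRUE, FALSE) changes nothing
wrapped <- ifelse(cond, TRUE, FALSE)

print(identical(wrapped, cond))  # TRUE
```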
Answer 2 (score: 1)
I know you are looking for a dplyr solution, but a data.table solution is also worth a look.
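The code for this answer did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of how the task might be written with data.table; it is an assumption, not the answerer's original code. The flag column is_min_r is a hypothetical name, and the sample data is rebuilt with assumed column types.

```r
library(data.table)

# Sample data from the question (column types are assumed)
dt <- data.table(
  Type    = c("a", "a", "b", "b", "c", "c"),
  Species = c("X", "Y", "Z", "Q", "L", "S"),
  Letter  = c("Al", "Si", "Mn", "Qp", "Tw", "Rl"),
  Number  = c(1, 6, 3, 9, 10, 5),
  Batch   = c("H", "H", "R", "R", "R", "R")
)

# Flag, per Type, the first row with the minimum Number when its Batch is "R"
dt[, is_min_r := Batch == "R" & seq_len(.N) == which.min(Number), by = Type]

# Blank both columns on the flagged rows, using typed NAs to keep column types
dt[is_min_r == TRUE, `:=`(Letter = NA_character_, Number = NA_real_)]

# Drop the helper column
dt[, is_min_r := NULL]
```

The updates happen by reference, so dt itself is modified and no reassignment is needed.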