Using ifelse with dplyr in R to output certain rows

Time: 2015-07-06 15:38:54

Tags: r dataframe dplyr

I have a data frame like this:

Type Species Letter Number Batch
  a     X      Al      1     H
  a     Y      Si      6     H

  b     Z      Mn      3     R
  b     Q      Qp      9     R

  c     L      Tw      10    R
  c     S      Rl      5     R

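I have already used group_by(Type) on it. I want to write a statement that looks at the Batch column and, if it is R, looks at the Number column, finds the smaller of the two numbers in the group, and then sets both Letter and Number in that row to NA. I'm not even sure this is possible, but this is the end result I want: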

Type Species Letter Number Batch
  a     X      Al      1     H
  a     Y      Si      6     H

  b     Z      NA      NA    R
  b     Q      Qp      9     R

  c     L      Tw      10    R
  c     S      NA      NA    R
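To try the answers below, the input data can be rebuilt as a plain data.frame. This is a minimal sketch under stated assumptions: Number is numeric and the text columns are character. stringsAsFactors = FALSE is set explicitly because before R 4.0.0 data.frame() turned strings into factors by default, and ifelse() applied to a factor returns its integer codes rather than its labels.

```r
# Rebuild the question's input data (assumed column types: character + numeric)
df <- data.frame(
  Type    = c("a", "a", "b", "b", "c", "c"),
  Species = c("X", "Y", "Z", "Q", "L", "S"),
  Letter  = c("Al", "Si", "Mn", "Qp", "Tw", "Rl"),
  Number  = c(1, 6, 3, 9, 10, 5),
  Batch   = c("H", "H", "R", "R", "R", "R"),
  # keep Letter etc. as character so ifelse() returns labels, not factor codes
  stringsAsFactors = FALSE
)

# Inspect column types
str(df)
```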

3 answers:

Answer 0 (score: 4)

Another idea, using replace() and which.min():

library(dplyr)

df %>%
  group_by(Type) %>%
  mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number))

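Basically, you can read it like this:

Group df by Type; then, if Batch == "R", replace the minimum value of Number in the group with NA, otherwise return the original value of Number.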

Which gives:

#Source: local data frame [6 x 5]
#Groups: Type
#
#  Type Species Letter Number Batch
#1    a       X     Al      1     H
#2    a       Y     Si      6     H
#3    b       Z     Mn     NA     R
#4    b       Q     Qp      9     R
#5    c       L     Tw     10     R
#6    c       S     Rl     NA     R
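The output above shows that this version only NA-ifies Number. Letter in rows 3 and 6 keeps "Mn" and "Rl", although the question asks for both columns to be blanked. The benchmark below therefore compares it against code that does more work. The following is a minimal sketch that is not part of the original answer. It reuses the which.min() idea to flag one row per group and then blanks both columns. The flag name make_na is borrowed from the next answer, and df_na is a hypothetical name. The sketch assumes df was built as in the question, with character text columns.

```r
library(dplyr)

# Input data from the question (assumed types: character + numeric)
df <- data.frame(
  Type    = c("a", "a", "b", "b", "c", "c"),
  Species = c("X", "Y", "Z", "Q", "L", "S"),
  Letter  = c("Al", "Si", "Mn", "Qp", "Tw", "Rl"),
  Number  = c(1, 6, 3, 9, 10, 5),
  Batch   = c("H", "H", "R", "R", "R", "R"),
  stringsAsFactors = FALSE
)

df_na <- df %>%
  group_by(Type) %>%
  mutate(
    # TRUE only for the first row holding the group minimum of Number,
    # and only on rows where Batch is "R"
    make_na = Batch == "R" & row_number() == which.min(Number),
    # make_na is computed before Number is overwritten, so both columns
    # are blanked in the same row
    Letter  = ifelse(make_na, NA, Letter),
    Number  = ifelse(make_na, NA, Number)
  ) %>%
  ungroup() %>%
  select(-make_na)

print(df_na)
```

Rows 3 (Z) and 6 (S) end up with NA in both Letter and Number, which matches the desired result in the question.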

Benchmark

df2 <- df[rep(row.names(df), 10e5),]

library(microbenchmark)
mbm <- microbenchmark(
  Gregor = df2 %>%
    group_by(Type) %>%
    mutate(make_na = Batch == "R" & Number == min(Number),
           Number = ifelse(make_na, NA, Number),
           Letter = ifelse(make_na, NA, Letter)) %>%
    select(-make_na),
  Steven = df2 %>%
    group_by(Type) %>%
    mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number)),
  times = 10, unit = "relative")


# Unit: relative
#    expr      min       lq     mean   median       uq      max neval cld
#  Gregor 1.863925 2.230475 2.065081 2.220267 2.004923 1.919964    10   b
#  Steven 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a 

Answer 1 (score: 3)

It's simple: just map your words to code. I create an intermediate column to flag the rows that need to be NA-ified, and then I remove it.

your_grouped_df %>%
  mutate(make_na = ifelse(Batch == "R" & Number == min(Number), 1, 0),
         Number = ifelse(make_na == 1, NA, Number),
         Letter = ifelse(make_na == 1, NA, Letter)) %>%
  select(-make_na)

We can simplify this a bit:

# same code as above, using TRUE/FALSE instead of 1/0
your_grouped_df %>%
  mutate(make_na = ifelse(Batch == "R" & Number == min(Number), TRUE, FALSE),
         Number = ifelse(make_na, NA, Number),
         Letter = ifelse(make_na, NA, Letter)) %>%
  select(-make_na)

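And a bit further, getting rid of the first ifelse() entirely. Whenever you have ifelse(..., TRUE, FALSE), the ifelse() is unnecessary: it returns the same thing as its first argument.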
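A quick base R check of that claim, using a hypothetical flag vector. The vector includes an NA, as a comparison against missing data would produce. For a plain logical vector, ifelse(x, TRUE, FALSE) gives back the same vector, and any NA passes through unchanged.

```r
# Hypothetical logical flags, like those from Batch == "R" & ...
flag <- c(TRUE, FALSE, NA, TRUE)

# Wrapping the flags in ifelse(..., TRUE, FALSE) adds nothing
wrapped <- ifelse(flag, TRUE, FALSE)

print(wrapped)
print(identical(wrapped, flag))  # TRUE
```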

# make_na column is created directly as a logical column
your_grouped_df %>%
  mutate(make_na = Batch == "R" & Number == min(Number),
         Number = ifelse(make_na, NA, Number),
         Letter = ifelse(make_na, NA, Letter)) %>%
  select(-make_na)
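One behavioural difference between this answer and the replace()/which.min() answer only shows up in data that is not in the question. The first case is a tied minimum within a group. Number == min(Number) flags every tied row, while which.min() returns only the position of the first minimum. The second case is missing values. min() returns NA when an NA is present, unless na.rm = TRUE is given, and that makes the whole comparison NA. which.min() simply skips missing values. A small base R sketch with hypothetical values:

```r
# Hypothetical Number values for one group with a tied minimum
number <- c(5, 3, 3, 8)

# Equality with min(): flags every row equal to the minimum (both 3s)
by_equality <- number == min(number)

# which.min(): only the first position holding the minimum
by_which_min <- seq_along(number) == which.min(number)

print(by_equality)   # TRUE at positions 2 and 3
print(by_which_min)  # TRUE at position 2 only

# Hypothetical group containing a missing value
with_missing <- c(4, NA, 2)
print(with_missing == min(with_missing))                # all NA
print(with_missing == min(with_missing, na.rm = TRUE))  # FALSE NA TRUE
print(which.min(with_missing))                          # 3, NA is skipped
```

Which behaviour is right depends on whether tied minima should all be blanked. If the data may contain NA, passing na.rm = TRUE to min() avoids NA-ifying every row of an "R" group.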

Answer 2 (score: 1)

I know you are looking for a dplyr solution, but it is also worth looking at a data.table solution:


Data
