Get the number of rows matching a condition and add it as a column in R

Date: 2018-07-09 22:42:30

Tags: r dataframe datatable

So I have the following data table:

Name | Addr   | Age
-------------------
Bill | 2112 W | 17
Barb | 2112 W | 16
Rick | 3445 E | 16
Chad | 2112 W | 5
Ruth | 5567 S | 4
Mick | 3445 E | 17
Hank | 3445 E | 1
Lace | 1111 S | 16
Nick | 2112 W | 4

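I want to add a calculated column that checks whether at least two rows meet the following condition: same address and age greater than 15. The new table would then be: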

Name | Addr   | Age | Count
---------------------------
Bill | 2112 W | 17  | TRUE    #There are two people at addr 2112 W over 15, so True
Barb | 2112 W | 16  | TRUE    #There are two people at addr 2112 W over 15, so True
Rick | 3445 E | 16  | TRUE    #There are two people at addr 3445 E over 15, so True
Chad | 2112 W | 5   | TRUE    #There are two people at addr 2112 W over 15, so True
Ruth | 5567 S | 4   | FALSE   #No one at 5567 S is over 15, so False
Mick | 3445 E | 17  | TRUE    #There are two people at addr 3445 E over 15, so True
Hank | 3445 E | 1   | TRUE    #There are two people at addr 3445 E over 15, so True
Lace | 1111 S | 16  | FALSE   #Only one person over 15 is at addr 1111 S, so False
Nick | 5567 S | 16  | FALSE   #Two people live at addr, but only one of them is over 15 so False

Here is the solution I am currently trying:

dat$COUNT <- Map(function(x) nrow(dat[dat$ADDR == x & dat$ADDR > 15]) >= 2, dat$ADDR)

However, this doesn't seem to work correctly, and it is extremely slow on large datasets.
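For reference, here is a minimal corrected sketch of the loop above. It assumes the columns are named Name, Addr and Age as in the table (the attempt uses ADDR). It makes two changes: the age test now looks at Age rather than Addr, and vapply returns a plain logical vector, where Map returns a list. Counting with sum() also sidesteps the row-subsetting syntax, which differs between a data.frame and a data.table. The sketch still scans every row once per row, so it remains quadratic and will stay slow on large data.

```r
# Sample data from the question (column names Name, Addr, Age assumed)
dat <- data.frame(
  Name = c("Bill", "Barb", "Rick", "Chad", "Ruth", "Mick", "Hank", "Lace", "Nick"),
  Addr = c("2112 W", "2112 W", "3445 E", "2112 W", "5567 S",
           "3445 E", "3445 E", "1111 S", "2112 W"),
  Age  = c(17, 16, 16, 5, 4, 17, 1, 16, 4),
  stringsAsFactors = FALSE
)

# For each row, count the rows at the same address whose Age (not Addr)
# is over 15, then test whether there are at least two of them.
# vapply guarantees one logical per row, unlike Map, which returns a list.
dat$Count <- vapply(
  dat$Addr,
  function(x) sum(dat$Addr == x & dat$Age > 15) >= 2,
  logical(1),
  USE.NAMES = FALSE
)
```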

3 answers:

Answer 0 (score: 1)

How about this?

library(tidyverse)
df %>%
    group_by(Addr) %>%
    mutate(count = n() > 1)
## A tibble: 8 x 4
## Groups:   Addr [4]
#  Name  Addr     Age count
#  <fct> <fct>  <int> <lgl>
#1 Bill  2112 W    17 TRUE
#2 Barb  2112 W    16 TRUE
#3 Rick  3445 E    16 TRUE
#4 Chad  2112 W     5 TRUE
#5 Ruth  5567 S     4 FALSE
#6 Mick  3445 E    17 TRUE
#7 Hank  3445 E     1 TRUE
#8 Lace  1111 S    16 FALSE

Or, in base R, using ave:

df$count <- as.logical(ave(rep(1, nrow(df)), df$Addr, FUN = function(x) sum(x) > 1))
df
#  Name   Addr Age count
#1 Bill 2112 W  17  TRUE
#2 Barb 2112 W  16  TRUE
#3 Rick 3445 E  16  TRUE
#4 Chad 2112 W   5  TRUE
#5 Ruth 5567 S   4 FALSE
#6 Mick 3445 E  17  TRUE
#7 Hank 3445 E   1  TRUE
#8 Lace 1111 S  16 FALSE    
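The ave call above flags every address with more than one resident, whatever their ages. Here is a minimal sketch of the same base R idea that also applies the question's Age > 15 condition. It passes the logical vector Age > 15 to ave and counts the TRUE values within each address. The sample data below (including Nick at 2112 W) is assumed.

```r
# Sample data from the question, in the same read.table layout as this answer
df <- read.table(text =
  "Name  Addr    Age
Bill  '2112 W'  17
Barb  '2112 W'  16
Rick  '3445 E'  16
Chad  '2112 W'  5
Ruth  '5567 S'  4
Mick  '3445 E'  17
Hank  '3445 E'  1
Lace  '1111 S'  16
Nick  '2112 W'  4", header = TRUE)

# Age > 15 is a logical vector; within each address, sum() counts the TRUEs
# and ave() writes the per-address result back onto every row of that address.
df$count <- ave(df$Age > 15, df$Addr, FUN = function(x) sum(x) >= 2)
df
```

Because ave works on whole vectors grouped by address, it avoids the per-row scan of the original Map attempt.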

Sample data

df <- read.table(text =
    "Name  Addr    Age
Bill  '2112 W'  17
Barb  '2112 W'  16
Rick  '3445 E'  16
Chad  '2112 W'  5
Ruth  '5567 S'  4
Mick  '3445 E'  17
Hank  '3445 E'  1
Lace  '1111 S'  16", header = T)

Update

With the updated sample data and the Age > 15 requirement:
df <- read.table(text =
    "Name  Addr    Age
Bill  '2112 W'  17
Barb  '2112 W'  16
Rick  '3445 E'  16
Chad  '2112 W'  5
Ruth  '5567 S'  4
Mick  '3445 E'  17
Hank  '3445 E'  1
Lace  '1111 S'  16
Nick  '2112 W'  4", header = T)


df %>%
    group_by(Addr) %>%
    mutate(count = n() > 1 & Age > 15)
## A tibble: 9 x 4
## Groups:   Addr [4]
#  Name  Addr     Age count
#  <fct> <fct>  <int> <lgl>
#1 Bill  2112 W    17 TRUE
#2 Barb  2112 W    16 TRUE
#3 Rick  3445 E    16 TRUE
#4 Chad  2112 W     5 FALSE
#5 Ruth  5567 S     4 FALSE
#6 Mick  3445 E    17 TRUE
#7 Hank  3445 E     1 FALSE
#8 Lace  1111 S    16 FALSE
#9 Nick  2112 W     4 FALSE
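Note that n() > 1 & Age > 15 flags individual people over 15 at shared addresses. That is why Chad and Hank come out FALSE here, even though the question expects TRUE for them. Here is a minimal sketch that instead counts the residents over 15 at each address. It assumes dplyr is installed and uses the question's sample data.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Name = c("Bill", "Barb", "Rick", "Chad", "Ruth", "Mick", "Hank", "Lace", "Nick"),
  Addr = c("2112 W", "2112 W", "3445 E", "2112 W", "5567 S",
           "3445 E", "3445 E", "1111 S", "2112 W"),
  Age  = c(17, 16, 16, 5, 4, 17, 1, 16, 4),
  stringsAsFactors = FALSE
)

df <- df %>%
  group_by(Addr) %>%
  # sum(Age > 15) counts the residents over 15 at each address;
  # every row at that address gets the same TRUE/FALSE result
  mutate(count = sum(Age > 15) >= 2) %>%
  ungroup()
df
```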

Answer 1 (score: 1)

Could you please try the following and let me know if it helps?

var$Count <- ifelse(var$Age>=15,"| TRUE","| FALSE")
var %>% group_by(Addr)

The output is as follows.

* <fct> <int> <fct> <fct> <int> <chr>  
1 |      2112 W     |        17 | TRUE 
2 |      2112 W     |        16 | TRUE 
3 |      3445 E     |        16 | TRUE 
4 |      2112 W     |         5 | FALSE
5 |      5567 S     |         4 | FALSE
6 |      3445 E     |        17 | TRUE 
7 |      3445 E     |         1 | FALSE
8 |      1111 S     |        16 | TRUE 

Answer 2 (score: 0)