I have the following data table:
Name | Addr | Age
-------------------
Bill | 2112 W | 17
Barb | 2112 W | 16
Rick | 3445 E | 16
Chad | 2112 W | 5
Ruth | 5567 S | 4
Mick | 3445 E | 17
Hank | 3445 E | 1
Lace | 1111 S | 16
Nick | 2112 W | 4
I want to add a computed column that checks whether at least two rows share the same address and have an age over 15. The new table would be:
Name | Addr | Age | Count
---------------------------
Bill | 2112 W | 17 | TRUE #There are two people at addr 2112 W over 15, so True
Barb | 2112 W | 16 | TRUE #There are two people at addr 2112 W over 15, so True
Rick | 3445 E | 16 | TRUE #There are two people at addr 3445 E over 15, so True
Chad | 2112 W | 5 | TRUE #There are two people at addr 2112 W over 15, so True
Ruth | 5567 S | 4 | FALSE #No one at 5567 S is over 15, so False
Mick | 3445 E | 17 | TRUE #There are two people at addr 3445 E over 15, so True
Hank | 3445 E | 1 | TRUE #There are two people at addr 3445 E over 15, so True
Lace | 1111 S | 16 | FALSE #Only one person over 15 is at addr 1111 S, so False
Nick | 5567 S | 16 | FALSE #Two people live at addr, but only one of them is over 15 so False
Here is the solution I am currently trying:
dat$COUNT <- Map(function(x) nrow(dat[dat$ADDR == x & dat$ADDR > 15]) >= 2, dat$ADDR)
However, this does not seem to work correctly, and it runs extremely slowly on large datasets.
Answer 0 (score: 1)
How about this?
library(tidyverse)
df %>%
group_by(Addr) %>%
mutate(count = n() > 1)
## A tibble: 8 x 4
## Groups: Addr [4]
# Name Addr Age count
# <fct> <fct> <int> <lgl>
#1 Bill 2112 W 17 TRUE
#2 Barb 2112 W 16 TRUE
#3 Rick 3445 E 16 TRUE
#4 Chad 2112 W 5 TRUE
#5 Ruth 5567 S 4 FALSE
#6 Mick 3445 E 17 TRUE
#7 Hank 3445 E 1 TRUE
#8 Lace 1111 S 16 FALSE
Or in base R using ave:
df$count <- as.logical(ave(rep(1, nrow(df)), df$Addr, FUN = function(x) sum(x) > 1))
df
# Name Addr Age count
#1 Bill 2112 W 17 TRUE
#2 Barb 2112 W 16 TRUE
#3 Rick 3445 E 16 TRUE
#4 Chad 2112 W 5 TRUE
#5 Ruth 5567 S 4 FALSE
#6 Mick 3445 E 17 TRUE
#7 Hank 3445 E 1 TRUE
#8 Lace 1111 S 16 FALSE
Sample data:
df <- read.table(text =
"Name Addr Age
Bill '2112 W' 17
Barb '2112 W' 16
Rick '3445 E' 16
Chad '2112 W' 5
Ruth '5567 S' 4
Mick '3445 E' 17
Hank '3445 E' 1
Lace '1111 S' 16", header = TRUE)
With the updated sample data and Age > 15:
df <- read.table(text =
"Name Addr Age
Bill '2112 W' 17
Barb '2112 W' 16
Rick '3445 E' 16
Chad '2112 W' 5
Ruth '5567 S' 4
Mick '3445 E' 17
Hank '3445 E' 1
Lace '1111 S' 16
Nick '2112 W' 4", header = TRUE)
df %>%
group_by(Addr) %>%
mutate(count = n() > 1 & Age > 15)
## A tibble: 9 x 4
## Groups: Addr [4]
# Name Addr Age count
# <fct> <fct> <int> <lgl>
#1 Bill 2112 W 17 TRUE
#2 Barb 2112 W 16 TRUE
#3 Rick 3445 E 16 TRUE
#4 Chad 2112 W 5 FALSE
#5 Ruth 5567 S 4 FALSE
#6 Mick 3445 E 17 TRUE
#7 Hank 3445 E 1 FALSE
#8 Lace 1111 S 16 FALSE
#9 Nick 2112 W 4 FALSE
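Note that `n() > 1 & Age > 15` is evaluated row by row, so Chad, Hank and Nick come out FALSE even though two people over 15 live at their address. If the flag should instead apply to everyone at the address, as in the expected table in the question, a minimal sketch counts the over-15 rows per group. It assumes dplyr is installed and that the columns are named `Addr` and `Age`; the `result` name is only illustrative.

```r
library(dplyr)

# Same nine rows as the updated sample data above
df <- data.frame(
  Name = c("Bill", "Barb", "Rick", "Chad", "Ruth", "Mick", "Hank", "Lace", "Nick"),
  Addr = c("2112 W", "2112 W", "3445 E", "2112 W", "5567 S",
           "3445 E", "3445 E", "1111 S", "2112 W"),
  Age  = c(17, 16, 16, 5, 4, 17, 1, 16, 4)
)

result <- df %>%
  group_by(Addr) %>%
  # sum(Age > 15) counts the over-15 rows in the group; the single
  # TRUE/FALSE is recycled to every row at that address
  mutate(count = sum(Age > 15) >= 2) %>%
  ungroup()

result
```

Because the comparison is done once per group, this should also scale better than looping over every row with `Map`.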
Answer 1 (score: 1)
Could you please try the following and let me know if it helps.
var$Count <- ifelse(var$Age > 15, "| TRUE", "| FALSE")
var %>% group_by(Addr)
The output is as follows.
* <fct> <int> <fct> <fct> <int> <chr>
1 | 2112 W | 17 | TRUE
2 | 2112 W | 16 | TRUE
3 | 3445 E | 16 | TRUE
4 | 2112 W | 5 | FALSE
5 | 5567 S | 4 | FALSE
6 | 3445 E | 17 | TRUE
7 | 3445 E | 1 | FALSE
8 | 1111 S | 16 | TRUE
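The `ifelse` call above tests each row's age on its own and ignores the address, which is why Lace comes out TRUE. A base R sketch that counts per address without extra packages is shown below. It assumes the same `Addr` and `Age` columns; the `df` and `n_over_15` names are illustrative.

```r
# Eight-row sample data, as in the output above
df <- data.frame(
  Name = c("Bill", "Barb", "Rick", "Chad", "Ruth", "Mick", "Hank", "Lace"),
  Addr = c("2112 W", "2112 W", "3445 E", "2112 W", "5567 S",
           "3445 E", "3445 E", "1111 S"),
  Age  = c(17, 16, 16, 5, 4, 17, 1, 16)
)

# ave() returns, for each row, the number of over-15 people
# living at that row's address
n_over_15 <- ave(as.integer(df$Age > 15), df$Addr, FUN = sum)

# Logical flag: TRUE when at least two people at the address are over 15
df$Count <- n_over_15 >= 2
df
```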
Answer 2 (score: 0)