I have the following data frame (a subset of a larger data frame with > 3000 obs and 2 different years):
rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018",
    "150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("15", "18"), class = "factor"),
    freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
    freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
    freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
    freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
    freqrain = c(1, 3, 2, 3, 1, 3))
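I want to count the values in columns c(3:11) that meet a condition. I have been trying rowSums because, when I leave out id and the grouping variable year, rowSums actually gives me a count, like this: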
rp.pptn.no.id <- rp.pptn %>%
  select(c(3:11)) %>%
  mutate(pptnlow = rowSums(. == 1 | . == 2 | . == 6))
I was also able to compute rowSums over the selected columns like this:
rp.pptn <- rp.pptn %>%
  mutate(pptnlow = rowSums(.[c(3:11)]))
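However, since I need id and year for later analysis, I would like to do both steps in one go. I am also curious why rowSums gave me a count rather than a sum in the first place, given that my data are numeric. A count is what I actually want, i.e. how many columns meet my condition?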
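Searching led me to think that something along these lines might work:
rp.pptn <- rp.pptn %>%
  mutate(pptnlow = rowSums(.[3:11]) %in% c(1, 2, 6))
This returns a logical vector that is all FALSE, presumably because no row total happens to equal 1, 2 or 6. I don't think I'm missing much, but ultimately what I want is the df below: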
rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018",
    "150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("15", "18"), class = "factor"),
    freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
    freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
    freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
    freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
    freqrain = c(1, 3, 2, 3, 1, 3), pptnlow = c(7, 6, 8, 4, 5, 2))
As mentioned above, my actual dataset is larger, so the more automated the better! Thanks.
Answer 0 (score: 2)
One option is reduce and map:
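library(tidyverse)
# For each target value, flag the matching cells in columns 3:11 and add the
# flag columns together row-wise; then add the three per-value counts.
# Note: funs() is deprecated as of dplyr 0.8.0 and emits a warning there.
map(c(1, 2, 6), ~ rp.pptn %>%
      transmute_at(3:11, funs(. == .x)) %>%
      reduce(`+`)) %>%
  reduce(`+`) %>%
  mutate(rp.pptn, pptnlow = .)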
Or using rowSums and map:
# One count column per target value, bound side by side and summed per row.
map(c(1, 2, 6), ~
      rp.pptn %>%
      select(3:11) %>%
      transmute(pptnlow = rowSums(. == .x))) %>%
  bind_cols %>%
  rowSums %>%
  mutate(rp.pptn, pptnlow = .)
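On newer dplyr the same count can probably be written without map or funs. This is only a minimal sketch, assuming dplyr 1.0.0 or later (where across() is available); the name low_values is made up for illustration and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Sample data from the question (year written with factor() for brevity).
rp.pptn <- data.frame(
  id = c("150015", "150016", "150017", "150018", "150019", "150020"),
  year = factor(rep("15", 6), levels = c("15", "18")),
  freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
  freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
  freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
  freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
  freqrain = c(1, 3, 2, 3, 1, 3)
)

# Hypothetical name: the values that count as "low".
low_values <- c(1, 2, 6)

# across() turns columns 3:11 into TRUE/FALSE flags; rowSums() then counts
# the TRUEs in each row. id and year stay in the result.
result <- rp.pptn %>%
  mutate(pptnlow = rowSums(across(3:11, ~ .x %in% low_values)))

result$pptnlow
# 7 6 8 4 5 2
```

Selecting by position (3:11) mirrors the question; if the columns share a prefix, starts_with("freq") inside across() may be more robust to column reordering.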
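Answer 1 (score: 2)
We can use mutate_at to replace each value with TRUE or FALSE depending on whether it matches the condition (1, 2, 6), apply rowSums to count the TRUEs, and then bind the result to the original data frame.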
library(dplyr)
rp.pptn2 <- rp.pptn %>%
  mutate_at(vars(3:11), funs(. %in% c(1, 2, 6))) %>%
  transmute(pptnlow = rowSums(.[, 3:11])) %>%
  bind_cols(rp.pptn, .)
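On the side question of why rowSums returned a count: a comparison such as . == 1 produces a logical matrix, and R treats TRUE as 1 and FALSE as 0 when summing, so the row sum of the flags is the number of matching cells. The base R sketch below illustrates this on the question's data; the names m and is_low are made up for illustration.

```r
# Sample data from the question (year written with factor() for brevity).
rp.pptn <- data.frame(
  id = c("150015", "150016", "150017", "150018", "150019", "150020"),
  year = factor(rep("15", 6), levels = c("15", "18")),
  freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
  freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
  freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
  freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
  freqrain = c(1, 3, 2, 3, 1, 3)
)

# Numeric matrix of the nine frequency columns.
m <- as.matrix(rp.pptn[3:11])

# Test first, sum second: %in% drops the dimensions, so rebuild the matrix
# (both %in% and matrix() work in column-major order).
is_low <- matrix(m %in% c(1, 2, 6), nrow = nrow(m))
rp.pptn$pptnlow <- rowSums(is_low)  # TRUE counts as 1, FALSE as 0
rp.pptn$pptnlow
# 7 6 8 4 5 2

# Sum first, test second (the attempt from the question): this asks whether
# each row TOTAL is 1, 2 or 6, which is why every row comes back FALSE.
totals_match <- rowSums(m) %in% c(1, 2, 6)
totals_match
# FALSE FALSE FALSE FALSE FALSE FALSE
```

The order of operations is the whole difference: testing each cell and then summing gives a count, while summing and then testing gives one TRUE/FALSE per row.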