I have the following data frame:
df <- data.frame(Gender = c(rep(c("M","F"),each=4)),
DiffA=c(1,1,-1,-1,1,1,1,-1),
DiffB=c(1,-1,1,-1,1,1,1,-1))
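I would like to create two new variables that summarise, for each gender, (i) the number of rows where both DiffA and DiffB are positive and (ii) the number of rows where both DiffA and DiffB are negative, so that I get: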
df2 <- data.frame(Gender = c("M","F"),
Diff_Pos=c(1,3),
Diff_Neg=c(1,1))
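I can't work out how to combine dplyr's n(), which returns the row count, inside summarise() with the logical conditions I need. Thanks in advance.
Answer 0 (score: 3)
I would consider doing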
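library(dplyr)  # provides %>%, filter() and count()
library(tidyr)  # provides spread()
# Keep rows where both differences agree, count per Gender and sign, then go wide
df %>% filter(DiffA == DiffB) %>% count(Gender, DiffA) %>% spread(DiffA, n)
#   Gender    -1     1
#   (fctr) (int) (int)
# 1      F     1     3
# 2      M     1     1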
The analogous data.table code is
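library(data.table)
# df must be a data.table for the [i, j, by] syntax, so convert it first
dcast(as.data.table(df)[DiffA == DiffB, .N, by = .(Gender, DiffA)], Gender ~ DiffA, value.var = "N")
#    Gender -1 1
# 1:      F  1 3
# 2:      M  1 1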
If your real data takes values other than -1 and 1, wrap the relevant columns in sign().
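To illustrate the sign() suggestion, here is a minimal sketch. The data frame df_real and the helper columns SignA and SignB are made up for this example, and dropping rows where the sign is 0 is an assumption about what you would want when a difference is exactly zero.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: same sign pattern as df, but with arbitrary magnitudes
df_real <- data.frame(Gender = rep(c("M", "F"), each = 4),
                      DiffA = c(2.5, 0.3, -1.2, -4, 1, 7, 0.1, -3),
                      DiffB = c(1.1, -0.7, 3, -2, 5, 0.2, 2, -9))

res <- df_real %>%
  mutate(SignA = sign(DiffA), SignB = sign(DiffB)) %>%  # reduce values to -1, 0 or 1
  filter(SignA == SignB, SignA != 0) %>%                # assumption: exact zeros are ignored
  count(Gender, SignA) %>%
  spread(SignA, n, fill = 0)                            # fill = 0 for missing Gender/sign pairs

res
```

The result has one row per gender and one column per sign, in the same layout as the output above.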
Answer 1 (score: 1)
Here is a base R option
with(subset(df, DiffA==DiffB), table(Gender, DiffA))
# DiffA
#Gender -1 1
# F 1 3
# M 1 1
Answer 2 (score: 0)
This should work:
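library(dplyr)  # needed for %>%
df %>%
  # Flag rows where both differences are positive, or both negative
  dplyr::mutate(
    Diff_Pos = DiffA > 0 & DiffB > 0,
    Diff_Neg = DiffA < 0 & DiffB < 0) %>%
  dplyr::group_by(Gender) %>%
  # Summing a logical vector counts its TRUE values
  dplyr::summarise(
    Diff_Pos = sum(Diff_Pos),
    Diff_Neg = sum(Diff_Neg))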