Question

I'm trying to calculate a cumulative sum across each row using several variables.

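Here is my data as an example. I have 5 patient IDs and 4 condition variables. If a condition has a value between 1 and 3, the count should increase by 1.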


ID <- c("a", "b", "c", "d", "e")
cond1 <- as.factor(sample(x = 1:7, size = 5, replace = TRUE))
cond2 <- as.factor(sample(x = 1:7, size = 5, replace = TRUE))
cond3 <- as.factor(sample(x = 1:7, size = 5, replace = TRUE))
cond4 <- as.factor(sample(x = 1:7, size = 5, replace = TRUE))
df <- data.frame(ID, cond1, cond2, cond3, cond4)
df
  ID cond1 cond2 cond3 cond4
1  a     2     7     6     6
2  b     7     2     3     6
3  c     4     3     1     4
4  d     7     3     3     6
5  e     6     7     7     3

I used rowSums for this. However, in the second row, although rowSums is 2 and cond2 is 3, cond3 is not '2' or '1'. The fourth row has the same problem.

cumsum

How can I make it cumulative? Thank you very much for your help.

Answer 1

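To compare against more than one value, use %in%. However, %in% works on a vector, so we loop over the columns with lapply/sapply and then call rowSums on the resulting logical matrix.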

df$RSum <- rowSums(sapply(df[,2:5], `%in%`, 1:3))
df$RSum
#[1] 1 2 2 2 1
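To illustrate why a membership test is needed when checking several values at once, here is a small sketch contrasting `==` with `%in%`. It uses the same values as the data at the end of this answer; the names `pairwise`, `membership`, `in_range`, and `counts` are illustrative only. `==` pairs elements by position (recycling the shorter vector), while `%in%` checks each element against the whole set.

```r
# Same values as the data at the end of this answer
df <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  cond1 = c(2L, 7L, 4L, 7L, 6L),
  cond2 = c(7L, 2L, 3L, 3L, 7L),
  cond3 = c(6L, 3L, 1L, 3L, 7L),
  cond4 = c(6L, 6L, 4L, 6L, 3L)
)

# `==` compares by position: 2 vs 1, 7 vs 2, 3 vs 3
pairwise <- c(2L, 7L, 3L) == 1:3       # FALSE FALSE TRUE

# `%in%` tests each element against the whole set {1, 2, 3}
membership <- c(2L, 7L, 3L) %in% 1:3   # TRUE FALSE TRUE

# Applied column by column, %in% gives a 5 x 4 logical matrix;
# rowSums then counts the TRUE cells in each row
in_range <- sapply(df[, 2:5], `%in%`, 1:3)
counts <- rowSums(in_range)
print(counts)
```

Because `sapply` returns one column per condition, `rowSums` yields exactly one count per patient.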

If the values are numeric, we can also use >= and <=:

df$RSum <- rowSums(df[, 2:5] >=1 & df[, 2:5] <=3)
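The range comparison above assumes numeric columns, but the question builds the condition columns with `as.factor()`, and ordering operators are not meaningful for unordered factors. A minimal sketch, assuming the factor labels are whole numbers, that converts the columns first (values are fixed here instead of sampled so the result is reproducible):

```r
# Factor columns, as in the question's sample code
df <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  cond1 = factor(c(2, 7, 4, 7, 6)),
  cond2 = factor(c(7, 2, 3, 3, 7)),
  cond3 = factor(c(6, 3, 1, 3, 7)),
  cond4 = factor(c(6, 6, 4, 6, 3))
)

# Convert each condition column to integer. Going through as.character()
# keeps the displayed values rather than the internal level codes.
df[2:5] <- lapply(df[2:5], function(x) as.integer(as.character(x)))

# Logical matrix of cells within [1, 3], counted per row
df$RSum <- rowSums(df[, 2:5] >= 1 & df[, 2:5] <= 3)
df$RSum
```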

Data

df <- structure(list(ID = c("a", "b", "c", "d", "e"), cond1 = c(2L, 
7L, 4L, 7L, 6L), cond2 = c(7L, 2L, 3L, 3L, 7L), cond3 = c(6L, 
3L, 1L, 3L, 7L), cond4 = c(6L, 6L, 4L, 6L, 3L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

Answer 2

I suggest you fix two issues with your data:

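Your data is in wide rather than long format. If it were in long format, your analysis would be simpler, especially for plotting.
The values of each condition are factors. This makes comparisons harder and can cause bugs that are hard to spot. If you look closely at @akrun's answer, you'll notice that the values there are integers (numeric).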
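A short sketch of the kind of hard-to-spot bug factors can cause, using a hypothetical factor `f` built like the question's columns. `as.integer()` on a factor returns level codes, and ordering comparisons return `NA`, while `%in%` still works because `match()` compares factors through their labels.

```r
# A factor like those produced by as.factor(sample(...)) in the question
f <- factor(c(7, 2, 3, 3, 7))

# as.integer() returns the internal level codes (levels are "2", "3", "7")
codes <- as.integer(f)                  # 3 1 2 2 3

# Converting through the labels recovers the actual numbers
values <- as.integer(as.character(f))   # 7 2 3 3 7

# Ordering comparisons on an unordered factor give NA (plus a warning)
cmp <- suppressWarnings(f <= 3)         # NA NA NA NA NA

# %in% matches on the labels, so it behaves as expected
inset <- f %in% 1:3                     # FALSE TRUE TRUE TRUE FALSE
```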

That said, here is a data.table solution:

# 1. load libraries and make df a data.table:
library(data.table)
setDT(df)

# 2. make the wide table a long one
melt(df, id.vars = "ID")

# 3. with a long table, count the number of conditions that are in the 1:3 range for each ID. Notice I chained the first command with this second one:
melt(df, id.vars = "ID")[, sum(value %in% 1:3), by = ID]

Which produces:

   ID V1
1:  a  1
2:  b  2
3:  c  2
4:  d  2
5:  e  1

You only need to run the commands under steps 1 and 3 (step 2 is chained into step 3). See ?data.table for more details.
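For readers without data.table, the same wide-to-long-then-count idea can be sketched in base R. This is a swapped-in base-R variant, not the data.table code of this answer; the names `cond_cols` and `long` are illustrative.

```r
df <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  cond1 = c(2L, 7L, 4L, 7L, 6L),
  cond2 = c(7L, 2L, 3L, 3L, 7L),
  cond3 = c(6L, 3L, 1L, 3L, 7L),
  cond4 = c(6L, 6L, 4L, 6L, 3L)
)

cond_cols <- c("cond1", "cond2", "cond3", "cond4")

# Wide -> long: one row per (ID, condition) pair, stacking the columns
long <- data.frame(
  ID       = rep(df$ID, times = length(cond_cols)),
  variable = rep(cond_cols, each = nrow(df)),
  value    = unlist(df[cond_cols], use.names = FALSE)
)

# Per ID, count how many values fall in 1:3
counts <- aggregate(value ~ ID, data = long,
                    FUN = function(v) sum(v %in% 1:3))
counts
```

As with `melt()`, the long table has one row per (ID, condition) pair, so the per-ID count reduces to a grouped sum.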

You can read more about wide vs. long format on Wikipedia and in Mike Wise's answer.

I used the same data as @akrun:

df <- structure(list(ID = c("a", "b", "c", "d", "e"),
                          cond1 = c(2L, 7L, 4L, 7L, 6L), 
                          cond2 = c(7L, 2L, 3L, 3L, 7L), 
                          cond3 = c(6L, 3L, 1L, 3L, 7L), 
                          cond4 = c(6L, 6L, 4L, 6L, 3L)), 
               class = "data.frame", 
               row.names = c("1", "2", "3", "4", "5"))
