Median of a subset of columns, selected where other columns have the value 1

Date: 2014-09-16 14:36:18

Tags: r

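This question is very similar to this one, but there are some fundamental differences.

I have a data set with a timestamp, 4 measurement columns (A to D) and 4 status columns (SA to SD):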

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor")), .Names = c("Timestamp", "A", "B", 
"C", "D", "SA", "SB", "SC", "SD"), row.names = c(NA, 6L), class = "data.frame")

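For each row, I want to compute the median of the measurement columns that are switched on, as indicated by a 1 in the corresponding S* column.

So far, I can find out row by row which measurement columns are on, using: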

foo[i, c(which(x = foo[i, 6:9] == 1, arr.ind = FALSE) + 1)]

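where i is the row number.

Beyond that, I cannot see a way forward that does not make my code overly complicated. I thought I could run the line above in a row-by-row for loop, bind each result after the timestamp into a new data frame, fill the empty spots with NAs, compute the median over that data frame, and finally bind the medians to the original data frame. But there must be a better way!

Any ideas?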

Edit:

The output should look like this:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), Median = c(18.51558, 18.756115, 
19.262825, 18.757795, 18.89796, 19.056395)), .Names = c("Timestamp", 
"A", "B", "C", "D", "SA", "SB", "SC", "SD", "Median"), row.names = c(NA, 
6L), class = "data.frame")

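1 answer:

Answer 0 (score: 1)

This is a bit messy because your S* columns are factors. If you convert them to numeric or logical first, you can skip the second line of code below: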

w <- grepl("^S", names(foo))
m <- matrix(as.logical(as.numeric(as.matrix(foo[, w]))), ncol = sum(w))
foo$Median <- apply(`[<-`(as.matrix(foo[,LETTERS[1:4]]), !m, NA), 1, median, na.rm=TRUE)
foo
#             Timestamp A        B        C D SA SB SC SD   Median
# 1 2014-09-01 06:00:02 0 20.77579 16.25537 0  0  1  1  0 18.51558
# 2 2014-09-01 06:00:06 0 21.05727 16.45496 0  0  1  1  0 18.75612
# 3 2014-09-01 06:00:10 0 21.81632 16.70933 0  0  1  1  0 19.26282
# 4 2014-09-01 06:00:14 0 21.36299 16.15260 0  0  1  1  0 18.75780
# 5 2014-09-01 06:00:18 0 21.18629 16.60963 0  0  1  1  0 18.89796
# 6 2014-09-01 06:00:22 0 21.34721 16.76558 0  0  1  1  0 19.05640
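
The remark about converting the S* columns first can be sketched as follows. This is only an illustration under stated assumptions: the two-row `foo` is a cut-down stand-in for the question's data (Timestamp omitted), and the helper names `status` and `vals` are not from the original post. The factors are converted through their labels (`as.character`), because calling `as.numeric` directly on a factor would return the internal level codes (1 and 2 here) rather than the labels "1" and "0".

```r
# Cut-down stand-in for the question's `foo`: two rows, values copied from the question.
foo <- data.frame(
  A = c(0, 0), B = c(20.77579, 21.05727),
  C = c(16.25537, 16.45496), D = c(0, 0),
  SA = factor(c("0", "0"), levels = c("1", "0")),
  SB = factor(c("1", "1"), levels = c("1", "0")),
  SC = factor(c("1", "1"), levels = c("1", "0")),
  SD = factor(c("0", "0"), levels = c("1", "0"))
)

# Hypothetical helper: names of the status columns.
status <- c("SA", "SB", "SC", "SD")

# Convert each status factor to logical via its label, not its integer code.
foo[status] <- lapply(foo[status], function(s) as.character(s) == "1")

# Blank out measurements whose status is off, then take row-wise medians.
vals <- as.matrix(foo[, LETTERS[1:4]])
vals[!as.matrix(foo[status])] <- NA
foo$Median <- apply(vals, 1, median, na.rm = TRUE)
```

With the status columns already logical, the mask is simply `as.matrix(foo[status])`, so the factor-to-logical matrix conversion in the answer is no longer needed. A row in which every status is off ends up with an NA median, since nothing is left to take the median of.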