Question

I have a large data frame in which some columns contain only 'NA' values. I want to summarize each row as percentages. Say the data frame is df:

user col1 col2 col3 col4 col5 col6
 100   1    1    2   2    1    NA
 200   1    2    3   3    NA   NA
 300   2    3    3   3    2    NA

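I want to summarize each user row as proportions of that row's non-missing entries. For example, user 100 has event 1 in 3/5 of its entries and event 2 in 2/5.

summarized_df: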

user event1 event2 event3
100    3/5   2/5    0
200    1/4   1/4    2/4
300    0     2/5    3/5

Showing a percentage for each event would also be useful. How can I do this in R?

Answer 1

Here is a base R approach using apply, table, and prop.table.

cbind(dat[1],
      prop.table(t(apply(dat[-1], 1,
                   function(x) table(factor(x, levels=1:3)))), 1))

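factor is needed to ensure that the table call applied to each row returns an entry for every possible level (1:3), even when one or more levels are not observed. Here, apply iterates over the rows and returns the count of each event, including 0 when an event does not occur. By default, table drops NA values, so they do not contribute to the counts or to the row totals. Because the output of each call has the same length, apply returns a matrix. We transpose that matrix and use prop.table to compute the proportion of each event within each row. Finally, cbind combines the first column with this matrix, returning a data.frame with the desired output.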
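The one-liner above can be unpacked into its individual steps. The following is a minimal sketch of that breakdown; the intermediate names counts, counts_by_user, props and result are illustrative and not part of the original answer, and the data is rebuilt with data.frame so the block runs on its own.

```r
# Example data with the same values as the question's df
dat <- data.frame(
  user = c(100L, 200L, 300L),
  col1 = c(1L, 1L, 2L),
  col2 = 1:3,
  col3 = c(2L, 3L, 3L),
  col4 = c(2L, 3L, 3L),
  col5 = c(1L, NA, 2L),
  col6 = c(NA, NA, NA)
)

# Step 1: count each event per row. factor() fixes the levels to 1:3,
# so every row yields three counts; table() drops NA by default.
counts <- apply(dat[-1], 1, function(x) table(factor(x, levels = 1:3)))

# apply() puts each row's result in a column, so transpose to get
# one row per user and one column per event.
counts_by_user <- t(counts)

# Step 2: divide each row by its total (margin 1 = rows).
props <- prop.table(counts_by_user, 1)

# Step 3: attach the user column.
result <- cbind(dat[1], props)
print(result)
```

Keeping counts_by_user around can be handy if the raw counts are needed alongside the proportions.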

This returns

  user    1    2   3
1  100 0.60 0.40 0.0
2  200 0.25 0.25 0.5
3  300 0.00 0.40 0.6
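The question also asks for percentages and uses column names such as event1. One possible way to get there from the same approach is sketched below; the events vector, the summarized_df name, and the rounding to one decimal place are assumptions made for illustration.

```r
# Example data with the same values as the question's df
dat <- data.frame(
  user = c(100L, 200L, 300L),
  col1 = c(1L, 1L, 2L),
  col2 = 1:3,
  col3 = c(2L, 3L, 3L),
  col4 = c(2L, 3L, 3L),
  col5 = c(1L, NA, 2L),
  col6 = c(NA, NA, NA)
)

events <- 1:3  # assumed set of possible event codes

# Row-wise proportions, as in the answer above
props <- prop.table(
  t(apply(dat[-1], 1, function(x) table(factor(x, levels = events)))),
  1
)

# Label the columns event1, event2, event3 and convert to percentages
colnames(props) <- paste0("event", events)
summarized_df <- cbind(dat[1], round(100 * props, 1))
print(summarized_df)
# Percentages per user: 100 -> 60, 40, 0; 200 -> 25, 25, 50; 300 -> 0, 40, 60
```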

Data

dat <- structure(list(user = c(100L, 200L, 300L),
                      col1 = c(1L, 1L, 2L),
                      col2 = 1:3,
                      col3 = c(2L, 3L, 3L),
                      col4 = c(2L, 3L, 3L),
                      col5 = c(1L, NA, 2L),
                      col6 = c(NA, NA, NA)),
                 .Names = c("user", "col1", "col2", "col3", "col4", "col5", "col6"),
                 class = "data.frame",
                 row.names = c(NA, -3L))
