Question

I want to reshape a data frame so that each row contains only one unique value. For example, suppose I have a data frame like this:

person1 person2 person3
1          2       NA
4          4       5 
6          NA      NA

But I want to change it so that each row holds only a single unique value, with the row number equal to that value:

person1   person2   person3
1          NA       NA
NA         2        NA
NA         NA       NA
4          4        NA
NA         NA       5
6          NA       NA

The end goal is to build an incidence matrix like this:

    person1   person2   person3
1      1         0         0
2      0         1         0
3      0         0         0
4      1         1         0
5      0         0         1
6      1         0         0

Does anyone have a suggestion for how to do this in R?

Answer 1

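One approach is to allocate a matrix with as many rows as the highest value in the data frame, and then use a simple loop to fill in 1s at the right positions.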

Let's call the allocated matrix output and give it the same column names as the original data frame.

max.value <- max(df, na.rm = TRUE)
output <- matrix(0, nrow = max.value, ncol=ncol(df))
colnames(output) <- colnames(df)

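Now we have a 6x3 matrix of zeros. A simple nested loop then goes through each column of output and assigns 1 to every row position i that appears as a value in the corresponding column of df.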

for (j in seq_len(ncol(output))) {   # for each column of the output matrix
  for (i in na.omit(df[, j])) {      # for each non-NA value in that column of df
    output[i, j] <- 1                # assign 1 to row i of column j
  }
}

> output
     person1 person2 person3
[1,]       1       0       0
[2,]       0       1       0
[3,]       0       0       0
[4,]       1       1       0
[5,]       0       0       1
[6,]       1       0       0

This should work for as many persons and rows as needed.
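The same fill can also be done without an explicit loop. The following is only a sketch of an alternative, not part of the original answer: it rebuilds the test data frame from the appendix and uses base R matrix indexing with (row, column) pairs. The names vals, cols, keep and output2 are chosen here for illustration.

```r
# Test data, same values as the dput in the appendix
df <- data.frame(person1 = c(1L, 4L, 6L),
                 person2 = c(2L, 4L, NA),
                 person3 = c(NA, 5L, NA))

vals <- unlist(df, use.names = FALSE)         # all values, column by column
cols <- rep(seq_along(df), each = nrow(df))   # column index belonging to each value
keep <- !is.na(vals)                          # ignore the NA entries

# Zero matrix with one row per possible value and one column per person
output2 <- matrix(0, nrow = max(vals, na.rm = TRUE), ncol = ncol(df),
                  dimnames = list(NULL, colnames(df)))

# A two-column index matrix addresses individual (row, column) cells
output2[cbind(vals[keep], cols[keep])] <- 1
```

For data of this size the loop is just as good; the indexed version mainly avoids the nested iteration when there are many rows and columns.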

Appendix: here is the dput of the test data frame.

df <- structure(list(person1 = c(1L, 4L, 6L), person2 = c(2L, 4L, NA
), person3 = c(NA, 5L, NA)), .Names = c("person1", "person2", 
"person3"), class = "data.frame", row.names = c(NA, -3L))

Answer 2

This does not fill in the "missing" values (for example, nobody has 3), but it does create a sparse incidence matrix.

library(tidyverse)

data = tribble(
  ~person1, ~person2, ~person3,
   1,        2,        NA,
   4,        4,        5,
   6,        NA,       NA
  )

data %>% 
  gather(key, value, na.rm = TRUE) %>% 
  xtabs(~ value + key, data = ., sparse = TRUE)

#> 5 x 3 sparse Matrix of class "dgCMatrix"
#>   person1 person2 person3
#> 1       1       .       .
#> 2       .       1       .
#> 4       1       1       .
#> 5       .       .       1
#> 6       1       .       .

If you want to include all the "missing" values as well, convert the numeric values to a factor that has every level.

For example:

data %>% 
  gather(key, value, na.rm = TRUE) %>% 
  # Add factor with levels 1:6 --> 1, 2, 3, 4, 5, 6
  mutate(value = factor(value, levels = 1:6)) %>% 
  xtabs(~ value + key, data = ., sparse = TRUE)

#> 6 x 3 sparse Matrix of class "dgCMatrix"
#>   person1 person2 person3
#> 1       1       .       .
#> 2       .       1       .
#> 3       .       .       .
#> 4       1       1       .
#> 5       .       .       1
#> 6       1       .       .
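If the tidyverse is not available, the same factor idea can be sketched in base R. This is an assumed equivalent rather than the answerer's code: it uses stack() to get the long format and table() to count, and it returns a dense table instead of a sparse matrix. The names long and incidence are illustrative.

```r
# Same example data as above, as a plain data frame
df <- data.frame(person1 = c(1, 4, 6),
                 person2 = c(2, 4, NA),
                 person3 = c(NA, 5, NA))

# Long format: 'values' holds the numbers, 'ind' the original column name
long <- stack(df)
long <- long[!is.na(long$values), ]

# Declaring all levels 1:6 keeps a row for value 3 even though nobody has it
long$values <- factor(long$values, levels = 1:6)

# Cross-tabulate value against person
incidence <- table(value = long$values, key = long$ind)
```

Printing incidence shows one row per level of the factor, including an all-zero row for 3.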

Each row of an R data frame has unique values
