How to create a frequency matrix from a data frame

Time: 2021-06-13 17:31:43

Tags: r dataframe matrix

Say I have df:

df <- data.frame(cell = c("c1", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"),
                 layer = c("L1", "L2", "L1", "L2", "L3", "L3", "L4", "L4", "L3"))
> df
  cell layer
1   c1    L1
2   c1    L2
3   c2    L1
4   c3    L2
5   c4    L3
6   c5    L3
7   c6    L4
8   c7    L4
9   c8    L3

How would I create a frequency matrix like:

> table(df$cell, df$layer)
    
     L1 L2 L3 L4
  c1  1  1  0  0
  c2  1  0  0  0
  c3  0  1  0  0
  c4  0  0  1  0
  c5  0  0  1  0
  c6  0  0  0  1
  c7  0  0  0  1
  c8  0  0  1  0

but in a for loop? I've tried something like:

> for(layer in unique(df$layer)){
+   df[paste(layer)] <- ifelse(df$layer == layer, 1, 0)
+ }
> df
  cell layer L1 L2 L3 L4
1   c1    L1  1  0  0  0
2   c1    L2  0  1  0  0
3   c2    L1  1  0  0  0
4   c3    L2  0  1  0  0
5   c4    L3  0  0  1  0
6   c5    L3  0  0  1  0
7   c6    L4  0  0  0  1
8   c7    L4  0  0  0  1
9   c8    L3  0  0  1  0

but that one-hot encodes the rows and adds the result back to the original data frame... I'm looking at the source code of base:::table, but I can't pick out the part I'm interested in. Is there a way to "push" into an empty matrix? Something like:

newMat <- Matrix(0, nrow = length(unique(df$cell)), ncol=length(unique(df$layer)))
for (i in 1:length(unique(df$cell))){
  for (j in 1:length(unique(df$layer))){
    newMat[i,j] <- ....
  }
}

I'm just not sure how to finish it... Thanks! The expected output is the table above, in matrix form.

1 answer:

Answer 0: (score: 0)

Here is one way to do it with a loop:

df <- data.frame(cell = c("c1", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"),
                 layer = c("L1", "L2", "L1", "L2", "L3", "L3", "L4", "L4", "L3"))


cells = unique(df$cell)
layers = unique(df$layer)

res = matrix(0L, 
             nrow = length(cells),
             ncol = length(layers),
             dimnames = list(cells, layers))

for (i in seq_along(cells)) {
  cols = match(df[df$cell == cells[i], 'layer'], layers)
  res[i, cols] = 1L
}

res

##    L1 L2 L3 L4
## c1  1  1  0  0
## c2  1  0  0  0
## c3  0  1  0  0
## c4  0  0  1  0
## c5  0  0  1  0
## c6  0  0  0  1
## c7  0  0  0  1
## c8  0  0  1  0

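The two most important points: since we need the unique cells and layers several times, assigning them to variables is more efficient than calling unique each time. The match(df[df$cell == cells[i], ...]) call then determines which layers are present for that cell, so that we can assign 1 to them.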
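
One thing to watch: the loop above marks presence, so a cell/layer pair that occurs several times would still be recorded as 1. If actual counts are wanted, a minimal sketch along the lines of the "push into an empty matrix" idea from the question could look like the following. The extra repeated c1/L1 row and the name counts are assumptions added purely for illustration.

```r
# Small example with one repeated pair (the last row is an assumption,
# added only to show that repeats are counted).
df <- data.frame(cell  = c("c1", "c1", "c2", "c3", "c1"),
                 layer = c("L1", "L2", "L1", "L2", "L1"),
                 stringsAsFactors = FALSE)

cells  <- unique(df$cell)
layers <- unique(df$layer)

# Start from an all-zero integer matrix with named rows and columns.
counts <- matrix(0L,
                 nrow = length(cells),
                 ncol = length(layers),
                 dimnames = list(cells, layers))

# Walk the data frame once and increment the matching entry.
for (k in seq_len(nrow(df))) {
  i <- match(df$cell[k], cells)
  j <- match(df$layer[k], layers)
  counts[i, j] <- counts[i, j] + 1L
}

counts
```

Each row of df contributes exactly one increment, so the entries of counts sum to nrow(df).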

Note that I would not do it this way at all. I would recommend using table or some kind of reshaping mechanism.
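
As a rough sketch of the loop-free route, the result of table can be turned into a plain matrix by dropping its class. The variable names tab and freq are assumptions for illustration; the data is the df from the question.

```r
df <- data.frame(cell = c("c1", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"),
                 layer = c("L1", "L2", "L1", "L2", "L3", "L3", "L4", "L4", "L3"),
                 stringsAsFactors = FALSE)

# Cross-tabulate the two columns; this counts every cell/layer pair.
tab <- table(df$cell, df$layer)

# unclass() removes the "table" class and leaves an integer matrix
# that keeps the row and column names.
freq <- unclass(tab)

freq
```

Unlike the presence loop, this counts repeated pairs, which is usually what a frequency matrix is meant to hold.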