Question

Overview

I have a data frame called df1 with two columns: (1) Urbanisation_index, which has four sub-levels (1-4); and (2) Canopy_Index.

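For the data analysis, I want to run an ANOVA to partition the overall variance in Canopy_Index into the variance within and between the sub-level groups of Urbanisation_index. The idea is to determine whether different levels of urbanisation affect the tree species sessile oak (Quercus petraea).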

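To run the ANOVA, I need to flip the columns in the data frame and create a new data frame. I want the column headings to be 1, 2, 3 and 4, representing the four groups or sub-levels of Urbanisation_index. I then want to list the Canopy_Index values that belong to each sub-level in that sub-level's column (see the desired result).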

Once the new data frame is built, the data will be grouped in the right format for the ANOVA.

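I have tried many different approaches, such as transposing, but I cannot figure out how to make the Urbanisation_index sub-levels (1-4) the column headings and collect their associated Canopy_Index values (that is, the Canopy_Index rows for each Urbanisation_index sub-level) in the matching column underneath.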

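For example, if the data frame is filtered for Urbanisation_index sub-level 1, Canopy_Index might have 6 observations (5, 5, 5, 5, 55, 55), and I would like them listed under heading 1 in the new data frame, as shown below.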
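To make the filtering step concrete, here is a minimal base R sketch. The `toy` data frame and its values are hypothetical stand-ins, not the dataset posted below; only the column names are taken from the question.

```r
# Hypothetical toy data that reuses the question's column names
toy <- data.frame(
  Urbanisation_index = c(1, 2, 1, 3, 1, 2),
  Canopy_Index       = c(5, 65, 55, 15, 5, 85)
)

# Keep only the Canopy_Index values that belong to sub-level 1
level_1 <- toy$Canopy_Index[toy$Urbanisation_index == 1]
level_1
```

These are the values that should end up under heading 1 in the new data frame.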

If anyone can help, I would be very grateful.

R code

##transpose
  t(df1)
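A small sketch of why `t()` on its own does not give the desired layout, assuming a hypothetical three-row data frame with the same column names: transposing only turns the two columns into two rows, it does not group the values by sub-level.

```r
# Hypothetical toy data, not the dataset from the question
toy <- data.frame(
  Urbanisation_index = c(1, 2, 1),
  Canopy_Index       = c(5, 65, 55)
)

# t() returns a matrix with one row per original column
flipped <- t(toy)
dim(flipped)
rownames(flipped)
```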

Desired result

 1   2   3   4
65  55   5  35
45  85  55  45
75  75  15  25

Data

    structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4, 
4, 2, 4, 3, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
2, 2, 2, 4, 4, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 4, 4, 
4, 4, 4, 4), Canopy_Index = c(65, 75, 55, 85, 85, 85, 95, 85, 
85, 45, 65, 75, 75, 65, 35, 75, 65, 85, 65, 95, 75, 75, 75, 65, 
75, 65, 75, 95, 95, 85, 85, 85, 75, 75, 65, 85, 75, 65, 55, 95, 
95, 95, 95, 45, 55, 35, 55, 65, 95, 95, 45, 65, 45, 55)), row.names = c(NA, 
-54L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x1030086e0>, index = structure(integer(0), "`__Species`" = integer(0)))

Answer 1

Using the data you provided:

data<-structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4, 
                                            4, 2, 4, 3, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
                                            2, 2, 2, 4, 4, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 4, 4, 
                                            4, 4, 4, 4), 
                     Canopy_Index = c(65, 75, 55, 85, 85, 85, 95, 85, 
                                      85, 45, 65, 75, 75, 65, 35, 75, 65, 85, 65, 95, 75, 75, 75, 65, 
                                      75, 65, 75, 95, 95, 85, 85, 85, 75, 75, 65, 85, 75, 65, 55, 95, 
                                      95, 95, 95, 45, 55, 35, 55, 65, 95, 95, 45, 65, 45, 55)), 
                row.names = c(NA, 
                              -54L), 
                class = c("data.table", "data.frame"), 
                index = structure(integer(0), "`__Species`" = integer(0)))

Load the packages

library(tidyr)
library(dplyr)
library(purrr)

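First, group the Canopy_Index values by Urbanisation_index to get a list of all the values for each sub-level, then pad the shorter groups with NA so that every column has the same length.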

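a <- data %>%
  group_by(Urbanisation_index) %>%
  # collapse each group's values into one "-"-separated string
  summarise(Canopy_Indexes = paste(Canopy_Index, collapse = "-")) %>%
  # one column per sub-level (1 to 4)
  spread(key = Urbanisation_index, value = Canopy_Indexes) %>%
  # split each string back into one row per value;
  # convert = TRUE turns the resulting strings back into numbers
  map(.f = ~ separate_rows(data.frame(.), 1, sep = "-", convert = TRUE))

# pad every group with NA up to the length of the largest group
a <- lapply(a, function(x){
  x1 <- x[, 1]
  length(x1) <- max(sapply(a, nrow))
  x1
}) %>% data.frame()

colnames(a) <- paste("sub_level", 1:4, sep = "_")
a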
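The padding step above relies on the fact that assigning a larger `length()` to a vector fills the new positions with NA. A minimal base R sketch of just that mechanism, using a hypothetical `toy` data frame rather than the question's data:

```r
# Hypothetical toy data that reuses the question's column names
toy <- data.frame(
  Urbanisation_index = c(1, 2, 1, 3, 1, 2),
  Canopy_Index       = c(5, 65, 55, 15, 5, 85)
)

# One vector of Canopy_Index values per sub-level
groups <- split(toy$Canopy_Index, toy$Urbanisation_index)

# Extending a vector's length pads it with NA
n_max  <- max(lengths(groups))
padded <- lapply(groups, function(x) {
  length(x) <- n_max
  x
})

# Equal-length vectors can now be combined into one data frame
wide <- data.frame(padded)
colnames(wide) <- paste("sub_level", names(groups), sep = "_")
wide
```

Without the padding, `data.frame()` would refuse to combine vectors of different lengths.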

Here is another, more compact solution. I came up with the first one first, though, so I did not want to waste it :)

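# split the rows by sub-level, then pull the 2nd column (Canopy_Index) from each piece
b <- map(split(data, data$Urbanisation_index), 2)

# pad every vector with NA up to the length of the longest one
b <- lapply(b, function(x){
  x1 <- x
  length(x1) <- max(sapply(b, length))
  x1
}) %>% data.frame()

colnames(b) <- paste("sub_level", 1:4, sep = "_")
b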

Result:

   sub_level_1 sub_level_2 sub_level_3 sub_level_4
1           35          65          85          55
2           75          75          85          85
3           65          45          75          95
4           85          95          65          85
5           55          85          95          85
6           55          85          75          65
7           NA          85          75          75
8           NA          85          75          65
9           NA          75          65          75
10          NA          65          75          75
11          NA          95          65          65
12          NA          95          75          95
13          NA          95          95          95
14          NA          95          65          45
15          NA          45          NA          65
16          NA          55          NA          45
17          NA          35          NA          55

Hope this helps.

Switch the columns and rows in a data frame and list the observations under separate column headings to perform a one-way ANOVA
