R - How to split/combine a column that holds multiple variables

Asked: 2018-02-21 02:29:22

Tags: r statistics tidyr

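I am new to R and cannot find an answer on how to split a column that holds multiple variables (Samples 1-4) into separate columns while carrying along the data associated with each one. Here is an example: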

Samples     Content
Sample 1    70.7
Sample 1    91.6
Sample 1    92.6
Sample 1    65.2
Sample 1    80.0
Sample 1    82.1
Sample 1    88.1
Sample 1    92.2
Sample 1    53.3
Sample 1    80.0
Sample 1    60.3
Sample 1    89.7
Sample 1    84.8
Sample 1    94.0
Sample 1    71.8
Sample 1    76.9
Sample 1    91.4
Sample 1    57.9
Sample 1    61.9
Sample 1    71.5
Sample 2    88.7
Sample 2    67.6
Sample 2    61.7
Sample 2    70.8
Sample 2    45.3
Sample 2    55.6
Sample 2    64.6
Sample 2    62.7
Sample 2    72.4
Sample 2    46.8
Sample 2    59.0
Sample 2    63.7
Sample 2    67.0
Sample 2    71.6
Sample 2    48.3
Sample 2    55.6
Sample 2    62.5
Sample 2    60.0
Sample 2    72.9
Sample 2    47.4
Sample 3    42.3
Sample 3    48.2
Sample 3    64.0
Sample 3    33.3
Sample 3    19.0
Sample 3    41.0
Sample 3    53.1
Sample 3    46.5
Sample 3    30.0
Sample 3    43.4
Sample 3    43.7
Sample 3    92.0
Sample 3    53.0
Sample 3    33.0
Sample 3    48.4
Sample 3    43.2
Sample 3    41.8
Sample 3    62.5
Sample 3    33.3
Sample 3    49.3
Sample 4    51.8
Sample 4    57.3
Sample 4    43.3
Sample 4    42.3
Sample 4    37.6
Sample 4    54.9
Sample 4    71.1
Sample 4    33.8
Sample 4    43.1
Sample 4    39.1
Sample 4    63.0
Sample 4    74.0
Sample 4    31.0
Sample 4    48.3
Sample 4    42.9
Sample 4    62.2
Sample 4    35.4
Sample 4    33.8
Sample 4    40.7
Sample 4    41.2

I tried tidyr without success. I would like the output to look like this:

Sample 1    Sample 2    Sample 3    Sample 4
70.7    88.7    42.3    51.8
91.6    67.6    48.2    57.3
92.6    61.7    64.0    43.3
65.2    70.8    33.3    42.3
80.0    45.3    19.0    37.6
82.1    55.6    41.0    54.9
88.1    64.6    53.1    71.1
92.2    62.7    46.5    33.8
53.3    72.4    30.0    43.1
80.0    46.8    43.4    39.1
60.3    59.0    43.7    63.0
89.7    63.7    92.0    74.0
84.8    67.0    53.0    31.0
94.0    71.6    33.0    48.3
71.8    48.3    48.4    42.9
76.9    55.6    43.2    62.2
91.4    62.5    41.8    35.4
57.9    60.0    62.5    33.8
61.9    72.9    33.3    40.7
71.5    47.4    49.3    41.2

Thank you very much. Once a solution is found, is there also a way to do the reverse?

Bonus: is there a way to run a t-test on data stacked in a single column, as in the first example, without having to convert it?

2 answers:

Answer 0 (score: 2)

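  1. You are probably hitting the "duplicate identifiers" problem with tidyr::spread. You first need to create a unique combination of sample + identifier, which you can do like this (assuming the data frame is named df1):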

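    library(tidyverse) # for dplyr + tidyr
    df1 %>% 
      group_by(Samples) %>% 
      mutate(id = row_number()) %>%  # row index within each sample makes each row unique
      spread(Samples, Content) %>%   # one column per sample
      select(-id)                    # drop the helper index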
    
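  2. "If I wanted to do the reverse": do you mean going the other way, from the wide form back to the original long form? Then use gather. Add this to the end of the code above and see what happens:

    %>% gather(Samples, Content)
    
  3. t-test: there are many ways to run a t-test on long-format data. For example, a base R way to compare Samples 1 and 2 might be: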

      t.test(df1[df1$Samples == "Sample 1", "Content"], 
             df1[df1$Samples == "Sample 2", "Content"])
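
For readers on tidyr 1.0.0 or later, where spread and gather are superseded by pivot_wider and pivot_longer, the three steps above can be sketched end to end as follows. This is only an illustrative sketch, not the answerer's code: df1 is rebuilt here from the first five values of each sample in the question, and the names wide, long, two and tt are made up for the example.

```r
library(dplyr)
library(tidyr)

# Assumption: first five values of each sample from the question, in long form
df1 <- data.frame(
  Samples = rep(paste("Sample", 1:4), each = 5),
  Content = c(70.7, 91.6, 92.6, 65.2, 80.0,
              88.7, 67.6, 61.7, 70.8, 45.3,
              42.3, 48.2, 64.0, 33.3, 19.0,
              51.8, 57.3, 43.3, 42.3, 37.6)
)

# Long -> wide: number the rows within each sample so every cell is unique
wide <- df1 %>%
  group_by(Samples) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Samples, values_from = Content) %>%
  select(-id)

# Wide -> long again (the reverse operation)
long <- wide %>%
  pivot_longer(everything(), names_to = "Samples", values_to = "Content")

# t-test on the long data via the formula interface; no reshaping needed
two <- subset(df1, Samples %in% c("Sample 1", "Sample 2"))
tt <- t.test(Content ~ Samples, data = two)
print(tt)
```

The formula interface needs exactly two groups, which is why the data is subset to two samples first.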
      

Answer 1 (score: 1)

Since the number of elements is the same for each 'Sample', we can use unstack from base R:
unstack(df1, Content~Samples)
#    Sample.1 Sample.2 Sample.3 Sample.4
#1      70.7     88.7     42.3     51.8
#2      91.6     67.6     48.2     57.3
#3      92.6     61.7     64.0     43.3
#4      65.2     70.8     33.3     42.3
#5      80.0     45.3     19.0     37.6
#6      82.1     55.6     41.0     54.9
#7      88.1     64.6     53.1     71.1
#8      92.2     62.7     46.5     33.8
#9      53.3     72.4     30.0     43.1
#10     80.0     46.8     43.4     39.1
#11     60.3     59.0     43.7     63.0
#12     89.7     63.7     92.0     74.0
#13     84.8     67.0     53.0     31.0
#14     94.0     71.6     33.0     48.3
#15     71.8     48.3     48.4     42.9
#16     76.9     55.6     43.2     62.2
#17     91.4     62.5     41.8     35.4
#18     57.9     60.0     62.5     33.8
#19     61.9     72.9     33.3     40.7
#20     71.5     47.4     49.3     41.2

No external packages are used.
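
For the reverse direction asked about in the question, base R's stack is the counterpart of unstack. The following is a small sketch on the first five values of each sample from the question; the names wide and long are made up for the example. Note that unstack turns "Sample 1" into "Sample.1", so the sketch restores the space afterwards.

```r
# Assumption: first five values of each sample from the question, in long form
df1 <- data.frame(
  Samples = rep(paste("Sample", 1:4), each = 5),
  Content = c(70.7, 91.6, 92.6, 65.2, 80.0,
              88.7, 67.6, 61.7, 70.8, 45.3,
              42.3, 48.2, 64.0, 33.3, 19.0,
              51.8, 57.3, 43.3, 42.3, 37.6)
)

# Long -> wide, as in the answer
wide <- unstack(df1, Content ~ Samples)

# Wide -> long: stack() returns the columns "values" and "ind"
long <- stack(wide)
names(long) <- c("Content", "Samples")

# Undo the name mangling (Sample.1 -> Sample 1) and restore the column order
long$Samples <- sub(".", " ", as.character(long$Samples), fixed = TRUE)
long <- long[, c("Samples", "Content")]
print(head(long))
```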

If the number of elements differs between 'Sample' groups, dcast from data.table can be used instead (it works in both cases):

library(data.table)
dcast(setDT(df1), rowid(Samples)~Samples, value.var = "Content")
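
To see why the group sizes matter, here is a rough sketch with a deliberately unbalanced data frame. df2 is a hypothetical example built from a few values in the question, with Sample 2 cut short; res and wide2 are also made-up names. With unequal group sizes, unstack returns a list rather than a data frame, while dcast pads the shorter group with NA.

```r
library(data.table)

# Hypothetical unbalanced data: three values for Sample 1, two for Sample 2
df2 <- data.frame(
  Samples = c(rep("Sample 1", 3), rep("Sample 2", 2)),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6)
)

# unstack cannot build a rectangular result here, so it returns a list
res <- unstack(df2, Content ~ Samples)
print(is.data.frame(res))

# dcast fills the missing cell of the shorter group with NA
wide2 <- dcast(as.data.table(df2), rowid(Samples) ~ Samples,
               value.var = "Content")
print(wide2)
```

as.data.table is used here instead of setDT so that df2 itself is left unchanged.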