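Is there a way to spread multiple rows across multiple columns?

Time: 2019-07-12 03:42:36

Tags: r

I'm a beginner in R, so I apologize if I don't use the right terminology. I want to spread my data so that all the variables for one Sample_File are on a single row. My data (RW_leftjoin) currently looks like this: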

           Sample_File   Marker Peak Allele Height
 1: A02_1710963103.fsa     AMEL    1      X   5137
 2: A02_1710963103.fsa     AMEL    2      Y   4898
 3: A02_1710963103.fsa   CSF1PO    1     11    805
 4: A02_1710963103.fsa   CSF1PO    2     12    652

I would like my data to look like this:

Sample_File          AMEL1 AMEL2 Height1 Height2 CSF1PO1 CSF1PO2 Height1 Height2
 1: A02_1710963103.fsa    X    Y    5137    4898    11    12      805    652

Is this possible in R?

I tried this code:

RW_spread <- RW_leftjoin %>%
  rowid_to_column() %>% 
  group_by(Sample_File, Marker) %>%
  mutate(ID = paste0(Marker, Peak)) %>%
  ungroup() %>%
  spread(ID, Allele)

But the data ends up looking like this:

  rowid Sample_File        Marker  Peak Height AMEL1 AMEL2 CSF1PO1 CSF1PO2
  <int> <chr>              <chr>  <dbl> <chr>  <chr> <chr> <chr>   <chr>
1     1 A02_1710963103.fsa AMEL       1 5137   X     NA    NA      NA
2     2 A02_1710963103.fsa AMEL       2 4898   NA    Y     NA      NA
3     3 A02_1710963103.fsa CSF1PO     1 805    NA    NA    11      NA
4     4 A02_1710963103.fsa CSF1PO     2 652    NA    NA    NA      12

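Any help would be appreciated.

3 answers:

Answer 0: (score: 2)

One tidyr approach is to gather the Allele and Height variables into a single value column, use unite to build a key from the remaining columns (all except Sample_File, which serves as the id), and then spread the key/value pair.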

library(tidyr) 

RW_leftjoin %>%
  gather(key, value, Allele, Height) %>%
  unite(tmp, c("Marker", "Peak", "key")) %>%
  spread(tmp, value)

         Sample_File AMEL_1_Allele AMEL_1_Height AMEL_2_Allele AMEL_2_Height CSF1PO_1_Allele CSF1PO_1_Height CSF1PO_2_Allele CSF1PO_2_Height
1 A02_1710963103.fsa             X          5137             Y          4898              11             805              12             652
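
As a side note on why the attempt in the question produced NAs: spread() treats every column other than the key and the value as an identifier, so the added rowid, together with the leftover Marker, Peak and Height columns, keeps each original row distinct. A minimal sketch, assuming only the Allele values are wanted; the RW_leftjoin built here is a hand-made stand-in for the asker's table, not the real data:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the asker's RW_leftjoin (same four rows as shown)
RW_leftjoin <- data.frame(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L),
  stringsAsFactors = FALSE
)

allele_wide <- RW_leftjoin %>%
  mutate(ID = paste0(Marker, Peak)) %>%
  # keep only the id, key and value columns so nothing else separates the rows
  select(Sample_File, ID, Allele) %>%
  spread(ID, Allele)

allele_wide
```

With no rowid and no leftover columns, the four rows collapse into one row per Sample_File. Getting Height into the same row still needs the gather/unite step shown above.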

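Answer 1: (score: 0)

Note that the current development version of tidyr (0.8.3.9000) includes the function pivot_wider, which converts the data.frame to the desired wide format in a single function call (see also the vignette "Tidyr: Pivoting"):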

library(tidyr)

pivot_wider(df, names_from = c("Marker", "Peak"), values_from = c("Allele", "Height"))

#>          Sample_File Allele_AMEL_1 Allele_AMEL_2 Allele_CSF1PO_1
#> 1 A02_1710963103.fsa             X             Y              11
#>   Allele_CSF1PO_2 Height_AMEL_1 Height_AMEL_2 Height_CSF1PO_1
#> 1              12          5137          4898             805
#>   Height_CSF1PO_2
#> 1             652

packageVersion("tidyr")
#> [1] '0.8.3.9000'

Data

df <- structure(list(Sample_File = c("A02_1710963103.fsa", "A02_1710963103.fsa", 
            "A02_1710963103.fsa", "A02_1710963103.fsa"), Marker = c("AMEL", 
            "AMEL", "CSF1PO", "CSF1PO"), Peak = c(1L, 2L, 1L, 2L), Allele = c("X", 
            "Y", "11", "12"), Height = c(5137L, 4898L, 805L, 652L)), row.names = c(NA, 
        -4L), class = "data.frame")
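
If column names in the Marker_Peak_variable order of answer 0 are preferred, pivot_wider can build them with a glue template. This is a sketch that assumes a released tidyr whose pivot_wider() supports the names_glue argument (1.1.0 or later, to my knowledge); the data frame is the same df as above, rebuilt so the block runs on its own:

```r
library(tidyr)

# Same sample data as the "Data" section above
df <- data.frame(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  df,
  names_from = c(Marker, Peak),
  values_from = c(Allele, Height),
  # .value stands for the name of each values_from column
  names_glue = "{Marker}_{Peak}_{.value}"
)

names(wide)
```

Unlike the gather/spread route, which stacks Allele and Height into one character column, pivot_wider keeps each value column's own type, so the Height columns stay integer.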

Answer 2: (score: 0)

Personally, I prefer data.table over the tidyverse.

library(data.table)
dcast(as.data.table(df), Sample_File ~ Marker + Peak, value.var = c("Allele", "Height"))

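The formula Sample_File~Marker+Peak keeps one row per Sample_File and creates one column for each Marker and Peak combination, and value.var=c("Allele","Height") fills those columns with the values from Allele and Height.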
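
The reading of the formula above can be checked with a small self-contained sketch. It assumes the df from the Data section of answer 1 and converts it with as.data.table(), on the assumption that casting several value.var columns at once needs a data.table rather than a plain data.frame:

```r
library(data.table)

# Same sample data as in answer 1, rebuilt so the block runs on its own
df <- data.frame(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)

# Left of ~ : columns that identify a row; right of ~ : columns spread into names
wide_dt <- dcast(dt, Sample_File ~ Marker + Peak,
                 value.var = c("Allele", "Height"))

dim(wide_dt)    # one row per Sample_File
names(wide_dt)  # value.var name first, then Marker and Peak
```

Each new column name starts with the value.var it came from, which matches the Allele_AMEL_1 style of the pivot_wider output in answer 1.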