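I'm a beginner in R, so I apologize if I don't use the right terminology. I want to spread my data so that all variables for one Sample_File are in a single row. My data (RW_leftjoin) currently looks like this: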
Sample_File Marker Peak Allele Height
1: A02_1710963103.fsa AMEL 1 X 5137
2: A02_1710963103.fsa AMEL 2 Y 4898
3: A02_1710963103.fsa CSF1PO 1 11 805
4: A02_1710963103.fsa CSF1PO 2 12 652
I would like my data to look like this:
Sample_File AMEL1 AMEL2 Height1 Height2 CSF1PO1 CSF1PO2 Height1 Height2
1: A02_1710963103.fsa X Y 5137 4898 11 12 805 652
Is this possible in R?
I tried the following:
RW_spread <- RW_leftjoin %>%
rowid_to_column() %>%
group_by(Sample_File, Marker) %>%
mutate(ID = paste0(Marker, Peak)) %>%
ungroup() %>%
spread(ID, Allele)
But the resulting data looks like this:
rowid Sample_File Marker Peak Height AMEL1 AMEL2 CSF1PO1 CSF1PO2
<int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1 A02_1710963103.fsa AMEL 1 5137 X NA NA NA
2 2 A02_1710963103.fsa AMEL 2 4898 NA Y NA NA
3 3 A02_1710963103.fsa CSF1PO 1 805 NA NA 11 NA
4 4 A02_1710963103.fsa CSF1PO 2 652 NA NA NA 12
Any help would be appreciated.
Answer 0 (score: 2)
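One tidyr approach is to gather the Allele and Height variables into a single column, create a key variable from the remaining columns (excluding the one used as the id) with unite, and then spread the key/value pairs.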
library(tidyr)
RW_leftjoin %>%
gather(key, value, Allele, Height) %>%
unite(tmp, c("Marker", "Peak", "key")) %>%
spread(tmp, value)
Sample_File AMEL_1_Allele AMEL_1_Height AMEL_2_Allele AMEL_2_Height CSF1PO_1_Allele CSF1PO_1_Height CSF1PO_2_Allele CSF1PO_2_Height
1 A02_1710963103.fsa X 5137 Y 4898 11 805 12 652
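One side effect worth knowing: gather stacks the character Allele column and the integer Height column into a single value column, so every column comes back from spread as character. A minimal sketch of one way to recover numeric types, assuming the `convert` argument of spread is acceptable for your data (it would also turn numeric-looking alleles such as "11" into integers, which may or may not be what you want). The name `wide` is only illustrative:

```r
library(tidyr)

# Sample data copied from the question
RW_leftjoin <- data.frame(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L),
  stringsAsFactors = FALSE
)

wide <- RW_leftjoin %>%
  gather(key, value, Allele, Height) %>%       # stack Allele and Height (coerced to character)
  unite(tmp, c("Marker", "Peak", "key")) %>%   # build names such as AMEL_1_Allele
  spread(tmp, value, convert = TRUE)           # convert = TRUE re-guesses each column's type

# Inspect the resulting column types
str(wide)
```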
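Answer 1 (score: 0)
Note that the current development version of tidyr (0.8.3.9000) includes the function pivot_wider, which converts the data.frame to the desired wide format in a single function call (see also the vignette Tidyr: Pivoting):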
library(tidyr)
pivot_wider(df, names_from = c("Marker", "Peak"), values_from = c("Allele", "Height"))
#> Sample_File Allele_AMEL_1 Allele_AMEL_2 Allele_CSF1PO_1
#> 1 A02_1710963103.fsa X Y 11
#> Allele_CSF1PO_2 Height_AMEL_1 Height_AMEL_2 Height_CSF1PO_1
#> 1 12 5137 4898 805
#> Height_CSF1PO_2
#> 1 652
packageVersion("tidyr")
#> [1] '0.8.3.9000'
Data
df <- structure(list(Sample_File = c("A02_1710963103.fsa", "A02_1710963103.fsa",
"A02_1710963103.fsa", "A02_1710963103.fsa"), Marker = c("AMEL",
"AMEL", "CSF1PO", "CSF1PO"), Peak = c(1L, 2L, 1L, 2L), Allele = c("X",
"Y", "11", "12"), Height = c(5137L, 4898L, 805L, 652L)), row.names = c(NA,
-4L), class = "data.frame")
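If column names closer to the ones in the question (AMEL1, AMEL2, ...) are preferred, pivot_wider can build them from a template. The following is a sketch that assumes a tidyr release with the `names_glue` argument (1.1.0 or later, to my knowledge); the `_Allele` / `_Height` suffixes are my own choice to keep the names unique, not something the question specified:

```r
library(tidyr)

# Same sample data as above, rebuilt so this snippet runs on its own
df <- data.frame(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  df,
  names_from = c(Marker, Peak),
  values_from = c(Allele, Height),
  # {.value} stands for the name of the values_from column being spread
  names_glue = "{Marker}{Peak}_{.value}"
)

names(wide)
```

Because Allele and Height are pivoted separately, each keeps its original type (character and integer respectively).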
Answer 2 (score: 0)
I personally prefer data.table over the tidyverse:
library(data.table)
dcast(as.data.table(df), Sample_File ~ Marker + Peak, value.var = c("Allele", "Height"))  # multiple value.var needs a data.table
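In this command, the formula (Sample_File ~ Marker + Peak) keeps one row per sample and creates one column for each Marker/Peak combination, and value.var = c("Allele", "Height") fills those columns with the values from Allele and Height.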
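For reference, a self-contained sketch of the same idea that can be pasted into a fresh session. It assumes the data.table package is installed; the names `dt` and `wide` are only illustrative. Since each value.var column is cast separately, Allele should stay character and Height should stay integer:

```r
library(data.table)

# Sample data from the question, created directly as a data.table
dt <- data.table(
  Sample_File = rep("A02_1710963103.fsa", 4),
  Marker = c("AMEL", "AMEL", "CSF1PO", "CSF1PO"),
  Peak = c(1L, 2L, 1L, 2L),
  Allele = c("X", "Y", "11", "12"),
  Height = c(5137L, 4898L, 805L, 652L)
)

# One row per Sample_File, one column per value.var / Marker / Peak combination
wide <- dcast(dt, Sample_File ~ Marker + Peak, value.var = c("Allele", "Height"))

# Check the column names and types of the result
sapply(wide, class)
```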