I have a file containing samples, chromosome numbers and their frequencies:
a
sample Chr_No frequency
sample-1 chr1: 0
sample-1 chr2: 0
sample-1 chr3: 0
sample-1 chr4: 1
sample-1 chr5: 0
sample-1 chr6: 0
sample-1 chr7: 0
sample-1 chr8: 0
sample-1 chr9: 1
sample-1 chr10 0
sample-1 chr11 0
......
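To make the question reproducible, the rows listed above can be read into R with read.table(text = ...). This is a minimal sketch that includes only the rows actually shown (chr1 to chr11), since the rest of the file is elided:

```r
# Reproducible copy of the rows shown above (the real file continues
# past chr11). Note that chr1-chr9 carry a trailing colon, chr10/chr11 do not.
a <- read.table(text = "
sample Chr_No frequency
sample-1 chr1: 0
sample-1 chr2: 0
sample-1 chr3: 0
sample-1 chr4: 1
sample-1 chr5: 0
sample-1 chr6: 0
sample-1 chr7: 0
sample-1 chr8: 0
sample-1 chr9: 1
sample-1 chr10 0
sample-1 chr11 0
", header = TRUE, stringsAsFactors = FALSE)

str(a)
```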
I want to reshape it into a wide data frame, so I am using this in R:
library(reshape2)  # or library(data.table); both provide dcast()
b <- dcast(a, sample ~ Chr_No, value.var = "frequency", fill = 0)
How can I remove the ":" from Chr_No and arrange the Chr_No columns as chr1 chr2 chr3 ... in the data frame?
Answer 0 (score: 1)
First remove the colon from the names, then use mixedsort (from the gtools package) to arrange the names as chr1, chr2, ...:
library(gtools)
names(b) <- sub(":", "", names(b))
cbind(b[1], b[-1][mixedsort(names(b[-1]))])
# sample chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
#1 sample-1 0 0 0 1 0 0 0 0 1 0 0
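Or we can keep everything in base R: after removing the colon, strip all letters from the names so that only the numbers remain, and order the columns by those numbers: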
cbind(b[1], b[-1][order(as.numeric(gsub("[[:alpha:]]", "", names(b[-1]))))])
# sample chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
#1 sample-1 0 0 0 1 0 0 0 0 1 0 0
Answer 1 (score: 0)
Another option is, before the dcast, to remove the ":" at the end of the Chr_No strings and convert the column to a factor whose levels are given in the desired order:
library(data.table)
setDT(a)[, Chr_No := factor(sub(':$', '', Chr_No), levels = paste0("chr", 1:11))]
Then do the dcast:
dcast( a, sample ~ Chr_No, value.var = "frequency", fill = 0 )
# sample chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
#1: sample-1 0 0 0 1 0 0 0 0 1 0 0
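The answer above hardcodes the levels as paste0("chr", 1:11). A possible variation, sketched below, derives the level order from the data itself by ordering the unique labels on their numeric part. The toy input is hypothetical (a few shuffled rows mimicking the question), and the approach assumes every label is "chr" followed by a number; labels such as chrX or chrY would become NA in as.numeric() and need separate handling.

```r
library(data.table)

# Hypothetical toy input: a few rows in shuffled order, some with the
# stray trailing colon, mirroring the question's data.
a <- data.table(sample    = "sample-1",
                Chr_No    = c("chr10", "chr2:", "chr1:", "chr11", "chr9:"),
                frequency = c(0, 0, 0, 0, 1))

# Drop the trailing colon, as in the answer above.
a[, Chr_No := sub(":$", "", Chr_No)]

# Derive the level order from the data instead of hardcoding 1:11:
# strip the "chr" prefix and order the unique labels numerically.
chr <- unique(a$Chr_No)
a[, Chr_No := factor(Chr_No, levels = chr[order(as.numeric(sub("^chr", "", chr)))])]

# dcast() lays out the new columns in factor-level order.
b <- dcast(a, sample ~ Chr_No, value.var = "frequency", fill = 0)
print(b)
```

Because the ordering comes from the data, the same code keeps working if the file contains more (or fewer) numbered chromosomes than chr1 to chr11.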