I want to transform this dataset according to the following example:
data <- read.table(text="
V1 V2 V3 V4 V5 V6 V7 V8
Chr10_247 T C CC241=miss CC332=het CC37=ref CC88=ref CC886=het
Chr10_445 G T CC241=ref CC332=ref CC37=het CC88=ref CC886=het
Chr10_447 A C CC241=homo CC332=homo CC37=homo CC88=homo CC886=homo
Chr10_481 C T CC241=ref CC332=het CC37=het CC88=ref CC886=het
Chr10_517 G A CC241=homo CC332=het CC37=ref CC88=homo CC886=het
Chr10_637 A G CC241=het CC332=ref CC37=het CC88=het CC886=het",
stringsAsFactors = FALSE, row.names = NULL, header = TRUE)
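V1 is the physical position on the genome (e.g. Chr10_247), V2 is the reference allele, V3 is the alternative allele, and V4, V5, V6, V7 and V8 are individuals. On each row I need to apply the following rules, shown here with "A" in V2 and "B" in V3:
Change *=ref to V2 twice (2*V2): A B → AA
Change *=homo to V3 twice (2*V3): A B → BB
Change *=het to V2 followed by V3 (V2*V3): A B → AB
Change *=miss to NA: A B → NA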
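The four rules amount to a lookup from the status suffix to a genotype string built from the row's alleles. A minimal base-R sketch, assuming every cell has the form `sample=status`; the function name `call_genotype` is hypothetical and not part of any package:

```r
# Hypothetical helper: turn "sample=status" cells into genotype calls.
#   ref  -> reference allele twice   (V2 V2)
#   homo -> alternative allele twice (V3 V3)
#   het  -> reference then alternative (V2 V3)
#   miss -> NA (any unrecognised status is also left as NA)
call_genotype <- function(cell, ref, alt) {
  n <- length(cell)
  ref <- rep_len(ref, n)            # accept one allele, or one allele per cell
  alt <- rep_len(alt, n)
  status <- sub(".*=", "", cell)    # keep only the text after "="
  out <- rep(NA_character_, n)
  i <- which(status == "ref");  out[i] <- paste0(ref[i], ref[i])
  i <- which(status == "homo"); out[i] <- paste0(alt[i], alt[i])
  i <- which(status == "het");  out[i] <- paste0(ref[i], alt[i])
  out
}

call_genotype(c("CC241=ref", "CC332=homo", "CC37=het", "CC88=miss"),
              ref = "A", alt = "B")
# [1] "AA" "BB" "AB" NA
```

Using `which()` rather than a raw logical index keeps the assignments safe even if a cell is itself `NA`.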
Expected result:
V1 V2 V3 V4 V5 V6 V7 V8
Chr10_247 T C NA TC TT TT TC
Chr10_445 G T GG GG GT GG GT
Chr10_447 A C CC CC CC CC CC
Chr10_481 C T CC CT CT CC CT
Chr10_517 G A AA GA GG AA GA
Chr10_637 A G AG AA AG AG AG
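Answer (score: 1)
Here is a start on a solution. This is a fairly straightforward data reshaping problem: melt to long format, convert each status, then cast back to wide.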
library(reshape2)
data <- read.table(text="
V1 V2 V3 V4 V5 V6 V7 V8
Chr10_247 T C CC241=miss CC332=het CC37=ref CC88=ref CC886=het
Chr10_445 G T CC241=ref CC332=ref CC37=het CC88=ref CC886=het
Chr10_447 A C CC241=homo CC332=homo CC37=homo CC88=homo CC886=homo
Chr10_481 C T CC241=ref CC332=het CC37=het CC88=ref CC886=het
Chr10_517 G A CC241=homo CC332=het CC37=ref CC88=homo CC886=het
Chr10_637 A G CC241=het CC332=ref CC37=het CC88=het CC886=het",
stringsAsFactors = FALSE, row.names = NULL, header = TRUE)
#melt data for easy vectorized operations
m_data <- melt(data, id.vars=c("V1","V2","V3"),variable.name="Individual",value.name="Status")
head(m_data)
# strip the sample name, keeping only the status (ref, het, homo or miss)
m_data$true_status <- gsub(".+=","",m_data$Status)
#format strings based on status
m_data$result <- with(m_data, ifelse(true_status=="miss",NA,
ifelse(true_status=="ref",
sprintf("%s%s",V2, V2),
ifelse(true_status=="homo",
sprintf("%s%s",V3,V3),
sprintf("%s%s", V2,V3)))))
#turn back to wide
res <- dcast(m_data, V1 ~ Individual, value.var = "result")
# merge V2 and V3 back in (the cast keeps only V1 as an id column)
res2 <- merge(data[,c("V1","V2","V3")],res,by="V1")
> res2
V1 V2 V3 V4 V5 V6 V7 V8
1 Chr10_247 T C <NA> TC TT TT TC
2 Chr10_445 G T GG GG GT GG GT
3 Chr10_447 A C CC CC CC CC CC
4 Chr10_481 C T CC CT CT CC CT
5 Chr10_517 G A AA GA GG AA GA
6 Chr10_637 A G AG AA AG AG AG
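If you would rather not round-trip through long format, a base-R sketch can convert each individual's column in place with `lapply`, which also keeps the original row order without needing `merge`. This is an alternative to the answer's approach, not part of it; the helper name `to_genotype` is hypothetical, and the data are rebuilt so the block runs on its own:

```r
# Rebuild the example data so this block is self-contained
data <- read.table(text = "
V1 V2 V3 V4 V5 V6 V7 V8
Chr10_247 T C CC241=miss CC332=het CC37=ref CC88=ref CC886=het
Chr10_445 G T CC241=ref CC332=ref CC37=het CC88=ref CC886=het
Chr10_447 A C CC241=homo CC332=homo CC37=homo CC88=homo CC886=homo
Chr10_481 C T CC241=ref CC332=het CC37=het CC88=ref CC886=het
Chr10_517 G A CC241=homo CC332=het CC37=ref CC88=homo CC886=het
Chr10_637 A G CC241=het CC332=ref CC37=het CC88=het CC886=het",
stringsAsFactors = FALSE, header = TRUE)

# Hypothetical helper: convert one column of "sample=status" cells,
# using each row's reference (V2) and alternative (V3) allele
to_genotype <- function(cell, ref, alt) {
  status <- sub(".*=", "", cell)
  ifelse(status == "ref",  paste0(ref, ref),
  ifelse(status == "homo", paste0(alt, alt),
  ifelse(status == "het",  paste0(ref, alt), NA_character_)))  # miss -> NA
}

ind_cols <- paste0("V", 4:8)   # the individual columns V4..V8
res3 <- data
res3[ind_cols] <- lapply(data[ind_cols], to_genotype,
                         ref = data$V2, alt = data$V3)
res3
```

Because `ref` and `alt` are whole columns the same length as each individual's column, the nested `ifelse` pairs every cell with the alleles from its own row.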