Create a new column whose name comes from a field of the table

Time: 2014-02-10 15:17:32

Tags: r bioinformatics

For example, I have a table like this (let's call it a):

SNP         ID    ALLE1    ALLE2
SNPNAME1    1      A        A
SNPNAME2    1      A        G
SNPNAME3    1      G        G
...

I want to write a function that creates a new table from the one above:

ID SNPNAME1  SNPNAME2  SNPNAME3...
1    AA        AG         GG
...

My idea was to first create a NULL object and then add a new "ID" column, which I can do with:

b$ID=NA

Then I tried to add a new column whose name is the value of a[1,]$SNP, using the following statement:

b$a[1,]$SNP=NA

But that doesn't work. Then I tried

b$get(a[1,]$SNP)=NA

c=quote(a[1,]$SNP)
b$eval(c)=NA

But none of the above works. Can anyone tell me how to do this? Thanks.
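On the literal question: `$` uses the column name exactly as written and never evaluates it, so `b$a[1,]$SNP` refers to an element of `b` called `a`, not to the string stored in `a[1,]$SNP`. `[[` (or `[` with a character index) does evaluate its argument. A minimal sketch, assuming `a` has character (not factor) columns and that `b` starts as a data frame with one row per ID instead of NULL:

```r
# Assumed stand-in for the question's table `a`
a <- data.frame(SNP = c("SNPNAME1", "SNPNAME2", "SNPNAME3"), ID = c(1, 1, 1),
                ALLE1 = c("A", "A", "G"), ALLE2 = c("A", "G", "G"),
                stringsAsFactors = FALSE)

# Start from one row per ID (assumption: not from NULL)
b <- data.frame(ID = unique(a$ID))

new_col <- a$SNP[1]   # "SNPNAME1": a column name held in a variable
b[[new_col]] <- NA    # `[[` evaluates new_col; `$` would not

# Fill every SNP column the same way, one cell per row of `a`;
# assigning to a column name that does not exist yet creates it
for (i in seq_len(nrow(a))) {
  b[b$ID == a$ID[i], a$SNP[i]] <- paste0(a$ALLE1[i], a$ALLE2[i])
}
```

The loop grows the result one cell at a time, which is fine for a few SNPs but slow on large tables; the reshaping approaches in the answers avoid that.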

4 answers:

Answer 0 (score: 2)

Here is a data.table solution:

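library(data.table)

DT <- data.table(a)
# For each ID, j returns a named list: one element per SNP holding ALLE1 and
# ALLE2 pasted together; data.table turns each list element into a column
DT[, setNames(as.list(paste0(ALLE1, ALLE2)), SNP), by = ID]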

##    ID SNPNAME1 SNPNAME2 SNPNAME3
## 1:  1       AA       AG       GG

Using Paul's data:

DT <- data.table(df)
DT[, structure(as.list(paste0(var1, var2)), names = as.character(name)), by = ID]

##    ID spam1 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9 spam10
## 1:  1    AA    AA    GA    GG    AG    GA    GA    AG    GG     AA
## 2:  2    AA    AA    GA    GA    GA    AA    AG    GG    GG     GG
## 3:  3    AG    AG    AG    GA    AA    AG    GG    GA    AG     AA
## 4:  4    GA    GG    GA    AA    AG    GG    AA    AA    GG     AG
## 5:  5    AG    GA    GG    AG    AA    AG    AA    AA    GG     GA
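The data.table call works because j returns, per ID, a named list whose names come from SNP. The same idea can be sketched in base R with split() and setNames(); this is an illustrative equivalent, not part of the answer, and it assumes every ID has the same set of SNPs (otherwise rbind() fails on mismatched columns):

```r
# Assumed stand-in for the question's table `a`
a <- data.frame(SNP = c("SNPNAME1", "SNPNAME2", "SNPNAME3"), ID = 1,
                ALLE1 = c("A", "A", "G"), ALLE2 = c("A", "G", "G"),
                stringsAsFactors = FALSE)

# One single-row data frame per ID; setNames() supplies the column names from SNP
rows <- lapply(split(a, a$ID), function(g) {
  data.frame(ID = g$ID[1],
             as.list(setNames(paste0(g$ALLE1, g$ALLE2), g$SNP)),
             stringsAsFactors = FALSE)
})

# Stack the per-ID rows into the wide table
wide <- do.call(rbind, rows)
```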

Answer 1 (score: 1)

There is no need to build the object yourself. First, let's make some example data that I think is representative:

df = data.frame(name = paste('spam', rep(1:10, 5), sep = ''), 
                ID = rep(1:5, each = 10),
                var1 = sample(c('A', 'G'), 50, replace = TRUE),
                var2 = sample(c('A', 'G'), 50, replace = TRUE))

Next, combine the two var columns:

df = transform(df, comb_var = paste(var1, var2, sep = ''))
head(df)
   name ID var1 var2 comb_var
1 spam1  1    A    G       AG
2 spam2  1    G    G       GG
3 spam3  1    G    G       GG
4 spam4  1    A    G       AG
5 spam5  1    A    G       AG
6 spam6  1    G    A       GA

Then use dcast to do the reshaping:

library(reshape2)
dcast(df, ID ~ name, value.var = 'comb_var')
  ID spam1 spam10 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9
1  1    AG     GA    GG    GG    AG    AG    GA    GG    GA    GG
2  2    AA     GA    AG    GA    GA    AG    AG    AA    GG    AG
3  3    GG     AG    AG    AG    GA    GG    GA    GA    AA    AG
4  4    AA     AA    GA    GA    GA    GA    AA    GA    AG    AA
5  5    AG     AA    GA    AA    GG    GG    GG    GA    GG    GG
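In the dcast output above, the new columns come out in string order, so spam10 sits between spam1 and spam2. If numeric order matters, one option is to reorder afterwards. A sketch with a hypothetical helper `order_by_suffix()`, assuming every non-ID column name ends in an integer; the small data frame is a stand-in for dcast's output:

```r
# Hypothetical helper: keep the ID column first, then order the remaining
# columns by the integer at the end of their names (spam1, spam2, ..., spam10)
order_by_suffix <- function(d, id = "ID") {
  others <- setdiff(names(d), id)
  num <- as.integer(sub("^[^0-9]*", "", others))  # "spam10" -> 10
  d[, c(id, others[order(num)])]
}

# Stand-in for dcast's result, columns in string order
wide <- data.frame(ID = 1:2, spam1 = c("AG", "AA"), spam10 = c("GA", "GA"),
                   spam2 = c("GG", "AG"), stringsAsFactors = FALSE)
wide <- order_by_suffix(wide)
```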

Answer 2 (score: 1)

DF <- read.table(text="SNP         ID    ALLE1    ALLE2
SNPNAME1    1      A        A
SNPNAME2    1      A        G
SNPNAME3    1      G        G", header=TRUE)

library(reshape2)

DFm <- melt(DF, id=c("SNP", "ID"))
dcast(DFm, ID~SNP, value.var="value", fun.aggregate=paste, collapse="")
#  ID SNPNAME1 SNPNAME2 SNPNAME3
#1  1       AA       AG       GG
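After melt(), each (ID, SNP) cell receives two values, one from ALLE1 and one from ALLE2, and `fun.aggregate = paste` with `collapse = ""` glues them into a single string. The same long-then-collapse step can be sketched in base R with rbind() and aggregate(); this illustrates what the aggregation does, not how reshape2 implements it:

```r
DF <- data.frame(SNP = c("SNPNAME1", "SNPNAME2", "SNPNAME3"), ID = 1,
                 ALLE1 = c("A", "A", "G"), ALLE2 = c("A", "G", "G"),
                 stringsAsFactors = FALSE)

# Long format, one row per (SNP, ID, allele column), like melt(DF, id = c("SNP", "ID"))
long <- rbind(
  data.frame(DF[c("SNP", "ID")], variable = "ALLE1", value = DF$ALLE1,
             stringsAsFactors = FALSE),
  data.frame(DF[c("SNP", "ID")], variable = "ALLE2", value = DF$ALLE2,
             stringsAsFactors = FALSE)
)

# Collapse the two alleles of each (ID, SNP) cell; ALLE1 rows come first in `long`
cells <- aggregate(value ~ ID + SNP, data = long, FUN = paste, collapse = "")
```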

Answer 3 (score: 1)

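Whenever I see a reshape2 answer, I try to see whether there is a reasonably simple base R solution. In this case (using Paul's data), tapply() with I() seems to lay the strings out as a table, provided you first stop `transform` from turning them into factors: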

df = transform(df, comb_var = paste(var1, var2, sep = ''), stringsAsFactors = FALSE)  # keep comb_var as character
with(df, tapply(comb_var, list(ID, name), I))  # one cell per (ID, name); I() returns each string unchanged
#--------------------
  spam1 spam10 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9
1 "AA"  "GA"   "GG"  "GG"  "GA"  "AG"  "AA"  "GG"  "GG"  "AG" 
2 "AA"  "AG"   "AA"  "AA"  "AA"  "GG"  "GG"  "AA"  "AA"  "GG" 
3 "GA"  "GA"   "GA"  "AG"  "AA"  "AG"  "GA"  "GG"  "AG"  "AG" 
4 "GG"  "AA"   "GG"  "GG"  "AA"  "GA"  "GA"  "GG"  "AA"  "AA" 
5 "AG"  "GA"   "AG"  "GG"  "GA"  "GA"  "AG"  "AA"  "GG"  "GG"
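tapply() returns a character matrix with the IDs as row names, whereas the question asks for a table with an ID column. A minimal sketch of the conversion, using a small fixed stand-in for Paul's data with comb_var already combined and kept as character:

```r
# Fixed stand-in for Paul's (random) data
df <- data.frame(name = rep(c("spam1", "spam2", "spam10"), 2),
                 ID = rep(1:2, each = 3),
                 comb_var = c("AA", "AG", "GG", "GA", "AA", "AG"),
                 stringsAsFactors = FALSE)

# Character matrix: rows are IDs, columns are names (in string order)
m <- with(df, tapply(comb_var, list(ID, name), I))

# Turn the row names back into an ID column in front of the name columns
wide <- data.frame(ID = as.integer(rownames(m)), m, stringsAsFactors = FALSE)
```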