For example, I have a table like this (let's call it a):
SNP ID ALLE1 ALLE2
SNPNAME1 1 A A
SNPNAME2 1 A G
SNPNAME3 1 G G
...
I want to write a function that creates a new table from the one above:
ID SNPNAME1 SNPNAME2 SNPNAME3...
1 AA AG GG
...
My idea was to first create a NULL object and then add a new "ID" column, which I can do like this:
b$ID=NA
Then I tried to add a new column whose name is the value of a[1,]$SNP, using the following statement:
b$a[1,]$SNP=NA
But that doesn't work. Then I tried
b$get(a[1,]$SNP)=NA
or
c=quote(a[1,]$SNP)
b$eval(c)=NA
but none of these work. Can anyone tell me how to do this? Thanks.
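A note on why these attempts fail: `$` takes the name to its right literally and never evaluates it, so `b$a[1,]$SNP` is parsed as indexing into a column called `a`, and `b$get(...)` or `b$eval(...)` are not valid column names at all. The usual way to use a computed column name is `[[` with a character string. The sketch below assumes the question's table `a` rebuilt with `stringsAsFactors = FALSE`; the names `b` and `col_name` are only illustrative, and the loop ignores the case of several different IDs.

```r
# Rebuild the question's example table `a` (values as shown in the question)
a <- data.frame(SNP   = c("SNPNAME1", "SNPNAME2", "SNPNAME3"),
                ID    = c(1, 1, 1),
                ALLE1 = c("A", "A", "G"),
                ALLE2 = c("A", "G", "G"),
                stringsAsFactors = FALSE)

# Start from a one-row data frame that already holds the ID column
b <- data.frame(ID = a$ID[1])

# `$` does not evaluate its argument, but `[[` does: use a computed name
col_name <- as.character(a[1, ]$SNP)
b[[col_name]] <- paste0(a$ALLE1[1], a$ALLE2[1])

# Same idea for every SNP row (assumes all rows share one ID)
for (i in seq_len(nrow(a))) {
  b[[as.character(a$SNP[i])]] <- paste0(a$ALLE1[i], a$ALLE2[i])
}
b
```

This fixes the literal question, but building the table column by column does not scale well to many IDs; the reshaping approaches in the answers below are the more general solution.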
Answer 0 (score: 2)
Here is a data.table solution.
library(data.table)
DT <- data.table(a)
# For each ID, paste the two alleles together and name the results after the SNPs
DT[, setNames(as.list(paste0(ALLE1, ALLE2)), SNP), by = ID]
## ID SNPNAME1 SNPNAME2 SNPNAME3
## 1: 1 AA AG GG
Using Paul's data (the df from the answer below):
DT <- data.table(df)
DT[, structure(as.list(paste0(var1, var2)), names = as.character(name)), by = ID]
## ID spam1 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9 spam10
## 1: 1 AA AA GA GG AG GA GA AG GG AA
## 2: 2 AA AA GA GA GA AA AG GG GG GG
## 3: 3 AG AG AG GA AA AG GG GA AG AA
## 4: 4 GA GG GA AA AG GG AA AA GG AG
## 5: 5 AG GA GG AG AA AG AA AA GG GA
Answer 1 (score: 1)
There's no need to build the object yourself. First, let's make some example data that I think is representative:
df = data.frame(name = paste('spam', rep(1:10, 5), sep = ''),
ID = rep(1:5, each = 10),
var1 = sample(c('A', 'G'), 50, replace = TRUE),
var2 = sample(c('A', 'G'), 50, replace = TRUE))
and combine the var columns:
df = transform(df, comb_var = paste(var1, var2, sep = ''))
head(df)
name ID var1 var2 comb_var
1 spam1 1 A G AG
2 spam2 1 G G GG
3 spam3 1 G G GG
4 spam4 1 A G AG
5 spam5 1 A G AG
6 spam6 1 G A GA
Then use dcast to do the reshaping:
library(reshape2)
dcast(df, ID ~ name, value.var = 'comb_var')
ID spam1 spam10 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9
1 1 AG GA GG GG AG AG GA GG GA GG
2 2 AA GA AG GA GA AG AG AA GG AG
3 3 GG AG AG AG GA GG GA GA AA AG
4 4 AA AA GA GA GA GA AA GA AG AA
5 5 AG AA GA AA GG GG GG GA GG GG
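If you would rather not depend on reshape2, base R's reshape() can do the same long-to-wide conversion. This is a swapped-in alternative, not what the answer above used. The sketch rebuilds data shaped like the df above, adding set.seed() (not in the original) so the result is reproducible. Unlike dcast on a character column, reshape() keeps the columns in first-appearance order, so spam10 comes last.

```r
# Long-format example data shaped like the df above (seed added for reproducibility)
set.seed(1)
df <- data.frame(name = paste0("spam", rep(1:10, 5)),
                 ID   = rep(1:5, each = 10),
                 var1 = sample(c("A", "G"), 50, replace = TRUE),
                 var2 = sample(c("A", "G"), 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df$comb_var <- paste0(df$var1, df$var2)

# Base R long-to-wide: one row per ID, one column per value of `name`
wide <- reshape(df[, c("ID", "name", "comb_var")],
                idvar = "ID", timevar = "name", direction = "wide")

# reshape() names the new columns "comb_var.spam1", ...; strip the prefix
names(wide) <- sub("^comb_var\\.", "", names(wide))
rownames(wide) <- NULL
wide
```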
Answer 2 (score: 1)
DF <- read.table(text="SNP ID ALLE1 ALLE2
SNPNAME1 1 A A
SNPNAME2 1 A G
SNPNAME3 1 G G", header=TRUE)
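library(reshape2)
# Stack ALLE1 and ALLE2 into a single long "value" column
DFm <- melt(DF, id=c("SNP", "ID"))
# Spread SNPs into columns, pasting the two alleles of each cell together
dcast(DFm, ID~SNP, value.var="value", fun.aggregate=paste, collapse="")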
# ID SNPNAME1 SNPNAME2 SNPNAME3
#1 1 AA AG GG
Answer 3 (score: 1)
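Whenever I see a reshape2 answer, I always try to see whether there is a reasonably simple base R solution. In this case (using Paul's data), tapply() with I() seems to arrange the strings in a table, provided you first stop transform from turning them into factors: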
df = transform(df, comb_var = paste(var1, var2, sep = ''),stringsAsFactors=FALSE)
with(df, tapply(comb_var, list(ID, name), I))
#--------------------
spam1 spam10 spam2 spam3 spam4 spam5 spam6 spam7 spam8 spam9
1 "AA" "GA" "GG" "GG" "GA" "AG" "AA" "GG" "GG" "AG"
2 "AA" "AG" "AA" "AA" "AA" "GG" "GG" "AA" "AA" "GG"
3 "GA" "GA" "GA" "AG" "AA" "AG" "GA" "GG" "AG" "AG"
4 "GG" "AA" "GG" "GG" "AA" "GA" "GA" "GG" "AA" "AA"
5 "AG" "GA" "AG" "GG" "GA" "GA" "AG" "AA" "GG" "GG"
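tapply() returns a character matrix with the IDs as row names, while the question asked for a table with an explicit ID column. A minimal sketch of that last step, assuming data shaped like Paul's df (rebuilt here with an added set.seed() and comb_var stored as character). Making name a factor whose levels follow first appearance is an extra step not in the answer; it keeps spam1 ... spam10 in numeric rather than alphabetical order.

```r
# Rebuild data shaped like Paul's df (set.seed added for reproducibility)
set.seed(42)
df <- data.frame(name = paste0("spam", rep(1:10, 5)),
                 ID   = rep(1:5, each = 10),
                 var1 = sample(c("A", "G"), 50, replace = TRUE),
                 var2 = sample(c("A", "G"), 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df$comb_var <- paste0(df$var1, df$var2)  # character, not factor

# Factor levels in first-appearance order make tapply() order the columns
# spam1, spam2, ..., spam10 instead of alphabetically
df$name <- factor(df$name, levels = unique(df$name))

m <- with(df, tapply(comb_var, list(ID, name), I))

# Turn the matrix into a data frame and move its row names into an ID column
out <- data.frame(ID = as.integer(rownames(m)), m, stringsAsFactors = FALSE)
rownames(out) <- NULL
out
```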