如果我有一个看起来像这样的表(用read.table读入):
http://i.stack.imgur.com/gaef6.png
(0是占位符,所以我有一致的行数)
有没有办法可以添加具有相同相应名称的值(a的两个值,l的所有4个值)并将其作为数据框输出?此外,行不一致(即一些列有4个,有些有2个)
结果应如下所示:
http://i.stack.imgur.com/HrCt5.png
我可以使用a,b等将数据框组合为行名称和列值相加后的列,但我无法弄清楚如何根据相应的名称对它们求和。
这就是我目前正在接近这个:
read.table()
定义空data.frame
使用循环来提取和添加相应名称的列值? < -need help with this
cbind()
我刚刚生成的数据框和列
继续循环结束
- 使用row.names()和colnames()来更改行名和列名
到目前为止我的代码:
setwd(wd)
read.table(dat.txt,sep="\t")->x
read.table(total.txt, sep="\t")->total
met<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
y<-data.frame(),
total[2,]->name
for g in 1:ncol(x){
#column will be the column with the combined values according to name
cbind(y,column)
}
row.names(y)<-met
colnames(y)<-name
write.table(y,file="data.txt",sep="\t")
非常感谢任何帮助。
答案 0 :(得分:0)
你可以分解为两部分:
(1)汇总每一列 (2)结合成一张漂亮的桌子
使用* apply可以很好地完成。我把它分解为for循环以便于阅读
## sample df
df <- structure(list(thing1.met = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f", "g", "g", "h", "h", "i", "i", "j", "j", "k", "k", "l", "l", "m", "n", "o", "p", "q", "r"), thing1 = c(-0.57, 0.42, -1.12, -0.5, -0.94, -1.87, -2.22, -0.33, 1.45, 0.46, -1.96, -0.35, -1.01, 0.72, 0.04, -0.21, 0.81, 1.28, -0.52, -1.19, 0.03, -1.71, 0.53, -1.96, 1.58, -0.1, -0.88, 0.92, 0.02, -0.91), thing2.met = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "e", "e", "f", "f", "g", "g", "h", "h", "i", "i", "j", "j", "k", "l", "l", "m", "n", "o", "p", "q", "r"), thing2 = c(-0.06, -0.7, 0.16, 1.96, 0.78, -0.65, -0.17, 0.89, 0.68, -0.93, -1.44, -0.16, -0.52, -0.19, 1.15, -0.77, 0.69, -0.48, 1.75, 1.62, -0.68, -1.06, -1.2, 1.42, -0.2, 1.33, 2.24, 0.35, 2, -1.21), thing3.met = c("a", "a", "a", "b", "c", "c", "d", "d", "e", "e", "f", "f", "g", "g", "g", "g", "g", "i", "k", "l", "l", "l", "m", "o", "o", "o", "p", "p", "p", "q"), thing3 = c(1.27, 4.45, -2.42, -9.53, 3.33, 5.58, -2.94, 2.54, 12.44, 12.41, 7.6, 0.63, 5.67, -3.79, 12.28, 1.77, -0.4, -0.04, 0.95, 4.93, 1.77, 0.37, -2.79, 2.36, 12.76, -5.4, -4.73, -1.8, 0.52, -4.97)), .Names = c("thing1.met", "thing1", "thing2.met", "thing2", "thing3.met", "thing3"), row.names = c(NA, -30L), class = "data.frame")
# create blank data frame
results <- data.frame(row.names=met)
# grab every other column
cols <- seq(2, ncol(df), 2)
# aggregate over all m, over every pair of columns
for (m in met) {
for (c in cols) {
results[m, c/2] <- sum(df[,c][df[,c-1]==m])
}
}
# clean up column names
names(results) <- names(df)[cols]
# final output
results
示例:
> results
thing1 thing2 thing3
a -0.15 -0.60 3.30
b -1.62 2.74 -9.53
c -2.81 0.07 8.91
d -2.55 0.68 -0.40
e 1.91 -2.37 24.85
f -2.31 -0.68 8.23
g -0.29 0.96 15.53
h -0.17 -0.08 0.00
i 2.09 1.27 -0.04
j -1.71 0.94 0.00
k -1.68 -1.06 0.95
l -1.43 0.22 7.07
m 1.58 -0.20 -2.79
n -0.10 1.33 0.00
o -0.88 2.24 9.72
p 0.92 0.35 -6.01
q 0.02 2.00 -4.97
r -0.91 -1.21 0.00