I have the following data:
yvar <- c(1:150)
replication <- c( rep(c(rep(1, 10), rep(2,10), rep(3,10)),5))
genotypes <- c(rep(paste("G", 1:10, sep= ""), 15))
environments <- c(rep(paste("E",5:1, sep = ""), each = 30))
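# stringsAsFactors = TRUE keeps genotypes/environments as factors (the default before R 4.0.0), which levels() below relies on
mydf1 <- data.frame(yvar, replication, genotypes, environments, stringsAsFactors = TRUE)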
mydf1$replication <- as.factor(mydf1$replication)
I want to summarize the data:
mydf = data.frame(aggregate (yvar ~ genotypes + environments, data = mydf1, mean))
Now I create a matrix, which I expected to be numeric, but matm is not!
matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
colnames(matm) <- c("genotypes", levels(mydf$environments))
genotypes E1 E2 E3 E4 E5
[1,] "G1" "131" "101" " 71" " 41" " 11"
[2,] "G10" "140" "110" " 80" " 50" " 20"
[3,] "G2" "132" "102" " 72" " 42" " 12"
[4,] "G3" "133" "103" " 73" " 43" " 13"
[5,] "G4" "134" "104" " 74" " 44" " 14"
[6,] "G5" "135" "105" " 75" " 45" " 15"
[7,] "G6" "136" "106" " 76" " 46" " 16"
[8,] "G7" "137" "107" " 77" " 47" " 17"
[9,] "G8" "138" "108" " 78" " 48" " 18"
[10,] "G9" "139" "109" " 79" " 49" " 19"
I converted it to a data.frame, and then:
matd <- data.frame(matm)
genotypes E1 E2 E3 E4 E5
1 G1 31.70000 26.76667 23.60000 30.73333 43.13333
2 G10 32.40000 17.86667 28.83333 32.43333 30.23333
3 G2 29.50000 24.60000 24.16667 33.43333 38.66667
4 G3 27.00000 28.83333 33.63333 43.83333 29.60000
5 G4 29.53333 29.90000 26.60000 26.13333 40.33333
6 G5 27.40000 32.43333 27.96667 40.43333 41.46667
7 G6 36.76667 32.26667 28.26667 38.73333 33.43333
8 G7 29.63333 27.00000 26.96667 34.90000 40.70000
9 G8 24.50000 23.26667 22.50000 27.60000 32.26667
10 G9 31.60000 24.96667 24.46667 27.56667 36.26667
I want to get rid of the genotypes column and then convert the result to a matrix:
matx = data.frame(matd[,-1])
matdm <- as.matrix(matx)
matdm
E1 E2 E3 E4 E5
[1,] "31.70000" "26.76667" "23.60000" "30.73333" "43.13333"
[2,] "32.40000" "17.86667" "28.83333" "32.43333" "30.23333"
[3,] "29.50000" "24.60000" "24.16667" "33.43333" "38.66667"
[4,] "27.00000" "28.83333" "33.63333" "43.83333" "29.60000"
[5,] "29.53333" "29.90000" "26.60000" "26.13333" "40.33333"
[6,] "27.40000" "32.43333" "27.96667" "40.43333" "41.46667"
[7,] "36.76667" "32.26667" "28.26667" "38.73333" "33.43333"
[8,] "29.63333" "27.00000" "26.96667" "34.90000" "40.70000"
[9,] "24.50000" "23.26667" "22.50000" "27.60000" "32.26667"
[10,] "31.60000" "24.96667" "24.46667" "27.56667" "36.26667"
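I have two questions:
(1) Is there a consistent way to make a matrix numeric, or to force it to be numeric?
(2) I can see that the genotypes column is sorted alphabetically. My file has a different order in that column. I am fine with this order as long as it is consistent, but I am worried about the following part: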
colnames(matm) <- c("genotypes", levels(mydf$environments))
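Could the aggregate function and levels(mydf$environments) come out in different orders? Do both sort alphabetically, or do they follow the order in the file?
Thanks for your suggestions.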
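Answer 0 (score: 5)
I think I see the source of the confusion. Backing up a bit: when you run the aggregation that you want to turn into a matrix, try capturing the result and looking at it: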
myAgg <- aggregate(yvar ~ genotypes, mydf, 'c')
str(myAgg)
which yields:
> str(myAgg)
'data.frame': 10 obs. of 2 variables:
$ genotypes: Factor w/ 10 levels "G1","G10","G2",..: 1 2 3 4 5 6 7 8 9 10
$ yvar : num [1:10, 1:5] 131 140 132 133 134 135 136 137 138 139 ...
So aggregate produced a somewhat atypical data.frame. The column yvar is actually the matrix you are interested in:
> myAgg$yvar
[,1] [,2] [,3] [,4] [,5]
[1,] 131 101 71 41 11
[2,] 140 110 80 50 20
[3,] 132 102 72 42 12
[4,] 133 103 73 43 13
[5,] 134 104 74 44 14
[6,] 135 105 75 45 15
[7,] 136 106 76 46 16
[8,] 137 107 77 47 17
[9,] 138 108 78 48 18
[10,] 139 109 79 49 19
So you can just grab it directly:
matdm <- myAgg$yvar
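The matrix pulled out this way carries no row or column names. A minimal sketch of labelling it, assuming the data are built as in the question and that the strings are stored as factors (stringsAsFactors = TRUE is spelled out because it stopped being the default in R 4.0.0):

```r
# Rebuild the question's data; stringsAsFactors = TRUE is an explicit assumption
mydf1 <- data.frame(
  yvar         = 1:150,
  genotypes    = rep(paste0("G", 1:10), 15),
  environments = rep(paste0("E", 5:1), each = 30),
  stringsAsFactors = TRUE
)
mydf  <- aggregate(yvar ~ genotypes + environments, data = mydf1, FUN = mean)
myAgg <- aggregate(yvar ~ genotypes, data = mydf, FUN = c)

matdm <- myAgg$yvar                       # numeric matrix, one row per genotype
rownames(matdm) <- as.character(myAgg$genotypes)
# Within each genotype the values follow the row order of mydf, which
# aggregate() sorted by the levels of environments
colnames(matdm) <- levels(mydf$environments)

is.numeric(matdm)
matdm["G1", "E5"]
```

With names attached, cells can be looked up by label rather than by position, which makes any mismatch in ordering easier to spot.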
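Now, to answer your specific questions...
1) A consistent way to get a numeric matrix is to make sure that the data going into matrix() or as.matrix() is numeric. When you called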
matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
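you created a character matrix, because one of the columns (genotypes) is not numeric. You then converted that matrix to a data.frame, which turned the columns into factors (the default before R 4.0.0; later versions keep them as character). Then you selected a few of those columns, which, unsurprisingly, were still factors. So when you called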
matdm <- as.matrix(matx)
the factors were converted to character.
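The coercion described above can be reproduced on a tiny example. This is only a sketch with a made-up two-row data frame (df, id, E1 and E2 are illustrative names, not objects from the question). It shows that as.matrix() on mixed columns gives a character matrix, and that selecting the numeric columns first keeps the matrix numeric:

```r
# Hypothetical toy data: one label column and two numeric columns
df <- data.frame(id = c("G1", "G2"), E1 = c(131, 132), E2 = c(101, 102))

m_all <- as.matrix(df)                    # the label column forces everything to character
typeof(m_all)

num_cols <- vapply(df, is.numeric, logical(1))
m_num <- as.matrix(df[, num_cols, drop = FALSE])   # numeric columns only
rownames(m_num) <- as.character(df$id)
typeof(m_num)

# If the values are already character, convert them back explicitly
m_back <- apply(m_all[, c("E1", "E2"), drop = FALSE], 2, as.numeric)
typeof(m_back)
```

Dropping the label column before calling as.matrix() and storing the labels as row names avoids the round trip through character altogether.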
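2) The order of the rows created by
aggregate(yvar ~ genotypes, mydf, 'c')
is determined by the order of the factor levels of the variable genotypes. Levels are usually created in alphabetical order, but you can always check them with levels() to be completely sure. If you create the factor by hand, the levels are not necessarily alphabetical.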
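To avoid relying on two separate orderings agreeing, one option is to build the table with tapply(), which returns a numeric matrix whose dimnames come from the grouping variables themselves. This is a sketch of an alternative, not what the question's code does; it also pins the level order to order of first appearance with factor(..., levels = unique(...)). The short names g, e and mat are illustrative:

```r
# Rebuild the question's data as plain character vectors
genotypes    <- rep(paste0("G", 1:10), 15)
environments <- rep(paste0("E", 5:1), each = 30)
yvar         <- 1:150

# Fix the level order to order of first appearance instead of alphabetical
g <- factor(genotypes,    levels = unique(genotypes))     # G1, G2, ..., G10
e <- factor(environments, levels = unique(environments))  # E5, E4, ..., E1

# tapply() labels rows and columns from the factors, so names cannot drift
mat <- tapply(yvar, list(genotypes = g, environments = e), mean)

levels(e)
is.numeric(mat)
mat["G10", "E1"]
```

Because the row and column names are derived from the same factors that define the groups, there is no separate colnames() assignment that could get out of step.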
Answer 1 (score: 1)
This is a job for the reshape2 package. Here is the code:
library(reshape2); library(plyr)
ans <- dcast(mydf1, genotypes ~ environments, mean, value.var = 'yvar')  # value.var names the column to average
ans <- mutate(ans, genotypes = as.numeric(sub("G", "", genotypes)))  # numeric, so that 10 sorts after 9
arrange(ans, genotypes)
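Sorting on the stripped labels only gives G1, G2, ..., G10 if they are compared as numbers rather than as text. A small base-R sketch of that ordering step, independent of reshape2 and plyr (lab and num are illustrative names):

```r
lab <- c("G1", "G10", "G2", "G3")        # alphabetical order, as aggregate() returns it
num <- as.numeric(sub("G", "", lab))     # 1, 10, 2, 3
by_number <- lab[order(num)]             # "G1" "G2" "G3" "G10"
by_text   <- sort(sub("G", "", lab))     # "1" "10" "2" "3": text sort keeps 10 before 2
by_number
by_text
```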