I have a large list, but a minimal example looks like this:
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)
mylist <- list(A=A, B=B, C=C)
The expected output combines A and B element-wise, so that each component looks like AB:
AA, aA, Aa, aa, Aa
Preferably each pair should be sorted so that the uppercase letter always comes first:
AA, Aa, Aa, aa, Aa
So the new list or matrix should have two rows (or, transposed, two columns):
AA, Aa, Aa, aa, Aa
1, 2, 3, 1, 4
Now I want to compute the mean of C for each class: "AA", "Aa" and "aa".
It looks simple, but I can't figure it out easily.
Answer 0 (score: 2)
> (ab <- paste(A, B, sep="") ) # the joining step
[1] "AA" "aA" "Aa" "aa" "Aa"
> (ab <- sub("([a-z])([A-Z])", "\\2\\1", ab) ) # swap "aA" to "Aa" so uppercase comes first
[1] "AA" "Aa" "Aa" "aa" "Aa"
> rbind(ab, C) # character matrix (C is coerced to text)
[,1] [,2] [,3] [,4] [,5]
ab "AA" "Aa" "Aa" "aa" "Aa"
C "1" "2" "3" "1" "4"
> data.frame(alleles=ab, count=C) # dataframes are lists
alleles count
1 AA 1
2 Aa 2
3 Aa 3
4 aa 1
5 Aa 4
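The session above builds the table but stops before the per-class averaging the question asks for. A minimal follow-up sketch, assuming the same `A`, `B` and `C` vectors: it reuses the swap idea but writes the classes as `[[:lower:]]`/`[[:upper:]]` (my substitution, since `[a-z]` ranges can be locale-dependent), and `class_means` is just an illustrative name.

```r
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)

# Join the alleles, then move an uppercase letter in front of a lowercase one ("aA" -> "Aa")
ab <- sub("([[:lower:]])([[:upper:]])", "\\2\\1", paste0(A, B))

# Mean of C within each genotype class; returns a named 1-d array
class_means <- tapply(C, ab, mean)
class_means
```

The order of the names in `class_means` follows the locale's collation, so look values up by name (e.g. `class_means["Aa"]`) rather than by position.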
Answer 1 (score: 2)
If you use a data.frame together with the plyr package, you can do it like this:
> A <- c("A", "a", "A", "a", "A")
> B <- c("A", "A", "a", "a", "a")
> C <- c(1, 2, 3, 1, 4)
> (groups <- paste(A, B, sep="")) # don't sort() here: that would misalign the groups with C
[1] "AA" "aA" "Aa" "aa" "Aa"
> my.df <- data.frame(A=A, B=B, C=C, group=groups)
> require(plyr)
> result <- ddply(my.df, "group", transform, group.means=mean(C))
> result[order(result$group, decreasing=TRUE),]
A B C group group.means
5 A A 1 AA 1.0
3 A a 3 Aa 3.5
4 A a 4 Aa 3.5
2 a A 2 aA 2.0
1 a a 1 aa 1.0
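If you would rather avoid the plyr dependency, base R's `ave()` produces the same per-row group-mean column as `ddply(..., transform, ...)`. A minimal sketch, assuming the same data; it builds the group label with `paste0()` and keeps it aligned with `C` by not sorting beforehand.

```r
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)

# One group label per row, in the original row order
my.df <- data.frame(A = A, B = B, C = C, group = paste0(A, B),
                    stringsAsFactors = FALSE)

# ave() returns a vector as long as C, each entry replaced by the mean of its group
my.df$group.means <- ave(my.df$C, my.df$group, FUN = mean)
my.df
```

Like the plyr version, this keeps "aA" and "Aa" as separate groups; normalize the labels first if they should be merged.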
Answer 2 (score: 1)
Using your data:
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)
I define a data.frame with the combination of A and B as the key column:
AB <- paste(A, B, sep='')
df <- data.frame(id=AB, C=C)
> df
id C
1 AA 1
2 aA 2
3 Aa 3
4 aa 1
5 Aa 4
If you need to order this data.frame before aggregating:
df <- df[order(AB, decreasing=TRUE),]
> df
id C
1 AA 1
3 Aa 3
5 Aa 4
2 aA 2
4 aa 1
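`order()` on character data follows the current locale's collation, so `decreasing=TRUE` gives the uppercase-first result above in a typical English UTF-8 locale, but not necessarily elsewhere (e.g. in the C locale it would put "aa" first). If the order must be the same on every machine, one option is `method = "radix"`, which compares strings in C-locale order, where uppercase letters sort before lowercase. A minimal sketch, assuming character (not factor) ids, hence `stringsAsFactors = FALSE`:

```r
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)

AB <- paste0(A, B)
df <- data.frame(id = AB, C = C, stringsAsFactors = FALSE)

# Radix ordering uses C-locale comparison on character vectors,
# giving "AA" < "Aa" < "aA" < "aa" regardless of the session locale
df_sorted <- df[order(df$id, method = "radix"), ]
df_sorted
```

Radix ordering is also stable, so the two "Aa" rows keep their original relative order.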
Use aggregate to compute the mean for each id:
meanDF <- aggregate(C~id, data=df, mean)
> meanDF
id C
1 aa 1.0
2 aA 2.0
3 Aa 3.5
4 AA 1.0
But if you want to order after aggregating:
df <- data.frame(id=AB, C=C)
meanDF <- aggregate(C~id, data=df, mean)
meanDF <- meanDF[order(meanDF$id, decreasing=TRUE),]
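The aggregation above still treats "aA" and "Aa" as different ids, while the question asks for three classes. A minimal sketch that normalizes the pairs before aggregating, assuming single-letter alleles; it uses `ifelse()` with `toupper()` (a substitute for the regex swap in the first answer) and orders with `method = "radix"` for locale-independent uppercase-first output.

```r
A <- c("A", "a", "A", "a", "A")
B <- c("A", "A", "a", "a", "a")
C <- c(1, 2, 3, 1, 4)

# If A is uppercase keep "AB", otherwise write "BA": mixed pairs always become "Aa"
id <- ifelse(A == toupper(A), paste0(A, B), paste0(B, A))
df <- data.frame(id = id, C = C, stringsAsFactors = FALSE)

# Mean of C per class, then order uppercase-first independent of the locale
meanDF <- aggregate(C ~ id, data = df, FUN = mean)
meanDF <- meanDF[order(meanDF$id, method = "radix"), ]
meanDF
```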