I have a data frame df:
colour shape
'red' circle
'blue' square
'blue' circle
'green' sphere
and a matrix m of doubles with named rows and columns:
circle square sphere
red 1 4 7
blue 2 5 8
green 3 6 9
I want to add a new column to df so that I get:
id colour shape
1 'red' circle
5 'blue' square
2 'blue' circle
9 'green' sphere
I tried to do this with the following code, but it does not seem to work:
df$id <- m[df$colour,df$shape]
I also tried apply() and similar functions, without luck. Can anyone tell me the right way to do this without using a loop?
Answer 0 (score: 7)
A rather simple (and fast!) alternative is to index into the matrix with a matrix:
# Your data
d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
# Create index matrix - each row is a row/col index
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
# Now use it and add as the id column...
d2 <- cbind(id=m[i], d)
d2
# id color shape
#1 1 red circle
#2 5 blue square
#3 2 blue circle
#4 9 green sphere
The match function is used to find the numeric index that corresponds to each string.
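To make the role of match concrete, here is a small illustration that reuses the dimnames of m from above. The variable names row_names, col_names, row_idx and col_idx are only for this example.

```r
# Row and column names of the lookup matrix from the answer
row_names <- c("red", "blue", "green")
col_names <- c("circle", "square", "sphere")

# match() returns, for each value, its position in the second argument
row_idx <- match(c("red", "blue", "blue", "green"), row_names)
col_idx <- match(c("circle", "square", "circle", "sphere"), col_names)

row_idx  # 1 2 2 3
col_idx  # 1 2 1 3

# Values that are not found give NA rather than an error
match("purple", row_names)  # NA
```

Each pair (row_idx[k], col_idx[k]) is one row of the index matrix i built above.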
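Note that in newer versions of R (2.13 and later, I think) you can use character strings in the index matrix. Unfortunately, the color and shape columns are typically factors, and cbind does not handle that well (it uses the integer codes), so you need to coerce them with as.character: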
i <- cbind(as.character(d$color), as.character(d$shape))
...although I suspect that using match is more efficient.
EDIT: I measured it, and using match seems to be about 20% faster:
# Make 1 million rows
d <- d[sample.int(nrow(d), 1e6, TRUE), ]
system.time({
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
d2 <- cbind(id=m[i], d)
}) # 0.46 secs
system.time({
i <- cbind(as.character(d$color), as.character(d$shape))
d2 <- cbind(id=m[i], d)
}) # 0.55 secs
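The factor pitfall mentioned above can be seen on the small example data. This is a sketch that assumes the columns really are factors, so stringsAsFactors = TRUE is set explicitly; the names wrong and right are illustrative.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# Force factors explicitly, since the data.frame() default changed in R 4.0.0
d <- data.frame(color = c("red", "blue", "blue", "green"),
                shape = c("circle", "square", "circle", "sphere"),
                stringsAsFactors = TRUE)

# cbind() on factors yields their integer codes (levels are sorted
# alphabetically), so these indices point at the wrong cells of m
wrong <- m[cbind(d$color, d$shape)]
wrong  # 3 7 1 5

# Coercing to character gives a character index matrix, which is
# matched against the dimnames of m
right <- m[cbind(as.character(d$color), as.character(d$shape))]
right  # 1 5 2 9
```

No error or warning is raised in the first case, which is what makes the mistake easy to miss.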
Answer 1 (score: 5)
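I think I might win the shortest-answer contest here, as long as these columns are character vectors rather than factors (factors being what you would more likely get unless you make a specific effort to avoid them). It really only adds cbind, to turn the two "character" vectors from the data frame into the two-column matrix that [ expects when indexing a matrix, which you were very close to using successfully. (It also seems reasonably expressive.)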
# Data construct
d <- data.frame(color=c('red','blue','blue','green'),
shape=c('circle','square','circle','sphere'), stringsAsFactors=FALSE)
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
# Code:
d$id <- with( d, m [ cbind(color, shape) ] )
d
color shape id
1 red circle 1
2 blue square 5
3 blue circle 2
4 green sphere 9
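For comparison, this sketch shows why the attempt in the question, m[df$colour, df$shape], does not give one value per row, and how the cbind version differs. The name block is just for illustration.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))
d <- data.frame(color = c("red", "blue", "blue", "green"),
                shape = c("circle", "square", "circle", "sphere"),
                stringsAsFactors = FALSE)

# Two separate index vectors select every row/column combination
block <- m[d$color, d$shape]
dim(block)   # 4 4

# The wanted values sit on the diagonal of that block
diag(block)  # 1 5 2 9

# A two-column index matrix picks exactly one cell per row instead
m[cbind(d$color, d$shape)]  # 1 5 2 9
```

Taking the diagonal works on small data, but it builds an n-by-n intermediate matrix, so the index-matrix form is likely the better choice for large data frames.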
Answer 2 (score: 2)
Another answer, using the reshape2 and plyr packages (plyr is optional and is only used for the join).
require(plyr)
require(reshape2)
Df <- data.frame(colour = c("red", "blue", "blue", "green"),
shape = c("circle", "square", "circle", "sphere"))
Mat <- matrix(1:9, dimnames = list(c("red", "blue", "green"),
c("circle", "square", "sphere")),
nrow = 3)
Df2 <- melt(Mat, varnames = c("colour", "shape"))  # the melt() generic dispatches to the matrix/array method
result <- join(Df, Df2)
join(Df, Df2)
Joining by: colour, shape
colour shape value
1 red circle 1
2 blue square 5
3 blue circle 2
4 green sphere 9
Hope this helps.
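If you would rather not load extra packages, the same stack-then-look-up idea can be sketched in base R. This is not the code from the answer above: it swaps melt for as.data.frame(as.table(...)) and the join for a match on pasted keys, and the names long, key_df and key_long are made up for the example.

```r
Mat <- matrix(1:9, nrow = 3,
              dimnames = list(c("red", "blue", "green"),
                              c("circle", "square", "sphere")))
Df <- data.frame(colour = c("red", "blue", "blue", "green"),
                 shape = c("circle", "square", "circle", "sphere"),
                 stringsAsFactors = FALSE)

# as.table() + as.data.frame() stacks the matrix into one row per cell
long <- as.data.frame(as.table(Mat), stringsAsFactors = FALSE)
names(long) <- c("colour", "shape", "value")

# Build a combined key per row and look it up in the stacked table.
# Assumption: the names contain no spaces, so pasted keys are unambiguous.
key_df   <- paste(Df$colour, Df$shape)
key_long <- paste(long$colour, long$shape)
Df$value <- long$value[match(key_df, key_long)]
Df
```

Because match is applied row by row to Df, the original row order is kept.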
Answer 3 (score: 1)
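merge() is your friend. To use it, we need a suitable data frame to merge with, one that contains a stacked version of the ID matrix. I create it as newdf with the following code: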
df <- data.frame(matrix(1:9, ncol = 3))
colnames(df) <- c("circle","square","sphere")
rownames(df) <- c("red","blue","green")
newdf <- cbind.data.frame(ID = unlist(df),
expand.grid(colour = rownames(df),
shape = colnames(df)))
The result is:
> newdf
ID colour shape
circle1 1 red circle
circle2 2 blue circle
circle3 3 green circle
square1 4 red square
square2 5 blue square
square3 6 green square
sphere1 7 red sphere
sphere2 8 blue sphere
sphere3 9 green sphere
Then, with the original data in the object df2, defined using
df2 <- data.frame(colour = c("red","blue","blue","green"),
shape = c("circle","square","circle","sphere"))
use merge():
> merge(newdf, df2, sort = FALSE)
colour shape ID
1 red circle 1
2 blue circle 2
3 blue square 5
4 green sphere 9
If you need to, you can store the result and rearrange the columns:
> res <- merge(newdf, df2, sort = FALSE)
> res <- res[,c(3,1,2)]
> res
ID colour shape
1 1 red circle
2 2 blue circle
3 5 blue square
4 9 green sphere
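Note that in the output above the rows come back in a different order from the original data (blue circle before blue square). If the original order matters, one possible approach is to carry a helper column through the merge; the column name row is an assumption made for this sketch, and newdf is rebuilt here from a matrix m so the example is self-contained.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# Stacked version of the matrix: one row per (colour, shape) cell
newdf <- data.frame(ID = as.vector(m),
                    expand.grid(colour = rownames(m), shape = colnames(m),
                                stringsAsFactors = FALSE))

df2 <- data.frame(colour = c("red", "blue", "blue", "green"),
                  shape = c("circle", "square", "circle", "sphere"),
                  stringsAsFactors = FALSE)

# Remember the original position of each row before merging
df2$row <- seq_len(nrow(df2))
res <- merge(df2, newdf, sort = FALSE)

# Restore the original order and drop the helper column
res <- res[order(res$row), c("ID", "colour", "shape")]
rownames(res) <- NULL
res
```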
Answer 4 (score: 1)
You can also convert the matrix m to a vector and then match the IDs to the colour and shape values:
df<-data.frame(colour=c("red","blue","blue","green"),
shape=c("circle","square","circle","sphere"))
m<-matrix(1:9,nrow=3,dimnames=list(c("red","blue","green"),
c("circle","square","sphere")))
mVec<-as.vector(m)
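The next step matches each colour in df to the corresponding row name of m, then adds an offset derived from the matching column (shape). Because R stores matrices column by column, the result is the position of the corresponding ID in the vector mVec.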
df$ID<-mVec[match(df$colour, dimnames(m)[[1]]) + (dim(m)[1]*
(match(df$shape, dimnames(m)[[2]]) - 1))]
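The index arithmetic in that expression can be checked by hand for a single cell. The sketch below works through "blue" and "square"; the names r, k and idx are illustrative.

```r
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("red", "blue", "green"),
                            c("circle", "square", "sphere")))
mVec <- as.vector(m)

# Position of "blue" among the rows and "square" among the columns
r <- match("blue", rownames(m))    # 2
k <- match("square", colnames(m))  # 2

# Column-major offset: skip (k - 1) full columns of nrow(m) entries each
idx <- r + nrow(m) * (k - 1)       # 2 + 3 * 1 = 5

mVec[idx]            # 5
m["blue", "square"]  # 5
```

The same formula applied to whole vectors of row and column positions is what the expression for df$ID computes.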
Answer 5 (score: 0)
#recreating your data
dat <- read.table(text="colour shape
'red' circle
'blue' square
'blue' circle
'green' sphere", header=TRUE)
d2 <- matrix(c(1:9), ncol=3, nrow=3, byrow=TRUE)
dimnames(d2) <-list(c('circle', 'square', 'sphere'),
c("red", "blue", "green"))
d2<-as.table(d2)
#make a list of matching to the row and column names of the look up matrix
LIST <- list(match(dat[, 2], rownames(d2)), match(dat[, 1], colnames(d2)))
#use sapply to index the lookup matrix using the row and col values from LIST
id <- sapply(seq_along(LIST[[1]]), function(i) d2[LIST[[1]][i], LIST[[2]][i]])
#put it all back together
data.frame(id=id, dat)