This is probably fairly simple. I have a huge data frame that looks like this:
df1 <- structure(list(place = structure(c(1L, 5L, 1L, 4L), .Label = c("1","2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23","24", "25", "26"), class = "factor"), x = structure(list(c("A", "B", "C", "D", "E"), c("D", "E", "F","G", "H", "I"), c("D", "E", "F", "G", "H"), c("F", "H")), class = "AsIs")), .Names = c("place", "x"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame")
> df1
place x
1 1 A, B, C,....
2 5 D, E, F,....
3 1 D, E, F,....
4 4 F, H
and a second data frame holding the corresponding value for each list element in df1:
df2 <- structure(list(x = c('A','B','C','D','E','F','G','H','I','J','K','L','M'), value = c("5.2", "1.8", "2.7","3.8", "5.0","3.2", "4.5","2.4", "3.9", "1.2","2.3","4.3", "3.0")), .Names = c("x", "value"), row.names = c(1L,2L,3L,4L,5L,6L,7L,8L,9L,10L, 11L, 12L, 13L), class = "data.frame")
x value
1 A 5.2
2 B 1.8
3 C 2.7
4 D 3.8
5 E 5.0
6 F 3.2
7 G 4.5
8 H 2.4
9 I 3.9
10 J 1.2
11 K 2.3
12 L 4.3
13 M 3.0
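I want to replace the elements in df1 with the corresponding values from df2 (so every A in df1 should become 5.2, and so on) and then perform operations on those values, such as the mean of x for each place. Thanks!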
Answer 0 (score: 2)
If the dataset is large, you can use the lookup function from qdap, which performs an environment-based lookup:
library(qdap)
lapply(df1[, 2], lookup, df2)
Or, to get the means:
df2$value <- as.numeric(df2$value) #convert your df2 value column to numeric
sapply(df1[, 2], function(x) mean(lookup(x, df2)))
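To illustrate what an environment-based lookup means, here is a minimal base R sketch. It is not qdap's implementation; the helper name lookup_env is hypothetical, and the key/value pairs are copied from the first rows of df2 in the question.

```r
# Key/value pairs copied from the first five rows of df2 in the question
keys <- c("A", "B", "C", "D", "E")
vals <- c(5.2, 1.8, 2.7, 3.8, 5.0)

# Store each value under its key in a hashed environment
env <- list2env(as.list(setNames(vals, keys)), hash = TRUE)

# lookup_env is a hypothetical helper: fetch the values for a vector of keys.
# Note that mget() raises an error if a key is missing from the environment.
lookup_env <- function(k) unlist(mget(k, envir = env), use.names = FALSE)

lookup_env(c("D", "A"))
```

An environment behaves like a hash table, so each key is fetched without scanning the whole table, which is the usual motivation for this approach on large inputs.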
Answer 1 (score: 1)
You can use match and sapply:
df1$x <- sapply(df1$x, function(x) df2$value[match(x, df2$x)])
df1$x
# [[1]]
# [1] "5.2" "1.8" "2.7" "3.8" "5.0"
#
# [[2]]
# [1] "3.8" "5.0" "3.2" "4.5" "2.4" "3.9"
#
# [[3]]
# [1] "3.8" "5.0" "3.2" "4.5" "2.4"
#
# [[4]]
# [1] "3.2" "2.4"
Per the comment:
To average each row, you can use sapply again:
sapply(df1$x, function(v) mean(as.numeric(v))) # the values are character strings, so convert them first
Or in one step (run this on the original df1, before x is overwritten with values):
sapply(df1$x, function(x) mean(as.numeric(df2$value[match(x, df2$x)])))
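The question asks for the mean of x per place, and place 1 appears in two rows, so a per-row mean does not quite answer it. A minimal base R sketch of one way to pool the values by place follows. It assumes that all values belonging to a place should be pooled before averaging (rather than averaging the row means), and the names value_by_key, long and place_means are introduced here for illustration. The data is rebuilt in simplified form so the block runs on its own.

```r
# Simplified rebuild of the question's data
df1 <- data.frame(place = factor(c("1", "5", "1", "4"), levels = as.character(1:26)))
df1$x <- list(c("A", "B", "C", "D", "E"),
              c("D", "E", "F", "G", "H", "I"),
              c("D", "E", "F", "G", "H"),
              c("F", "H"))
df2 <- data.frame(x = LETTERS[1:13],
                  value = c("5.2", "1.8", "2.7", "3.8", "5.0", "3.2", "4.5",
                            "2.4", "3.9", "1.2", "2.3", "4.3", "3.0"),
                  stringsAsFactors = FALSE)

# Named numeric vector: names are the keys, elements are the values
value_by_key <- setNames(as.numeric(df2$value), df2$x)

# Long format: one row per (place, value) pair
long <- data.frame(
  place = rep(df1$place, lengths(df1$x)),
  value = unname(value_by_key[unlist(df1$x)])
)

# Mean per place, dropping the factor levels that never occur
place_means <- tapply(long$value, droplevels(long$place), mean)
place_means
```

If the mean of the per-row means is wanted instead, compute the row means first and then pass them to tapply along with df1$place.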