I've run into a very challenging problem and don't know how to approach it. (I'm not even sure I titled this post correctly.) Anyway, I have two data frames, df1 and df2.
df1 <- structure(list(country = structure(c(1L, 1L, 2L, 3L), .Label = c("a",
"b", "c"), class = "factor"), state = structure(1:4, .Label = c("d",
"m", "o", "q"), class = "factor"), city = structure(1:4, .Label = c("h",
"n", "p", "r"), class = "factor"), value = c(1L, 3L, 3L, 4L),
source = structure(1:4, .Label = c("string1", "string2",
"string3", "string4"), class = "factor")), .Names = c("country",
"state", "city", "value", "source"), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(country = structure(c(1L, 1L, 2L, 3L), .Label = c("a",
"b", "c"), class = "factor"), state = structure(1:4, .Label = c("d",
"e", "f", "g"), class = "factor"), city = structure(1:4, .Label = c("h",
"i", "j", "k"), class = "factor"), mean_value = 1:4, level_of_mean = structure(c(1L,
2L, 2L, 2L), .Label = c("city", "country"), class = "factor")), .Names = c("country",
"state", "city", "mean_value", "level_of_mean"), class = "data.frame", row.names = c(NA,
-4L))
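Both data frames contain data for countries, states, and cities. df1 contains the "raw" data, and df2 contains values calculated from df1 at the various levels (country, state, and city), depending on data availability (city-level, state-level, or country-level mean, in that order of preference).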
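What I need to do is this: for each mean_value in df2, I need to use the associated level_of_mean, country, state, and city to look in df1 and, using country, state, and city, build a list of the strings in column source. For the data frames above, this would yield the following:
source <- structure(1:4, .Label = c("string1", "string2", "string3", "string4"
), class = "factor")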
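Does anyone know how to approach this? Frankly, I'm not even sure where to start!
Edit: I should also note that my "real" data frames contain many different level_of_mean and {{1}} columns, so a general solution would be best.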
Answer 0 (score: 0)
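library(dplyr)
library(tidyr)
df2 %>%
  select(-mean_value) %>%
  # stack country/state/city into level/value pairs
  gather(level, value, -level_of_mean) %>%
  # keep only the pair for the level each mean was computed at
  filter(as.character(level_of_mean) == as.character(level)) %>%
  select(-level_of_mean) %>%
  # match against df1 reshaped the same way (joins on level and value)
  inner_join(df1 %>%
    gather(level, value, -source)
  ) %>%
  select(source) %>%
  distinct() %>%
  unlist(use.names = FALSE) %>%
  as.character()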
You will see some warnings about the factor levels being inconsistent across country, state, and city; you can safely ignore or suppress them.
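If you would rather avoid the warnings than suppress them, one option is to convert the factor columns to character before reshaping. The following is only a minimal sketch on a made-up two-column data frame (toy is a hypothetical stand-in, not the df1 or df2 from the question); the same conversion could be applied to the real data frames first.

```r
library(tidyr)

# Hypothetical toy data with factor columns whose levels differ
toy <- data.frame(country = factor(c("a", "b")),
                  city    = factor(c("h", "j")))

# Convert every factor column to character; other columns are left alone
toy[] <- lapply(toy, function(col) if (is.factor(col)) as.character(col) else col)

# With character columns there are no mismatched factor levels to warn about
long <- gather(toy, level, value)
long
```

Wrapping the original pipeline in suppressWarnings() would be the other route, at the cost of hiding any unrelated warnings as well.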
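If you are not familiar with the chaining (pipe) operator %>%, it is made available by dplyr, which re-exports it from magrittr. Basically, x %>% f(y) is equivalent to f(x, y). Without that operator you can do the same thing step by step: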
df2 <- select(df2, -mean_value)
df2 <- gather(df2, level, value, -level_of_mean)
df2 <- filter(df2, ...
and so on.
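To make the equivalence concrete, here is a small sketch using a plain numeric vector (x, piped, and direct are illustrative names only) that compares the piped call with the ordinary one:

```r
library(dplyr)

x <- c(4, 9, 16)

piped  <- x %>% head(2)  # pipe form: x is passed as the first argument
direct <- head(x, 2)     # the same call written out in full

identical(piped, direct)  # TRUE
```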
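These are dplyr and tidyr functions (gather comes from tidyr). If you prefer, you can use reshape2::melt instead of gather, merge instead of inner_join, and unique instead of distinct to do the same thing.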
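As a rough illustration of that alternative, the sketch below uses only base R: merge and unique as suggested, but with a small hand-written stacking helper in place of reshape2::melt so that no extra package is needed. The helper to_long is hypothetical, and the data frames are the question's data re-typed with character columns, which is an assumption made to keep the example short.

```r
df1 <- data.frame(country = c("a", "a", "b", "c"),
                  state   = c("d", "m", "o", "q"),
                  city    = c("h", "n", "p", "r"),
                  value   = c(1L, 3L, 3L, 4L),
                  source  = c("string1", "string2", "string3", "string4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(country = c("a", "a", "b", "c"),
                  state   = c("d", "e", "f", "g"),
                  city    = c("h", "i", "j", "k"),
                  mean_value    = 1:4,
                  level_of_mean = c("city", "country", "country", "country"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: stack the key columns into level/value pairs,
# carrying along the columns named in `keep`
to_long <- function(df, keys, keep) {
  do.call(rbind, lapply(keys, function(k) {
    data.frame(level = k, value = as.character(df[[k]]), df[keep],
               stringsAsFactors = FALSE)
  }))
}

keys  <- c("country", "state", "city")
long1 <- to_long(df1, keys, "source")
long2 <- to_long(df2, keys, "level_of_mean")

# Keep only the pair for the level each mean was computed at
wanted <- long2[long2$level == long2$level_of_mean, c("level", "value")]

# merge() plays the role of inner_join(), unique() that of distinct()
matched <- merge(wanted, long1, by = c("level", "value"))
sources <- unique(matched$source)
sources
```

Passing the key columns explicitly also keeps df1's own value column out of the stacked data, which the gather(level, value, -source) call would otherwise pull in.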