I have a complex problem and I don't know how to approach it. I have two data frames. The first is df1:
structure(list(State = structure(1:2, .Label = c("Aaa", "Dd"), class = "factor"),
City = structure(1:2, .Label = c("bb", "e"), class = "factor"),
Type1 = c(NA, NA), Type2 = c(NA, NA)), .Names = c("State",
"City", "Type1", "Type2"), class = "data.frame", row.names = c(NA,
-2L))
and the second is df2:
structure(list(state = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("Aaa", "Dd"), class = "factor"), city = structure(c(1L,
2L, 3L, 4L, 4L, 5L, 6L), .Label = c("bb", "ccc", "ddd", "fff",
"ggg", "hh"), class = "factor"), type = structure(c(1L, 2L, 2L,
2L, 2L, 2L, 3L), .Label = c("Type 1", "Type 2", "Type 4"), class = "factor"),
value = 1:7), .Names = c("state", "city", "type", "value"
), class = "data.frame", row.names = c(NA, -7L))
Data frame df1 looks like this:
State City Type1 Type2
Aaa bb NA NA
Dd e NA NA
and data frame df2 looks like this:
state city type value
Aaa bb Type 1 1
Aaa ccc Type 2 2
Aaa ddd Type 2 3
Dd fff Type 2 4
Dd fff Type 2 5
Dd ggg Type 2 6
Dd hh Type 4 7
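For the NA values in df1, I need to look up values from df2 according to the following rules:
1) If there is exactly one row in df2 where State = state and City = city for a given type, insert its value into the corresponding df1 column (Type1 or Type2).
2) If there are multiple rows where State = state and City = city for a given type, average all of their values and insert the mean into df1.
3) If there are no rows where State = state and City = city for a given type, take the mean of all values of that type for the matching state and insert it into df1.
4) If there are no rows where State = state for a given type, the value should remain NA.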
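Just to clarify: basically I want to average the values for Type1 and Type2 at the most "resolved" level possible. In other words, I want to use the City-level mean where possible, and fall back to the State-level mean where it is not. However, I only want results for the original State and City combinations listed in df1 (even if other means are available).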
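The rules can be checked by hand against the example data. Below is a minimal sketch in base R; mean_or_na is a hypothetical helper introduced only for this illustration, and df2 is rebuilt with data.frame() rather than the structure() call above.

```r
# Same rows as df2 in the question, rebuilt with plain character columns
df2 <- data.frame(
  state = c("Aaa", "Aaa", "Aaa", "Dd", "Dd", "Dd", "Dd"),
  city  = c("bb", "ccc", "ddd", "fff", "fff", "ggg", "hh"),
  type  = c("Type 1", "Type 2", "Type 2", "Type 2", "Type 2", "Type 2", "Type 4"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of value over the selected rows, NA if none match
mean_or_na <- function(rows) if (any(rows)) mean(df2$value[rows]) else NA_real_

# Row 1 of df1 (Aaa, bb)
# Rule 1: a single city-level match for Type 1
aaa_bb_type1 <- mean_or_na(df2$state == "Aaa" & df2$city == "bb" & df2$type == "Type 1")
# No city-level match for Type 2 ...
aaa_bb_type2_city <- mean_or_na(df2$state == "Aaa" & df2$city == "bb" & df2$type == "Type 2")
# ... so rule 3 falls back to the state-level mean of Type 2
aaa_type2 <- mean_or_na(df2$state == "Aaa" & df2$type == "Type 2")

# Row 2 of df1 (Dd, e): city "e" never appears in df2
# Rule 4: no Type 1 rows for state Dd at all, so the result stays NA
dd_type1 <- mean_or_na(df2$state == "Dd" & df2$type == "Type 1")
# Rule 3: state-level mean of Type 2 for Dd
dd_type2 <- mean_or_na(df2$state == "Dd" & df2$type == "Type 2")
```

Under these assumptions the state-level fallbacks are the means of (2, 3) and (4, 5, 6), which is where the 2.5 and 5 in the desired result come from.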
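I know this is complicated! My desired result is:
State City Type1 Type2
Aaa bb 1 2.5
Dd e NA 5.0
which is a data frame like:
structure(list(State = structure(1:2, .Label = c("Aaa", "Dd"), class = "factor"),
City = structure(1:2, .Label = c("bb", "e"), class = "factor"),
Type1 = c(1L, NA), Type2 = c(2.5, 5)), .Names = c("State",
"City", "Type1", "Type2"), class = "data.frame", row.names = c(NA,
-2L))
I don't even know where to start with this. My first thought was that I need to use acast (from reshape2) to reshape df2. For example, I can use
acast(df2, state+city+value~type)
which reshapes the data into something closer to df1, but then I lose some columns I need to keep (they get collapsed into the row names). I also don't know how to begin with the lookup on State and City, and then the averaging based on those results.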
Can anyone point me in the right direction?
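Edit (January 2015): I added a new comment under Troy's answer below, asking whether there is a simple way to add a column identifying the level (city or state) at which the mean was calculated. I found a solution; there may be a better way, but it works for me. Hope this helps someone!
The helper below reuses means and state_means from that answer and returns the level rather than the value:
getlevel<-function(state,city,type){
m<-means[means$state==state & means$city==city & means$type==type, "mean"] # city-level mean, if any
sm<-state_means[state_means$state==state & state_means$type==type, "mean"] # state-level mean (computed but not used below)
ifelse(length(m)>0,"city","state") # note: returns "state" even when no state-level mean exists
}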
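How the helper is applied to df1 is not shown in the post. As one possible sketch, it can be called once per row with mapply. Everything here beyond the means and state_means tables is an assumption: the tables are built with base R aggregate() instead of plyr, getlevel2 is a hypothetical variant that returns NA when neither level has a match, and the Type1_level and Type2_level column names are made up for the example.

```r
# Example data from the question, with plain character columns
df1 <- data.frame(State = c("Aaa", "Dd"), City = c("bb", "e"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  state = c("Aaa", "Aaa", "Aaa", "Dd", "Dd", "Dd", "Dd"),
  city  = c("bb", "ccc", "ddd", "fff", "fff", "ggg", "hh"),
  type  = c("Type 1", "Type 2", "Type 2", "Type 2", "Type 2", "Type 2", "Type 4"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# City-level and state-level means (base R stand-in for the ddply calls)
means <- aggregate(value ~ state + city + type, data = df2, FUN = mean)
names(means)[names(means) == "value"] <- "mean"
state_means <- aggregate(value ~ state + type, data = df2, FUN = mean)
names(state_means)[names(state_means) == "value"] <- "mean"

# Hypothetical variant of getlevel: NA when neither level has a match
getlevel2 <- function(state, city, type) {
  m  <- means[means$state == state & means$city == city & means$type == type, "mean"]
  sm <- state_means[state_means$state == state & state_means$type == type, "mean"]
  if (length(m) > 0) "city" else if (length(sm) > 0) "state" else NA_character_
}

# One call per row of df1; the type label is recycled across rows
df1$Type1_level <- mapply(getlevel2, df1$State, df1$City, "Type 1", USE.NAMES = FALSE)
df1$Type2_level <- mapply(getlevel2, df1$State, df1$City, "Type 2", USE.NAMES = FALSE)
```

Returning NA for the no-match case keeps the level column consistent with rule 4, where the value itself stays NA.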
Answer 0 (score: 1)
Edit - sorry, I misunderstood the question. Here is corrected code that meets your criteria:
require(plyr)
means<-ddply(df2,.(state,city,type),summarize,mean=mean(value))
state_means<-ddply(df2,.(state,type),summarize,mean=mean(value))
getval<-function(state,city,type){
m<-means[means$state==state & means$city==city & means$type==type, "mean"]
sm<-state_means[state_means$state==state & state_means$type==type, "mean"]
ifelse(length(m)>0,m,sm)
}
## this gives you the new df1
ddply(df1,.(State,City),transform,Type1=getval(as.character(State),as.character(City),"Type 1"),Type2=getval(as.character(State),as.character(City),"Type 2"))
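For readers without plyr, the same two-level fallback can also be sketched in base R. This swaps in a different technique (aggregate() plus match(), vectorised over the rows of df1) and is not the code from this answer; fill_type is a hypothetical helper name, and the data frames are rebuilt with character columns to keep the example self-contained.

```r
df1 <- data.frame(State = c("Aaa", "Dd"), City = c("bb", "e"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  state = c("Aaa", "Aaa", "Aaa", "Dd", "Dd", "Dd", "Dd"),
  city  = c("bb", "ccc", "ddd", "fff", "fff", "ggg", "hh"),
  type  = c("Type 1", "Type 2", "Type 2", "Type 2", "Type 2", "Type 2", "Type 4"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Hypothetical helper: city-level mean where available, else state-level mean, else NA
fill_type <- function(type) {
  sub <- df2[df2$type == type, ]
  if (nrow(sub) == 0) return(rep(NA_real_, nrow(df1)))

  city_means  <- aggregate(value ~ state + city, data = sub, FUN = mean)
  state_means <- aggregate(value ~ state, data = sub, FUN = mean)

  # match() returns NA for rows of df1 that have no counterpart,
  # and indexing with NA yields NA
  city_idx <- match(paste(df1$State, df1$City, sep = "|"),
                    paste(city_means$state, city_means$city, sep = "|"))
  by_city  <- city_means$value[city_idx]
  by_state <- state_means$value[match(df1$State, state_means$state)]

  ifelse(is.na(by_city), by_state, by_city)
}

df1$Type1 <- fill_type("Type 1")
df1$Type2 <- fill_type("Type 2")
```

The design choice is to compute both levels once per type and pick between them at the end, rather than filtering the summary tables for every row.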
Previous answer (incomplete):
This is a bit tricky because your structure() call for df2 doesn't work properly, and your example data set won't give you all the data in your expected result, but I think what you want is:
require(plyr)
means<-ddply(df2,.(state,city,type),summarize,mean=mean(value))
getval<-function(state,city,type){means[means$state==state & means$city==city & means$type==type, "mean"]}
## this gives you the new df1
ddply(df1,.(State,City),transform,Type1=getval(as.character(State),as.character(City),"Type 1"),Type2=getval(as.character(State),as.character(City),"Type 2"))
############################################################################X
## what's happening in detail:
require(plyr) # calls the plyr library
means<-ddply(df2, # base on df2
.(state,city,type), # summarize by combination of city/state/type
summarize, # tells plyr to summarize rather than transform
mean=mean(value)) # show one column at each summary level, called 'mean', the average val
getval<-function(state,city,type){ # create function called getval, takes 3 parameters
means[means$state==state & # first part of [X,]
means$city==city & # selects the row that matches all criteria
means$type==type,
"mean"]} # and [,X] the column relating to the type
getval("Aaa","bb","Type 2")
# this gives you the new df1
ddply(df1, # base on df1
.(State,City), # summarize by State & City
transform, # tell plyr to transform the existing set rather than roll up
Type1=getval(as.character(State),as.character(City),"Type 1"), # call getval() for Type 1
Type2=getval(as.character(State),as.character(City),"Type 2")) # and for Type 2
This gives you the following (not your expected result, but what the data implies):
State City Type1 Type2
1 Aaa bb 1 NA
2 Dd e NA NA
Answer 1 (score: 0)
First reshape the data in df2, then use data.table's keys to merge the data appropriately:
library(data.table)
library(reshape2)
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)
# reshape df2 so that each Type gets its own column
dt2.casted <- reshape2::dcast(dt2, state + city ~ type
, fill=NA_real_
, fun.aggregate=mean, na.rm=TRUE)
dt2.casted <- as.data.table(dt2.casted)
# set the keys so the tables can be merged
setkey(dt2.casted, state, city)
setkey(dt1, State, City)
# merge and take the mean, removing NAs
dt1[dt2.casted][, lapply(.SD, mean, na.rm=TRUE), by=State, .SDcols=grep("Type", names(dt2.casted), value=TRUE)]
State Type 1 Type 2 Type 4
1: Aaa 1 2.50 NaN
2: Dd NaN 5.25 7
dt2.casted <- reshape2::dcast(dt2, state ~ type
, fill=NA_real_
, fun.aggregate=mean, na.rm=TRUE)
dt2.casted <- as.data.table(dt2.casted)
setkey(dt2.casted, state)
setkey(dt1, State)
dt1[dt2.casted][, lapply(.SD, mean, na.rm=TRUE)
, by=list(State, City)
, .SDcols=grep("Type"
, names(dt2.casted), value=TRUE)
]
State City Type 1 Type 2 Type 4
1: Aaa bb 1 2.5 NaN
2: Dd e NaN 5.0 7