R: subsetting data by multiple conditions on each row

Time: 2016-08-17 09:44:14

Tags: r dataframe subset

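I have a very large dataset with many columns and rows. Not every colleague is allowed to see all of the data. I want to subset the original data frame DF based on the data frame Data_locatie. The column acces tells me whether a colleague may see a given combination of city, region and provider (1 = yes, 0 = no). I made a reproducible example that you can use.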

CityChargeSessions <- c("Amsterdam", "Amsterdam", "Amsterdam", "Amsterdam", "Beverwaard", "De meern", "De Meern", "De Meern", "Den Haag", "Den Haag")
RegionAbbreviation <- c("G4", "G4", "G4", "G4", "G4", "G4", "G4", "G4", "G4", "G4")
Provider <- c("ALLEGO", "Essent", "EVBOX", "Nuon", "EVBOX", "EVnet", "Ballast Nedam", "Nuon", "Alfen", "EVnet")
acces <- c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0)

Data_locatie <- data.frame(CityChargeSessions, RegionAbbreviation, Provider, acces)

CityChargeSessions <- c("Amsterdam", "Amsterdam", "Den Haag", "Den Haag", "Rotterdam", "Rotterdam", "Rotterdam", "Utrecht", "Utrecht")
RegionAbbreviation <- c("G4", "G4", "G4", "G4", "G4", "G4", "G4", "G4", "G4")
Provider <- c("Essent", "Nuon", "Alfen", "EVnet", "Alfen", "EVBOX", "EVnet", "Ballast Nedam", "EVnet")
kWh <- c(3366231.03, 7547896.10, 2535700.80, 245951.82, 62004.86, 3074192.86, 221362.13, 1272956.51, 281451.94)

DF <- data.frame(CityChargeSessions, RegionAbbreviation, Provider, kWh)

My expected output is:

CityChargeSessions <- c("Amsterdam", "Den Haag")
RegionAbbreviation <- c("G4", "G4")
Provider <- c("Essent", "Alfen")
kWh <- c(3366231.03, 2535700.80)

expected_output <- data.frame(CityChargeSessions, RegionAbbreviation, Provider, kWh)

Can you help me?

Thanks for your help!

马亭

1 answer:

Answer 0 (score: 1)

You can use data.table and do the following:

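library(data.table)

# Convert both data frames to data.tables by reference and key them on the shared columns.
setDT(Data_locatie)
setkey(Data_locatie, CityChargeSessions, RegionAbbreviation, Provider)
setDT(DF)
setkey(DF, CityChargeSessions, RegionAbbreviation, Provider)

# Inner join (nomatch = 0) DF with the combinations where acces is 1 (or 0), then drop the acces column.
allowed_combinations <- DF[Data_locatie[acces == 1], nomatch = 0][, acces := NULL]
not_allowed_combinations <- DF[Data_locatie[acces == 0], nomatch = 0][, acces := NULL]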
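
For comparison, the same inner join can be sketched in base R with merge(), in case adding data.table is not an option. This is only a sketch, not part of the original answer. It rebuilds the two data frames from the question with stringsAsFactors = FALSE so that the key columns are plain character vectors. The names key_cols and allowed are made up for illustration.

```r
# Rebuild the question's data frames (values copied from the example in the question).
Data_locatie <- data.frame(
  CityChargeSessions = c("Amsterdam", "Amsterdam", "Amsterdam", "Amsterdam", "Beverwaard",
                         "De meern", "De Meern", "De Meern", "Den Haag", "Den Haag"),
  RegionAbbreviation = "G4",  # a single value is recycled to every row
  Provider = c("ALLEGO", "Essent", "EVBOX", "Nuon", "EVBOX",
               "EVnet", "Ballast Nedam", "Nuon", "Alfen", "EVnet"),
  acces = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

DF <- data.frame(
  CityChargeSessions = c("Amsterdam", "Amsterdam", "Den Haag", "Den Haag", "Rotterdam",
                         "Rotterdam", "Rotterdam", "Utrecht", "Utrecht"),
  RegionAbbreviation = "G4",
  Provider = c("Essent", "Nuon", "Alfen", "EVnet", "Alfen",
               "EVBOX", "EVnet", "Ballast Nedam", "EVnet"),
  kWh = c(3366231.03, 7547896.10, 2535700.80, 245951.82, 62004.86,
          3074192.86, 221362.13, 1272956.51, 281451.94),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the three columns that identify a combination.
key_cols <- c("CityChargeSessions", "RegionAbbreviation", "Provider")

# Keep only the permitted combinations, then inner join them with DF.
# merge() defaults to an inner join, so DF rows without a permitted match are dropped.
allowed <- merge(DF, Data_locatie[Data_locatie$acces == 1, key_cols], by = key_cols)

print(allowed)
```

As with the data.table version, combinations in DF that do not appear in Data_locatie at all (the Rotterdam rows, for example) are excluded. They would not show up in a corresponding acces == 0 join either, so they may need separate handling.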