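I have a very large dataset with many columns and rows. Not every colleague is allowed to see all of the data. I want to subset the original data frame DF based on the data frame Data_locatie. The column acces tells me whether a colleague may see a given combination (1 = yes, 0 = no). I made a reproducible example that you can use.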
CityChargeSessions <-c("Amsterdam","Amsterdam","Amsterdam","Amsterdam","Beverwaard","De meern","De Meern","De Meern","Den Haag","Den Haag")
RegionAbbreviation <- c("G4", "G4","G4","G4","G4","G4","G4","G4","G4","G4")
Provider<- c("ALLEGO","Essent","EVBOX","Nuon","EVBOX","EVnet","Ballast Nedam", "Nuon","Alfen","EVnet")
acces<- c(0,1,1,0,1,1,0,0,1,0)
Data_locatie<- data.frame(CityChargeSessions,RegionAbbreviation,Provider,acces)
CityChargeSessions <-c("Amsterdam" ,"Amsterdam" ,"Den Haag" , "Den Haag" ,"Rotterdam", "Rotterdam", "Rotterdam", "Utrecht" , "Utrecht" )
RegionAbbreviation <- c("G4", "G4","G4","G4","G4","G4","G4","G4","G4")
Provider <- c("Essent","Nuon","Alfen","EVnet","Alfen","EVBOX", "EVnet","Ballast Nedam", "EVnet")
kWh<- c(3366231.03, 7547896.10, 2535700.80, 245951.82, 62004.86, 3074192.86, 221362.13, 1272956.51, 281451.94)
DF<- data.frame(CityChargeSessions,RegionAbbreviation,Provider,kWh)
My expected output is:
CityChargeSessions <-c("Amsterdam" ,"Den Haag")
RegionAbbreviation <- c("G4", "G4")
Provider <- c("Essent","Alfen")
kWh<- c(3366231.03, 2535700.80)
expected_output<- data.frame(CityChargeSessions,RegionAbbreviation,Provider,kWh)
Can you help me?
Thanks for your help!
马亭
Answer 0 (score: 1)
You can use data.table and do the following:
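library(data.table)
# Convert both data frames to data.tables by reference and key them on the shared columns
setDT(Data_locatie)
setkey(Data_locatie, CityChargeSessions, RegionAbbreviation, Provider)
setDT(DF)
setkey(DF, CityChargeSessions, RegionAbbreviation, Provider)
# Inner join on the keys (nomatch = 0 drops unmatched rows), then remove the helper column acces
allowed_combinations <- DF[Data_locatie[acces == 1], nomatch = 0][, acces := NULL]
not_allowed_combinations <- DF[Data_locatie[acces == 0], nomatch = 0][, acces := NULL]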
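If you would rather not depend on data.table, the same idea can be sketched in base R with merge(). This is only a minimal sketch on the example data from the question; the names keys, allowed and not_allowed are introduced here for illustration and are not part of the original code.

```r
# Example data from the question
Data_locatie <- data.frame(
  CityChargeSessions = c("Amsterdam", "Amsterdam", "Amsterdam", "Amsterdam", "Beverwaard",
                         "De meern", "De Meern", "De Meern", "Den Haag", "Den Haag"),
  RegionAbbreviation = rep("G4", 10),
  Provider = c("ALLEGO", "Essent", "EVBOX", "Nuon", "EVBOX",
               "EVnet", "Ballast Nedam", "Nuon", "Alfen", "EVnet"),
  acces = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0)
)
DF <- data.frame(
  CityChargeSessions = c("Amsterdam", "Amsterdam", "Den Haag", "Den Haag", "Rotterdam",
                         "Rotterdam", "Rotterdam", "Utrecht", "Utrecht"),
  RegionAbbreviation = rep("G4", 9),
  Provider = c("Essent", "Nuon", "Alfen", "EVnet", "Alfen",
               "EVBOX", "EVnet", "Ballast Nedam", "EVnet"),
  kWh = c(3366231.03, 7547896.10, 2535700.80, 245951.82, 62004.86,
          3074192.86, 221362.13, 1272956.51, 281451.94)
)

# Columns that identify a combination (hypothetical helper name)
keys <- c("CityChargeSessions", "RegionAbbreviation", "Provider")

# merge() does an inner join by default, so only combinations present in
# both data frames survive; selecting just the key columns keeps acces out
allowed     <- merge(DF, Data_locatie[Data_locatie$acces == 1, keys], by = keys)
not_allowed <- merge(DF, Data_locatie[Data_locatie$acces == 0, keys], by = keys)

print(allowed)
```

On the example data, allowed should contain the Amsterdam/Essent and Den Haag/Alfen rows, which matches the expected output in the question. Combinations in DF that do not appear in Data_locatie at all (for example Rotterdam) end up in neither result.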