Excluding combinations of variables from a data frame

Time: 2015-11-27 14:39:33

Tags: r

I have two data frames. The first lists every combination of variables (which my code will loop over later).

Example of DataframeA:

 time_period    store_type  category_type
 month          store       lvl1
 month          store       lvl2
 month          format      lvl1
 month          format      lvl2
 week           store       lvl1
 week           store       lvl2
 week           format      lvl1
 week           format      lvl2

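The second data frame contains the combinations I want to exclude from DataframeA before my code continues. A blank cell means I want to exclude all values of that variable; for example, the first row below excludes the store and lvl1 combination for every time period.

Example of DataframeB: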

 time_period    store_type  category_type
                store       lvl1
 month          store       lvl2

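I want to apply the exclusions in order: DataframeB first removes the store x lvl1 combination (that is, rows 1 and 5 of DataframeA), and then removes the month x store x lvl2 combination (that is, row 2 of DataframeA).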

The resulting data frame would be:

 time_period    store_type  category_type
 month          format      lvl1
 month          format      lvl2
 week           store       lvl2
 week           format      lvl1
 week           format      lvl2

I have written a solution, but it relies on excluding the combinations one at a time, so I am hoping there is a more elegant one :)

 library(dplyr)

 all_exclusions <- NULL

 for (i in seq_len(nrow(dataframeB))) {

   # Current exclusion rule (one row of dataframeB)
   current_rows_data <-
     dataframeB %>%
     slice(i)

   # Number of non-blank variables in this rule
   num_vars <- (current_rows_data$time_period != "") +
     (current_rows_data$store_type != "") +
     (current_rows_data$category_type != "")

   # Rows of dataframeA that match every non-blank variable of the rule
   exclusions <-
     dataframeA %>%
     mutate(
       check = (time_period == current_rows_data$time_period) +
               (store_type == current_rows_data$store_type) +
               (category_type == current_rows_data$category_type)
     ) %>%
     filter(check == num_vars)

   # Collate exclusions (bind_rows replaces the deprecated rbind_list)
   all_exclusions <- bind_rows(all_exclusions, exclusions)

 }

 # Remove exclusions
 dataframeA <- anti_join(dataframeA, all_exclusions,
                         by = c("time_period", "store_type", "category_type"))

1 answer:

Answer 0 (score: 4)

You first need to do a little work on DataframeB so that it contains every row you want to remove from DataframeA:

 # Rows of DataframeB whose time_period is blank, i.e. "all time periods"
 for_all <- which(DataframeB$time_period == "")

 # Replace each blank row with one explicit row per time period
 DB <- rbind(
   DataframeB,
   data.frame(time_period = "week", DataframeB[for_all, 2:3], stringsAsFactors = FALSE),
   data.frame(time_period = "month", DataframeB[for_all, 2:3], stringsAsFactors = FALSE)
 )[-for_all, ]

Then you can use either data.table or dplyr to do an "anti-join".

data.table

 library(data.table)
 setDT(DataframeA)[!DB, on = names(DataframeA)]
 #    time_period store_type category_type
 # 1:       month     format          lvl1
 # 2:       month     format          lvl2
 # 3:        week      store          lvl2
 # 4:        week     format          lvl1
 # 5:        week     format          lvl2
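The expansion step in this answer hard-codes "week" and "month", handles blanks only in time_period, and its `[-for_all, ]` subset returns zero rows when DataframeB has no blank row at all. If those cases matter, one option is to skip the expansion and treat a blank cell as a wildcard directly. The following is only a base R sketch under that assumption: the helper name `matches_rule` is hypothetical, and the data is re-typed from the question's examples so that the block runs on its own.

```r
# Example data, re-typed from the question
DataframeA <- data.frame(
  time_period   = rep(c("month", "week"), each = 4),
  store_type    = rep(rep(c("store", "format"), each = 2), times = 2),
  category_type = rep(c("lvl1", "lvl2"), times = 4),
  stringsAsFactors = FALSE
)
DataframeB <- data.frame(
  time_period   = c("", "month"),
  store_type    = c("store", "store"),
  category_type = c("lvl1", "lvl2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row of df that agrees with every
# non-blank cell of the one-row data frame `rule`; blank cells match anything
matches_rule <- function(df, rule) {
  hit <- rep(TRUE, nrow(df))
  for (col in names(rule)) {
    value <- rule[[col]]
    if (value != "") {
      hit <- hit & df[[col]] == value
    }
  }
  hit
}

# A row is excluded if it matches at least one rule in DataframeB
excluded <- rep(FALSE, nrow(DataframeA))
for (i in seq_len(nrow(DataframeB))) {
  excluded <- excluded | matches_rule(DataframeA, DataframeB[i, ])
}

result <- DataframeA[!excluded, ]
print(result)
```

Because the mask starts as all FALSE, an empty DataframeB simply leaves DataframeA unchanged, and no list of valid time periods has to be maintained by hand.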

dplyr
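The answer's own dplyr code is not reproduced here. As a minimal sketch, assuming an exclusion table DB in which the blank time_period has already been expanded into explicit rows, `anti_join()` keeps the rows of DataframeA that have no match in DB. The data below is re-typed from the question's examples so that the block runs on its own, and the explicit `by` argument is an addition for clarity rather than something the source shows.

```r
library(dplyr)

# Example data, re-typed from the question
DataframeA <- data.frame(
  time_period   = rep(c("month", "week"), each = 4),
  store_type    = rep(rep(c("store", "format"), each = 2), times = 2),
  category_type = rep(c("lvl1", "lvl2"), times = 4),
  stringsAsFactors = FALSE
)

# Assumed exclusion table: the blank time_period of the store/lvl1 rule
# has been expanded into one row per time period
DB <- data.frame(
  time_period   = c("month", "week", "month"),
  store_type    = c("store", "store", "store"),
  category_type = c("lvl2", "lvl1", "lvl1"),
  stringsAsFactors = FALSE
)

# Keep only the rows of DataframeA that do not appear in DB
result <- anti_join(DataframeA, DB,
                    by = c("time_period", "store_type", "category_type"))
print(result)
```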