Subsetting rows in R based on subsets of columns

Date: 2017-07-31 14:51:55

Tags: r

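This question is a variant of one I asked recently here. (Apologies for asking two similar questions. I only realised after posting that the question I asked wasn't quite right, but I thought I would leave the original up in case it is useful to people in the future, and ask this one separately.)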

I have a set of data that looks like this, slightly modified from the previous question:

  Category     Item Shop1 Shop2 Shop3
1    Fruit   Apples     4     6     0
2    Fruit  Oranges     0     2     7
3      Veg Potatoes     0     1     0
4      Veg   Onions     0     2     8
5      Veg  Carrots     0     1     3
6    Dairy  Yoghurt     1     5     9
7    Dairy     Milk     0     1     0
8    Dairy   Cheese     0     0     7

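I want to filter my data so that I keep only the categories in which every shop has sales. If any shop has no sales across a whole category, I want to filter that category out. In this example, the Veg category would be filtered out because Shop1 has no Veg sales.

To solve this, I tried adapting the suggestion from my earlier question by changing FUN = any to FUN = all, but it did not work and threw an error every time, and I'm not sure what else to try.

I'd appreciate any help you can offer.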

4 Answers:

Answer 0: (score: 4)

Here is an idea using colSums:

ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0
df[df$Category %in% names(ind)[ind],]

which gives,

  Category    Item Shop1 Shop2 Shop3
1    Fruit  Apples     4     6     0
2    Fruit Oranges     0     2     7
6    Dairy Yoghurt     1     5     9
7    Dairy    Milk     0     1     0
8    Dairy  Cheese     0     0     7
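
To see why the one-liner above works, it may help to unpack it into its intermediate steps. The sketch below rebuilds the example data as a base data.frame called df (the construction of df is an assumption, since the answer does not show it) and names each intermediate result; per_category and zero_flags are illustrative names only.

```r
# Example data, rebuilt from the table in the question
df <- data.frame(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions",
               "Carrots", "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7)
)

# Step 1: split the three shop columns into one data frame per category
per_category <- split(df[3:5], df$Category)

# Step 2: within each category, flag the shops whose total sales are zero.
# The result is a logical matrix: one row per shop, one column per category.
zero_flags <- sapply(per_category, function(i) colSums(i) == 0)

# Step 3: a category is kept only if none of its shops was flagged
ind <- colSums(zero_flags) == 0

# Step 4: keep the rows whose category passed the check
result <- df[df$Category %in% names(ind)[ind], ]
```

Printing zero_flags shows a single TRUE, in the Shop1 row of the Veg column, which is why Veg is the only category dropped.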

Answer 1: (score: 3)

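Here is an example using dplyr. You first group_by the Category variable, then keep only the groups in which every shop's total sales are greater than 0.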

library(tidyverse)
# tibble() replaces the deprecated data_frame()
d <- tibble(
    Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
    Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"),
    Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
    Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
    Shop3 = c(0, 7, 0, 8,  3, 9, 0, 7)
)

d %>%
    group_by(Category) %>%
    filter(sum(Shop1)  > 0 & sum(Shop2) > 0 &  sum(Shop3) > 0) %>%
    ungroup()
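
The filter above names each shop column explicitly, which becomes tedious with many shops. As a rough sketch, assuming dplyr 1.0.4 or later (where if_all() is available) and that every shop column name starts with "Shop", the same per-category check could be written without listing the columns:

```r
library(dplyr)

# Same example data as above
d <- tibble(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions",
               "Carrots", "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, rep(0, 4), 1, rep(0, 2)),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7)
)

result <- d %>%
  group_by(Category) %>%
  # Keep a category only if every "Shop*" column has a positive total in it
  filter(if_all(starts_with("Shop"), ~ sum(.x) > 0)) %>%
  ungroup()
```

The column selection relies on the naming pattern, so columns named differently would need another tidyselect helper or an explicit vector of names.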

Answer 2: (score: 1)

Here is a solution using library(data.table):

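library(data.table)

dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
                 item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
                 shop1=c(4,0,0,0,0,1,0,0),
                 shop2=c(6,2,1,2,1,5,1,0),
                 shop3=c(0,7,0,8,3,9,0,7))
# Reshape to long format: one row per category/item/shop combination
dt_m <- melt(dt,id.vars = c("category","item"))
# Total sales per category and shop
dt_m[,counts:=sum(value),by=.(category,variable)]
# Drop every category/shop combination with no sales at all
dt_m <- dt_m[counts>0]
dt_m[,counts:=NULL]
# Back to wide format; the dropped combinations come back as NA
dt <- dcast(dt_m,category+item~variable,value.var = "value")
# Rows containing NA belong to categories in which some shop had no sales
dt <- na.omit(dt)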

Or using dplyr

dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>%
  mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>% 
  dcast(category+item~variable,value.var = "value") %>% na.omit()
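
For comparison, the same per-category check can be sketched in data.table without reshaping to long format and back. This is only one possible approach, not the answerer's method; shop_cols and keep_rows are illustrative names, and the data is rebuilt so the block runs on its own.

```r
library(data.table)

dt <- data.table(category = c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
                 item     = c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
                 shop1    = c(4,0,0,0,0,1,0,0),
                 shop2    = c(6,2,1,2,1,5,1,0),
                 shop3    = c(0,7,0,8,3,9,0,7))

# Columns holding the per-shop sales (assumed names, matching the data above)
shop_cols <- c("shop1", "shop2", "shop3")

# For each category, return its row numbers (.I) only when every shop column
# has a positive total; otherwise the group contributes no rows
keep_rows <- dt[, .I[all(colSums(.SD) > 0)], by = category, .SDcols = shop_cols]$V1

result <- dt[keep_rows]
```

Because the rows are selected by index, this version keeps the original row and column order, whereas the melt/dcast route returns rows sorted by category and item.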

Answer 3: (score: 0)

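Another solution using data.table, in two steps: first flag the categories in which any shop has zero total sales, then keep only the rows from the unflagged categories.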

# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
                 Item     = c("Apples", "Oranges", "Potatoes", "Onions", 
                              "Carrots", "Yoghurt", "Milk", "Cheese"),
                 Shop1    = c(4, rep(0, 4), 1, rep(0, 2)),
                 Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
                 Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7))

filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0), 
           by = Category]
filt
   Category    V1
1:    Fruit FALSE
2:      Veg  TRUE
3:    Dairy FALSE

dt[Category %in% filt[V1 == FALSE, Category]]

   Category    Item Shop1 Shop2 Shop3
1:    Fruit  Apples     4     6     0
2:    Fruit Oranges     0     2     7
3:    Dairy Yoghurt     1     5     9
4:    Dairy    Milk     0     1     0
5:    Dairy  Cheese     0     0     7