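This question is a variant of one I asked recently here. (Sorry for asking two similar questions. I only realised after posting that the question I asked wasn't quite the right one, but I thought I'd leave the original up in case it is useful to people in future, and ask this one separately.)
I have a set of data that looks like this, slightly modified from the previous question: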
Category Item Shop1 Shop2 Shop3
1 Fruit Apples 4 6 0
2 Fruit Oranges 0 2 7
3 Veg Potatoes 0 1 0
4 Veg Onions 0 2 8
5 Veg Carrots 0 1 3
6 Dairy Yoghurt 1 5 9
7 Dairy Milk 0 1 0
8 Dairy Cheese 0 0 7
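I'd like to filter my data so that I keep only the categories in which every shop has sales. If a shop has no sales at all for a category, I'd like to filter that whole category out. In this example, the Veg category would be filtered out because Shop1 has no Veg sales.
To tackle this, I tried adapting the suggestions from my previous question by changing FUN = any to FUN = all, but that didn't work and threw an error every time, and I'm not sure what else to try.
I'd appreciate any help you can offer.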
Answer 0: (score: 4)
Here is an idea via colSums,
ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0
df[df$Category %in% names(ind)[ind],]
which gives,
Category Item Shop1 Shop2 Shop3
1 Fruit Apples 4 6 0
2 Fruit Oranges 0 2 7
6 Dairy Yoghurt 1 5 9
7 Dairy Milk 0 1 0
8 Dairy Cheese 0 0 7
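If the nested one-liner is hard to follow, the same logic can be spelled out step by step. This is only a sketch in base R. It swaps the split/sapply combination for rowsum() to get the per-category totals, and the names totals, keep and res are made up for illustration:

```r
# Example data from the question
df <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)
)

# Step 1: total sales per category (rows) and shop (columns)
totals <- rowsum(as.matrix(df[3:5]), group = df$Category)

# Step 2: keep the categories where no shop has a zero total
keep <- rownames(totals)[rowSums(totals == 0) == 0]

# Step 3: subset the original rows
res <- df[df$Category %in% keep, ]
res
```

Printing totals after step 1 is a quick way to check which shop is responsible for a category being dropped.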
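Answer 1: (score: 3)
Here is an example using dplyr. You first group_by the Category variable, then keep only the categories in which every shop's total sales are greater than 0.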
library(tidyverse)
d <- tibble(  # data_frame() is deprecated in favour of tibble()
Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"),
Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)
)
d %>%
group_by(Category) %>%
filter(sum(Shop1) > 0 & sum(Shop2) > 0 & sum(Shop3) > 0) %>%
ungroup()
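With more than a handful of shops, writing out one condition per column gets tedious. A possible variation, assuming dplyr 1.0.4 or later (where if_all() is available) and assuming every shop column starts with "Shop", is sketched below. The name kept is only for illustration:

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
  Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)
)

kept <- d %>%
  group_by(Category) %>%
  # Keep a category only if every Shop* column has a positive group total
  filter(if_all(starts_with("Shop"), ~ sum(.x) > 0)) %>%
  ungroup()

kept
```

The condition is evaluated once per category, so a single zero total in any shop column drops all rows of that category.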
Answer 2: (score: 1)
Here is a solution using data.table:
library(data.table)
dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
shop1=c(4,0,0,0,0,1,0,0),
shop2=c(6,2,1,2,1,5,1,0),
shop3=c(0,7,0,8,3,9,0,7))
dt_m <- melt(dt,id.vars = c("category","item"))
dt_m[,counts:=sum(value),by=.(category,variable)]
dt_m <- dt_m[counts>0]
dt_m[,counts:=NULL]
dt <- dcast(dt_m,category+item~variable,value.var = "value")
dt <- na.omit(dt)
Or, using dplyr pipes:
dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>%
mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>%
dcast(category+item~variable,value.var = "value") %>% na.omit()
Answer 3: (score: 0)
Another solution using data.table, in two steps.
library(data.table)

# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
Item = c("Apples", "Oranges", "Potatoes", "Onions",
"Carrots", "Yoghurt", "Milk", "Cheese"),
Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7))
filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0),
by = Category]
filt
Category V1
1: Fruit FALSE
2: Veg TRUE
3: Dairy FALSE
dt[Category %in% filt[V1 == FALSE, Category]]
Category Item Shop1 Shop2 Shop3
1: Fruit Apples 4 6 0
2: Fruit Oranges 0 2 7
3: Dairy Yoghurt 1 5 9
4: Dairy Milk 0 1 0
5: Dairy Cheese 0 0 7
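The two-step idea can also be written without naming each shop, which may help if the number of shop columns changes. A rough sketch, assuming all shop columns start with "Shop" (shop_cols, ok and res are illustrative names, not part of the original answer):

```r
library(data.table)

# Example data from the question
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
                 Item = c("Apples", "Oranges", "Potatoes", "Onions",
                          "Carrots", "Yoghurt", "Milk", "Cheese"),
                 Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
                 Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
                 Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7))

# Pick up the shop columns by name pattern
shop_cols <- grep("^Shop", names(dt), value = TRUE)

# Step 1: one flag per category, TRUE when every shop has a positive total
ok <- dt[, .(keep = all(colSums(.SD) > 0)), by = Category, .SDcols = shop_cols]

# Step 2: keep only the rows of the flagged categories
res <- dt[Category %in% ok[keep == TRUE, Category]]
res
```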