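I'm looking for a more efficient way to create subsets in R. Given a dataset where rows = products and columns = time, I want to find the rows (products) whose sales started in week 1 and make them a subset, then do the same for week 2, and so on.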
set.seed(4)
d <- data.frame(
  product = 1:10,
  week1   = sample(0:1, 10, replace = TRUE),
  week2   = sample(0:3, 10, replace = TRUE),
  week3   = sample(0:5, 10, replace = TRUE),
  week4   = sample(0:5, 10, replace = TRUE),
  speed   = sample(100:200, 10),
  quality = sample(20:50, 10)
)
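The full data frame is d. To find all the subsets I need to check two things: 1) sales in all previous weeks are == 0, and 2) sales in the current week are non-zero.
No subsets should overlap, because they group products by the week in which they first entered the market.
I found a poor man's way to do it, but I know there has to be a better way!
The inefficient way: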
subset3<-d[d$week3 >0 & d$week2==0 & d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 ==0 & d$week2==0 & d$week1==0,]
Slightly more efficient, but still not good:
subset3<-d[d$week3 >0 & d$week2+d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 + d$week2 + d$week1==0,]
I feel like I should be able to do something like this, but it doesn't work:
subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]
I don't think ddply or apply would work here, but maybe I'm wrong? The result I need is a subset of d, with all columns, like this:
subset3 =
product week1 week2 week3 week4 speed quality
      2     0     0     5     1   124      42
      3     0     0     3     5   155      45
Answer 0 (score: 0)
You can use the following:
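# Look only at the week columns; speed and quality are always > 0 and would otherwise count as a "sale"
d$weekstart <- apply(d[, grep("^week[0-9]+$", names(d))], 1, function(x) which(x > 0)[1])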
This determines the first week with non-zero sales for each product. You can then use this column to split the dataset:
result <- split(d,d$weekstart)
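If the row-wise apply gets slow on a large table, the same first-sale week can probably be computed in a vectorized way with max.col. The following is only a sketch on made-up data; the toy values and the names week_cols, sold and first_week are assumptions for illustration, not part of the question. Products that never sell get NA and are dropped by split.

```r
# Toy data (made up for illustration); product 4 never sells.
d <- data.frame(
  product = 1:5,
  week1   = c(0, 1, 0, 0, 0),
  week2   = c(0, 2, 3, 0, 0),
  week3   = c(5, 0, 1, 0, 2),
  week4   = c(2, 1, 4, 0, 0),
  speed   = c(120, 150, 130, 170, 110)
)

# Select the week columns by name so speed/quality are never counted as sales.
week_cols <- grep("^week[0-9]+$", names(d), value = TRUE)

# 1 where a product sold in that week, 0 otherwise.
sold <- (as.matrix(d[week_cols]) > 0) * 1

# With ties.method = "first", max.col returns the first column holding the row maximum,
# which is the first week with a sale whenever the row contains at least one 1.
first_week <- max.col(sold, ties.method = "first")
first_week[rowSums(sold) == 0] <- NA   # never sold: no start week

d$weekstart <- first_week
result <- split(d, d$weekstart)        # rows with NA weekstart are left out
```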
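You can then access each subset by name, for example:
result[["1"]]
Change the "1" in the code above to the start week you want to access; the elements correspond to subset1, subset2, and so on. Indexing by name rather than by position matters because a week in which no product started has no entry in the list.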
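As for the sum(d$week1:d$week3) attempt in the question: the colon operator builds one numeric sequence from the first element of each side, and sum then collapses it to a single number, so nothing is computed per row. A row-wise sum is closer to the intent. The sketch below uses made-up data and hypothetical names (week_cols, earlier, subsets), and assumes sales are never negative, so that a sum of 0 means every earlier week was 0.

```r
# Toy data (made up for illustration); product 4 never sells.
d <- data.frame(
  product = 1:5,
  week1   = c(0, 1, 0, 0, 0),
  week2   = c(0, 2, 3, 0, 0),
  week3   = c(5, 0, 1, 0, 2),
  week4   = c(2, 1, 4, 0, 0)
)

# One subset written by hand: sold in week 3, nothing sold in weeks 1 and 2.
subset3 <- d[d$week3 > 0 & rowSums(d[c("week1", "week2")]) == 0, ]

# The same condition for every week, keeping a running total of earlier sales.
week_cols <- paste0("week", 1:4)
subsets <- list()
earlier <- numeric(nrow(d))            # sales before the current week, per product
for (w in week_cols) {
  subsets[[w]] <- d[d[[w]] > 0 & earlier == 0, ]
  earlier <- earlier + d[[w]]
}
```

Here subsets$week3 holds the same rows as the hand-written subset3, and the groups cannot overlap because a product stops qualifying as soon as its running total is above zero.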
Answer 1 (score: 0)
I hope I understood what you're trying to do. Here is an attempt using the rle function. I apply it to each row (each product).
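# d here contains only the week1..week4 columns
ll <- apply(d, 1, function(x) {
  y  <- rle(x)                            # runs of equal consecutive values in the row
  nn <- names(y$lengths[y$values == 0])   # name attached to each run of zeros
  vv <- y$lengths[y$values == 0]          # length of each run of zeros
  if (length(nn) == 0) {
    res <- data.frame(nbr = 0, goodweek = 'week1')   # no zero weeks at all
  } else {
    res <- data.frame(nbr = vv, goodweek = nn)
  }
  res
})
do.call(rbind, ll)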
       nbr goodweek
week3    2    week3   ## 2 bad weeks (0, 0), then week3 is good (value > 0)
week31   2    week3
3        0    week1
week4    1    week4
week2    1    week2
6        0    week1   ## all weeks are good
week41   1    week4
8        1            ## the last week is bad! I don't know what to return here!
9        0    week1
week21   1    week2
Here I'm using your d (week columns only):
d
   week1 week2 week3 week4
1      0     0     5     2
2      0     0     1     3
3      1     2     3     2
4      1     1     0     1
5      0     3     1     4
6      1     1     2     4
7      1     2     0     4
8      1     3     2     0
9      1     1     5     4
10     0     3     2     2
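Note that the table above reports a zero run wherever it occurs in a row, so rows 4 and 7 show week4 even though those products already sold in week 1. If only the leading run of zeros matters (the week a product first enters the market), rle can be applied to a zero/non-zero indicator instead. This is just a sketch: first_sale_week is a hypothetical helper, and the matrix repeats the week values printed above.

```r
# Week values copied from the d printed above.
sales <- matrix(
  c(0, 0, 5, 2,
    0, 0, 1, 3,
    1, 2, 3, 2,
    1, 1, 0, 1,
    0, 3, 1, 4,
    1, 1, 2, 4,
    1, 2, 0, 4,
    1, 3, 2, 0,
    1, 1, 5, 4,
    0, 3, 2, 2),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("week", 1:4))
)

# Hypothetical helper: index of the first week with non-zero sales, or NA if there is none.
first_sale_week <- function(x) {
  runs <- rle(as.numeric(x) == 0)      # alternating runs of zero / non-zero weeks
  if (!runs$values[1]) return(1L)      # row does not start with zeros: sold from week 1
  start <- runs$lengths[1] + 1L        # first week after the leading run of zeros
  if (start > length(x)) NA_integer_ else start
}

weekstart <- apply(sales, 1, first_sale_week)
groups <- split(seq_len(nrow(sales)), weekstart)   # product row numbers per start week
```

With these values, groups[["3"]] contains rows 1 and 2, which are the products whose first sale was in week 3.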