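Conditional subsetting in R based on sums of adjacent columns (plyr package?)

Time: 2013-03-30 16:13:43

Tags: r plyr subset inventory

I'm looking for a more efficient way to create subsets in R. Using a dataset where rows = products and columns = time, I want to find the rows (products) whose items first started selling in week 1 and make that a subset, then do the same for week 2, and so on.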

set.seed(4)
d <- data.frame(
  product = 1:10,
  week1   = sample(0:1, 10, replace = TRUE),
  week2   = sample(0:3, 10, replace = TRUE),
  week3   = sample(0:5, 10, replace = TRUE),
  week4   = sample(0:5, 10, replace = TRUE),
  speed   = sample(100:200, 10),
  quality = sample(20:50, 10)
)

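The full data frame is d. So I need to know two things to find each subset: 1) sales in all previous weeks are == 0, and 2) sales in the current week are non-zero.

No subsets should overlap, since they group products by the week they first entered the market.

I found a poor man's way of doing it, but I know there must be a better way!

The inefficient way: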

subset3<-d[d$week3 >0 & d$week2==0 & d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 ==0 & d$week2==0 & d$week1==0,]

Slightly more efficient, but still not great:

subset3<-d[d$week3 >0 & d$week2+d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 + d$week2 + d$week1==0,]

I feel like I should be able to do something like this, but it doesn't work:

subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]

I don't think ddply or apply would work here, but maybe I'm wrong? The result I need is subsets of d, with all columns, like this:

subset3 =

product week1 week2 week3 week4 speed quality
   2     0     0     5     1   124      42
   3     0     0     3     5   155      45

2 answers:

Answer 0: (score: 0)

You can use the following:

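# Look only at the week columns so speed/quality cannot be picked up as a "week";
# the result is NA for a product with no sales in any week
d$weekstart <- apply(d[, c("week1", "week2", "week3", "week4")], 1, function(x) which(x > 0)[1])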

This identifies the first week with non-zero sales for each product. You can then use this column to split the dataset:

result <- split(d,d$weekstart)

You can then access each subset like this:

result[[1]]

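Change the 1 in the code above to the starting week you want to access; this is analogous to subset1, subset2, etc. If some week has no newly launched products, the positions shift, so indexing by name (for example result[["3"]]) is safer.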
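As a hedged sketch of the same idea, the snippet below restricts the search to the week columns, labels each group by column name instead of by position, and checks one group against a rowSums() version of the question's manual filter (sum() collapses its input to a single number, so it cannot serve as a per-row condition). The data values and the helper names week_cols and first_week are made up for illustration; only the column layout follows the question.

```r
# Toy data with the same column layout as the question's d (values are made up)
d <- data.frame(
  product = 1:5,
  week1   = c(0, 1, 0, 0, 0),
  week2   = c(0, 2, 3, 0, 0),
  week3   = c(5, 0, 1, 0, 2),
  week4   = c(1, 4, 0, 0, 3),
  speed   = c(124, 155, 130, 180, 101),
  quality = c(42, 45, 30, 25, 38)
)

# Assumption: every sales column is named week<number>
week_cols <- grep("^week[0-9]+$", names(d), value = TRUE)

# Row-wise version of the manual filter for week 3:
# sold in week3, and the sum over all earlier weeks is zero
manual3 <- d[d$week3 > 0 & rowSums(d[c("week1", "week2")]) == 0, ]

# Position of the first week with sales in each row; NA if the product never sold
first_week <- apply(d[week_cols] > 0, 1, function(sold) which(sold)[1])
d$weekstart <- week_cols[first_week]

# One data frame per launch week; rows with NA weekstart are dropped by split()
result <- split(d, d$weekstart)
result[["week3"]]
```

Here result[["week3"]] contains the same products as manual3, and the product that never sold (product 4 in the toy data) ends up in no group.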

Answer 1: (score: 0)

I hope I understand what you are trying to do. Here is an attempt using the rle function. I apply it to each row (each product).

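# d here holds only the week1..week4 columns (see the printout at the end)
ll <- apply(d, 1, function(x) {
  y <- rle(x)                          # runs of equal consecutive values in this row
  vv <- y$lengths[y$values == 0]       # length of each run of zeros
  nn <- names(vv)                      # name of the week that follows each zero run
  if (length(nn) == 0) {
    data.frame(nbr = 0, goodweek = 'week1')   # no zero week at all
  } else {
    data.frame(nbr = vv, goodweek = nn)
  }
})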


do.call(rbind,ll)
       nbr goodweek
week3    2    week3  ## 2 bad weeks with 0 then week3 is good 0 0 value>0
week31   2    week3
3        0    week1
week4    1    week4
week2    1    week2
6        0    week1 ## all weeks are good
week41   1    week4
8        1          ## the last week is bad! I don't know what to return here!
9        0    week1
week21   1    week2

Here I used your d:

d
   week1 week2 week3 week4
1      0     0     5     2
2      0     0     1     3
3      1     2     3     2
4      1     1     0     1
5      0     3     1     4
6      1     1     2     4
7      1     2     0     4
8      1     3     2     0
9      1     1     5     4
10     0     3     2     2
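The rle output above reports every run of zeros, including runs that come after the first sale (rows 4, 7 and 8), whereas the question only cares about zeros before the first sale. A minimal sketch of a variant that counts only the leading run is shown below. It copies the week values from the printout above; the names weeks, leading_zeros and first_week are illustrative, and sales are assumed to be non-negative.

```r
# Week columns copied from the data frame printed in this answer
weeks <- cbind(
  week1 = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  week2 = c(0, 0, 2, 1, 3, 1, 2, 3, 1, 3),
  week3 = c(5, 1, 3, 0, 1, 2, 0, 2, 5, 2),
  week4 = c(2, 3, 2, 1, 4, 4, 4, 0, 4, 2)
)

# Length of the run of zeros at the start of each row
# (0 if week1 already has sales)
leading_zeros <- apply(weeks, 1, function(x) {
  runs <- rle(unname(x) == 0)
  if (runs$values[1]) runs$lengths[1] else 0L
})

# First week with sales; NA when every week is zero
first_week <- ifelse(leading_zeros < ncol(weeks),
                     colnames(weeks)[leading_zeros + 1], NA)

data.frame(nbr = leading_zeros, goodweek = first_week)
```

With this variant, rows 4, 7 and 8 are assigned to week1, because their zero weeks occur only after sales have already started.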