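Making a column of conditional counts

Time: 2017-12-29 06:48:06

Tags: r

I'm new to R and am having some trouble with this simple task. I have a dataset of prices for a number of retailers. For each price, I want to count how many price changes other stores made during the interval between when that price was set and when it was changed.

(df edited, following Len Greski's answer)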

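I have tried to do this in several ways and have not found one that works. Is there a straightforward solution?

Of the approaches I tried, I think the first one comes closest to what I want: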

1)

id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017", "01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017", "06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
price<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,1,0)

2)

df<-mutate(df, counter=nrow(df[df$startdate > startime & df$endtime<endtime]))

Thanks, everyone!

2 answers:

Answer 0 (score: 2)

Here is an approach using the sqldf package.

id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017", 
             "01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017","06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
price<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,1,0)

df<-data.frame(item,id,startdate,enddate,price,result)
library(sqldf)
sqlStmt <- "select a.item, a.id,a.startdate, a.enddate, b.id as changedId, b.startdate as changedDate 
            from df as a 
            inner join  df as b
            on a.item = b.item and a.id != b.id and (b.startdate between a.startdate and a.enddate) "


priceChanges <- sqldf(sqlStmt)
priceChanges$changedDate <-
  as.POSIXct(priceChanges$changedDate,origin="1970-01-01")
priceChanges

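The output shows, for each price interval, the ID of every other store that changed its price during that interval and the date of the change.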

> priceChanges
  item id  startdate    enddate changedId changedDate
1    a  1 2017-01-01 2017-01-10         2  2017-01-05
2    a  2 2017-01-05 2017-02-01         1  2017-01-10
3    a  2 2017-01-05 2017-02-01         3  2017-01-13
4    a  2 2017-01-05 2017-02-01         3  2017-01-20
5    a  1 2017-01-10 2017-02-05         2  2017-02-01
6    a  1 2017-01-10 2017-02-05         3  2017-01-13
7    a  1 2017-01-10 2017-02-05         3  2017-01-20
> 

To count the number of price changes by item, id, and startdate, we can use another SQL query.

> sqlStmt <- "select item, id, startdate, count(*) as count from priceChanges
+                  group by item,id,startdate"
> priceChangeCounts <- sqldf(sqlStmt)
> priceChangeCounts
  item id  startdate count
1    a  1 2017-01-01     1
2    a  1 2017-01-10     3
3    a  2 2017-01-05     3
>

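Finally, we merge the original data with the aggregated counts and recode the missing values to 0, so that they can be used in subsequent analysis.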

sqlStmt <- "select a.*, b.count from df as a
             left join priceChangeCounts as b
             on a.item = b.item and a.id = b.id and a.startdate = b.startdate"
mergedData <- sqldf(sqlStmt)
mergedData[is.na(mergedData[,"count"]),"count"] <- 0
mergedData

...and the output.

> mergedData
  item id  startdate    enddate    price result count
1    a  1 2017-01-01 2017-01-10 6.484062      1     1
2    a  2 2017-01-05 2017-02-01 9.410354      3     3
3    a  3 2017-01-13 2017-01-20 5.656238      0     0
4    a  1 2017-01-10 2017-02-05 8.542557      3     3
5    a  2 2017-02-01 2017-02-06 1.769380      0     0
6    a  3 2017-01-20 2017-01-31 8.280155      0     0
>

Note that the result column in the OP's data contains an error: for id=2 with startdate=2017-02-01, no other ID has a new price whose start date falls between 2017-02-01 and 2017-02-06.
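For readers who would rather not depend on sqldf, the three steps above (self-join, count, fill in zeros) can be sketched in base R. This is only an illustration under a few assumptions: dates are stored as Date rather than POSIXct to avoid time-zone issues, the price column is left out because it is not needed for counting, and the `row` column is a hypothetical helper that is not part of the original data.

```r
# Same sample intervals as in the question (price omitted: not needed for counting).
df <- data.frame(
  item = "a",
  id = c(1, 2, 3, 1, 2, 3),
  startdate = as.Date(c("01/01/2017", "05/01/2017", "13/01/2017",
                        "10/01/2017", "01/02/2017", "20/01/2017"),
                      format = "%d/%m/%Y"),
  enddate = as.Date(c("10/01/2017", "01/02/2017", "20/01/2017",
                      "05/02/2017", "06/02/2017", "31/01/2017"),
                    format = "%d/%m/%Y")
)
df$row <- seq_len(nrow(df))  # hypothetical helper: remembers the original row

# Self-join on item: every row is paired with every row of the same item.
pairs <- merge(df, df, by = "item", suffixes = c("", ".b"))

# Keep pairs where a different id set a new price inside the interval
# (bounds inclusive, like SQL "between").
hits <- subset(pairs,
               id != id.b & startdate.b >= startdate & startdate.b <= enddate)

# Count hits per original row; rows without any hit get 0 from tabulate().
df$count <- tabulate(hits$row, nbins = nrow(df))
df$row <- NULL
print(df)
```

Because tabulate() returns a count for every bin from 1 to nbins, no separate left join or NA recoding is needed in this variant.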

Answer 1 (score: 0)

So, I found a simple solution that should work:

id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017", "01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017","06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
p<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,0,0)
df<-data.frame(item,id,startdate,enddate,p,result)


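# For each row, count the rows whose startdate falls strictly inside
# that row's (startdate, enddate) interval.
df$counter <- NA_integer_  # create the column before filling it
for (i in seq_len(nrow(df))) {
  a <- df$startdate[i]
  b <- df$enddate[i]
  df$counter[i] <- nrow(subset(df, startdate > a & startdate < b))
}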
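
The loop above could also be written without an explicit for loop. The sketch below is one possible variant, not the answerer's code: it keeps the same strict inequalities, but additionally requires the same item and a different id, which is an assumption taken from the question's wording about "other stores". Dates are stored as Date here to keep the example free of time-zone effects, and the price column is omitted.

```r
# Same sample intervals as above (price omitted: not needed for counting).
df <- data.frame(
  item = "a",
  id = c(1, 2, 3, 1, 2, 3),
  startdate = as.Date(c("01/01/2017", "05/01/2017", "13/01/2017",
                        "10/01/2017", "01/02/2017", "20/01/2017"),
                      format = "%d/%m/%Y"),
  enddate = as.Date(c("10/01/2017", "01/02/2017", "20/01/2017",
                      "05/02/2017", "06/02/2017", "31/01/2017"),
                    format = "%d/%m/%Y")
)

# For row i, count rows of the same item but a different id whose startdate
# lies strictly between startdate[i] and enddate[i].
df$counter <- sapply(seq_len(nrow(df)), function(i) {
  sum(df$item == df$item[i] &
      df$id != df$id[i] &
      df$startdate > df$startdate[i] &
      df$startdate < df$enddate[i])
})
print(df)
```

On this sample the extra id filter does not change the counts, because a store's own next price starts exactly at the end of its previous interval and is already excluded by the strict comparison.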