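I'm new to R and am stuck on what seems like a simple task. I have a dataset of prices from many retailers. For each price spell, that is, the interval between the date a price was set and the date it was changed, I want to count how many price changes occurred at the other stores.
(df edited, following Len Greski's answer)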
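I have tried to do this in several ways and have not found one that works. Is there a straightforward solution?
Here are two of my attempts. I think the first one comes closer to what I want: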
1)
id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017",
"01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017",
"06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
price<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,1,0)
2)
df<-mutate(df, counter=nrow(df[df$startdate > startime & df$endtime<endtime]))
Thanks, everyone!
Answer 0: (score: 2)
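Here is an approach using the sqldf package. A self-join matches each price spell with every price change for the same item at a different id whose start date falls within that spell.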
id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017",
"01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017","06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
price<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,1,0)
df<-data.frame(item,id,startdate,enddate,price,result)
library(sqldf)
sqlStmt <- "select a.item, a.id,a.startdate, a.enddate, b.id as changedId, b.startdate as changedDate
from df as a
inner join df as b
on a.item = b.item and a.id != b.id and (b.startdate between a.startdate and a.enddate) "
priceChanges <- sqldf(sqlStmt)
priceChanges$changedDate <-
as.POSIXct(priceChanges$changedDate,origin="1970-01-01")
priceChanges
The output lists, for each price spell, the id and date of every price change made at another id during that spell.
> priceChanges
item id startdate enddate changedId changedDate
1 a 1 2017-01-01 2017-01-10 2 2017-01-05
2 a 2 2017-01-05 2017-02-01 1 2017-01-10
3 a 2 2017-01-05 2017-02-01 3 2017-01-13
4 a 2 2017-01-05 2017-02-01 3 2017-01-20
5 a 1 2017-01-10 2017-02-05 2 2017-02-01
6 a 1 2017-01-10 2017-02-05 3 2017-01-13
7 a 1 2017-01-10 2017-02-05 3 2017-01-20
>
To count the number of price changes by item, id and startdate, we can use another SQL query.
> sqlStmt <- "select item, id, startdate, count(*) as count from priceChanges
+ group by item,id,startdate"
> priceChangeCounts <- sqldf(sqlStmt)
> priceChangeCounts
item id startdate count
1 a 1 2017-01-01 1
2 a 1 2017-01-10 3
3 a 2 2017-01-05 3
>
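Finally, we left-join the original data with the aggregated counts and recode the missing values to 0 so that they can be used in subsequent analysis.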
sqlStmt <- "select a.*, b.count from df as a
left join priceChangeCounts as b
on a.item = b.item and a.id = b.id and a.startdate = b.startdate"
mergedData <- sqldf(sqlStmt)
mergedData[is.na(mergedData[,"count"]),"count"] <- 0
mergedData
...and the output.
> mergedData
item id startdate enddate price result count
1 a 1 2017-01-01 2017-01-10 6.484062 1 1
2 a 2 2017-01-05 2017-02-01 9.410354 3 3
3 a 3 2017-01-13 2017-01-20 5.656238 0 0
4 a 1 2017-01-10 2017-02-05 8.542557 3 3
5 a 2 2017-02-01 2017-02-06 1.769380 0 0
6 a 3 2017-01-20 2017-01-31 8.280155 0 0
>
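If you would rather not write the last step in SQL, the same left join and recode can be sketched in base R with merge(). The two small data frames below are hypothetical stand-ins for df and priceChangeCounts (dates kept as strings for brevity), not the real tables from above.

```r
# Hypothetical stand-ins for df and priceChangeCounts
df <- data.frame(item = "a", id = c(1, 2, 3),
                 startdate = c("2017-01-01", "2017-01-05", "2017-01-13"))
priceChangeCounts <- data.frame(item = "a", id = c(1, 2),
                                startdate = c("2017-01-01", "2017-01-05"),
                                count = c(1L, 3L))

# all.x = TRUE keeps every row of df, like a SQL left join
mergedData <- merge(df, priceChangeCounts,
                    by = c("item", "id", "startdate"), all.x = TRUE)

# Spells with no matching price change come back as NA; recode them to 0
mergedData$count[is.na(mergedData$count)] <- 0L
mergedData
```

Note that merge() sorts the result on the key columns by default, so the row order can differ from the sqldf output.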
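Note that the result column in the OP's data contains an error: for id=2, startdate=2017-02-01, no other id has a price change with a start date between 2017-02-01 and 2017-02-06, so the expected value for that row is 0 rather than 1.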
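As a cross-check that does not need sqldf, here is a minimal base R sketch of the same counting rule: same item, different id, start date inside the spell with both ends included (as SQL BETWEEN does). The helper names d and count_changes are my own, and I use Date instead of POSIXct and leave out the random price column purely to keep it short.

```r
# Hypothetical helper: parse day/month/year strings into Date
d <- function(x) as.Date(x, format = "%d/%m/%Y")

df <- data.frame(
  item = "a",
  id = c(1, 2, 3, 1, 2, 3),
  startdate = d(c("01/01/2017", "05/01/2017", "13/01/2017",
                  "10/01/2017", "01/02/2017", "20/01/2017")),
  enddate = d(c("10/01/2017", "01/02/2017", "20/01/2017",
                "05/02/2017", "06/02/2017", "31/01/2017"))
)

# For row i, count rows of the same item at a different id whose
# startdate lies in [startdate[i], enddate[i]] (inclusive on both ends)
count_changes <- function(i) {
  sum(df$item == df$item[i] &
        df$id != df$id[i] &
        df$startdate >= df$startdate[i] &
        df$startdate <= df$enddate[i])
}

df$count <- vapply(seq_len(nrow(df)), count_changes, integer(1))
df$count
# 1 3 0 3 0 0
```

This compares every row with every other row, so it is only a sanity check for small data; for a large dataset a join-based approach like the one above is likely to scale better.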
Answer 1: (score: 0)
So, I found a simple solution that should work:
id<-c(1,2,3,1,2,3)
startdate<-c("01/01/2017", "05/01/2017", "13/01/2017", "10/01/2017", "01/02/2017" , "20/01/2017")
startdate<-as.POSIXct(strptime(startdate,"%d/%m/%Y"))
enddate<-c("10/01/2017","01/02/2017","20/01/2017","05/02/2017","06/02/2017","31/01/2017")
enddate<-as.POSIXct(strptime(enddate,"%d/%m/%Y"))
p<-runif(6,1,10)
item<-c("a","a","a","a","a","a")
result<-c(1,3,0,3,0,0)
df<-data.frame(item,id,startdate,enddate,p,result)
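# Create the counter column up front instead of relying on recycling
df$counter <- 0L
for (i in seq_len(nrow(df))) {
  a <- df$startdate[i]
  b <- df$enddate[i]
  # Count rows whose startdate lies strictly between a and b
  df$counter[i] <- nrow(subset(df, df$startdate > a & df$startdate < b))
}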
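One thing to be aware of: the loop uses strict inequalities and filters on neither id nor item, whereas the sqldf answer uses an inclusive BETWEEN and requires a different id. The two rules agree on the example data but can differ when another store changes its price exactly on a boundary date. A small sketch with made-up dates (the vectors start, end and id are hypothetical, not taken from the question):

```r
# Hypothetical data: id 2 changes its price on 2017-01-10,
# the same day id 1's first spell ends and its second begins
start <- as.Date(c("2017-01-01", "2017-01-10", "2017-01-10"))
end   <- as.Date(c("2017-01-10", "2017-01-20", "2017-01-25"))
id    <- c(1, 1, 2)

# Rule used by the loop: startdate strictly inside the spell, any id
strict_count <- vapply(seq_along(start), function(i) {
  sum(start > start[i] & start < end[i])
}, integer(1))

# Rule used by the SQL join: inclusive bounds, other ids only
inclusive_count <- vapply(seq_along(start), function(i) {
  sum(id != id[i] & start >= start[i] & start <= end[i])
}, integer(1))

strict_count
# 0 0 0
inclusive_count
# 1 1 1
```

Which rule is right depends on whether a change on the boundary day should count as falling within the spell.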