I have this data frame:
        Date Visitor-ID
1 2018-01-01          1
2 2018-01-01          2
3 2018-01-01          3
4 2018-01-02          2
5 2018-01-02          3
6 2018-01-02          2
7 2018-01-03          2
8 2018-01-03          3
The data frame is generated by the following code:
myDF <- data.frame(
  c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02",
    "2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03"),
  c(1, 2, 3, 2, 3, 2, 2, 3)
)
names(myDF) <- c("Date", "Visitor-ID")
I want to transform the original data frame into this new data frame:
        Date day 0 day 1 day 2
1 2018-01-01     3     2     2
2 2018-01-02     2     2
3 2018-01-03     2
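In the new data frame, each cell is the number of unique visitors who were present on the row's date and who also visited x days later. Day 0 is the row's date itself, so that column is simply the number of unique visitors on that date.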
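To make the definition concrete, here is a minimal sketch that checks one cell by hand: the "day 1" value for 2018-01-01. The variable names `day0`, `day1`, and `returning` are made up for this illustration; the visitor IDs are copied from the data above.

```r
# Visitors seen on the row's date (2018-01-01).
day0 <- c(1, 2, 3)
# Visitors seen one day later (2018-01-02); visitor 2 appears twice.
day1 <- c(2, 3, 2)

# intersect() treats both vectors as sets, so the duplicate visit is counted once.
returning <- intersect(day0, day1)
length(returning)
```

The result corresponds to the 2 in the first row under "day 1".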
Question: What code can I use to do this?
Answer 0: (score: 1)
Is this what you need?
library(tidyr)
library(dplyr)

# Collect each date's visitor IDs into a list column so they can be intersected after the merge.
df <- myDF %>% group_by(Date) %>% summarise(s = list(`Visitor-ID`))
# Add a constant helper key; merging on it yields every pair of dates (a cross join).
df['key'] <- 1
s <- merge(df, df, by = 'key')
# Count the distinct visitors shared by each pair of dates.
s['New'] <- apply(s, 1, function(x) length(intersect(x$s.x, x$s.y)))
# Number of days from the first date of the pair to the second.
s['day'] <- as.Date(s$Date.y) - as.Date(s$Date.x)
# Keep only pairs that look forward in time (same day or later), never backward.
s <- s[s$day >= 0, ]
# Reshape the three columns into the requested layout: one row per date, one column per day offset.
s[, c('Date.x', 'New', 'day')] %>% tidyr::spread(day, New)
      Date.x 0  1  2
1 2018-01-01 3  2  2
2 2018-01-02 2  2 NA
3 2018-01-03 2 NA NA
Answer 1: (score: 0)
The code is a bit rough, but this should work for you:
myDF <- data.frame(
  c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02",
    "2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03"),
  c(1, 2, 3, 2, 3, 2, 2, 3)
)
names(myDF) <- c("Date", "Visitor-ID")
myDF$Date <- as.Date(myDF$Date)

# One "day k" column for every offset between the first and last date.
num.days <- as.numeric(max(myDF$Date) - min(myDF$Date))
new.cols.names <- paste("day", 0:num.days)
unique.dates <- unique(myDF$Date)

# Start from NA so that offsets past the last date stay empty, as in the requested output.
# The first column is a placeholder that is replaced by the dates at the end.
final.df <- matrix(NA_integer_, ncol = length(new.cols.names) + 1, nrow = length(unique.dates))
for (i in seq_along(unique.dates)) {
  # Unique visitors on the i-th date.
  ids <- unique(myDF[myDF$Date == unique.dates[i], ]$`Visitor-ID`)
  for (j in 0:(as.numeric(max(myDF$Date) - unique.dates[i]))) {
    # How many of those visitors also appear j days later.
    final.df[i, j + 2] <- sum(ids %in% myDF[myDF$Date == unique.dates[i] + j, ]$`Visitor-ID`)
  }
}
final.df <- data.frame(final.df)
names(final.df) <- c("Date", new.cols.names)
final.df$Date <- unique.dates
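This works, but it may be slow on large data sets, because every pass through the loops filters the whole data frame again. You could probably use some form of sapply to make it more efficient. I hope this helps!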
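For what it's worth, here is one possible sapply-based sketch of the same idea. It is not a benchmarked solution; the names `visitors`, `dates`, `offsets`, `count_row`, `counts`, and `result` are made up for this example. It assumes the data from the question and splits the visitor IDs by date once up front, so the loops no longer filter the full data frame on every step.

```r
# Rebuild the example data from the question (Date stored as Date class).
myDF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02",
                   "2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03")),
  `Visitor-ID` = c(1, 2, 3, 2, 3, 2, 2, 3),
  check.names = FALSE
)

# One vector of unique visitor IDs per date, named by date and sorted by date.
visitors <- lapply(split(myDF$`Visitor-ID`, myDF$Date), unique)
dates <- as.Date(names(visitors))
offsets <- 0:as.integer(max(dates) - min(dates))

# For the i-th date, count how many of its visitors return at each day offset.
count_row <- function(i) {
  sapply(offsets, function(k) {
    target <- dates[i] + k
    # Offsets past the last date in the data have no answer, so mark them NA.
    if (target > max(dates)) return(NA_integer_)
    # A date with no visits gives NULL here, which yields a count of 0.
    later <- visitors[[as.character(target)]]
    sum(visitors[[i]] %in% later)
  })
}

# sapply returns one column per date, so transpose to get one row per date.
counts <- t(sapply(seq_along(dates), count_row))
colnames(counts) <- paste("day", offsets)
result <- data.frame(Date = dates, counts, check.names = FALSE)
result
```

Whether this is noticeably faster depends on the data. The main saving comes from splitting by date once rather than from sapply itself.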