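I've hit a brick wall with the code below. Essentially, dftable should be a filtered data frame containing the widget clicks (I loop over each widget's column).
I then want the sum of all page views on the dates where the widget was active (the widget is not on every page, so I filter to exclude rows where it is NA). However, dfviews just returns all page views instead of only the rows where the widget is not NA.
Any guidance would be much appreciated. Example mixpanelData: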
--------------------------------------------------------------
| Group | Date | WidgetClick | Widget2Click | ViewedPageResult
--------------------------------------------------------------
| ABC | 01/01/2017 | 123456 | NA | 1450544
--------------------------------------------------------------
| ABN | 01/01/2017 | NA | 1245 | 4560000
--------------------------------------------------------------
| ABN | 01/02/2017 | NA | 1205 | 4561022
--------------------------------------------------------------
| BNN | 01/02/2017 | 1044 | NA | 4561021
--------------------------------------------------------------
My ideal output would be the following (plus the proportions, which I can handle fine):
WidgetClick CSV
--------------------------------------------------------------
Date | WidgetClick | ViewedPageResult
--------------------------------------------------------------
01/01/2017 | 123456 | 1450544
--------------------------------------------------------------
01/02/2017 | 1044 | 4561021
--------------------------------------------------------------
Widget2Click CSV
--------------------------------------------------------------
Date | Widget2Click | ViewedPageResult
--------------------------------------------------------------
01/01/2017 | 1245 | 4560000
--------------------------------------------------------------
01/02/2017 | 1205 | 4561022
--------------------------------------------------------------
Here is the code:
vars <- colnames(mixpanelData)
vars <- vars[-c(1, 2)]
k <- 1
for (v in vars) {
  filename <- paste(v, k, ".csv", sep = "")
  dftable <- mixpanelData %>%
    filter(!is.na(v)) %>%
    group_by(Date) %>%
    summarise_(clicksum = interp(~sum(var, na.rm = TRUE), var = as.name(v)))
  dfviews <- mixpanelData %>%
    filter(!is.na(v)) %>%
    group_by(Date) %>%
    summarise(viewsum = sum(ViewedPageResult))
  total <- merge(dftable, dfviews, by = "Date")
  total <- mutate(total, proportion = clicksum / viewsum * 100)
  write.csv(total, file = filename, row.names = FALSE, na = "")
  k <- k + 1
}
Answer 0 (score: 0)
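In your desired result you show two separate tables. But you also mention that you have several widgets, so separate tables might not be ideal. I'll first show how to get the separate tables, and then how to compute all widgets in one go.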
Separate tables
Using dplyr and tidyr, you can use filter to get your two tables:
library(dplyr);library(tidyr)
df <- read.table(text="Group Date WidgetClick Widget2Click ViewedPageResult
ABC 01/01/2017 123456 NA 1450544
ABN 01/01/2017 NA 1245 4560000
ABN 01/02/2017 NA 1205 4561022
BNN 01/02/2017 1044 NA 4561021",header=TRUE,
stringsAsFactors=FALSE)
df%>% filter(!is.na(WidgetClick)) %>% select(-Widget2Click)
Group Date WidgetClick ViewedPageResult
1 ABC 01/01/2017 123456 1450544
2 BNN 01/02/2017 1044 4561021
df%>% filter(!is.na(Widget2Click)) %>% select(-WidgetClick)
Group Date Widget2Click ViewedPageResult
1 ABN 01/01/2017 1245 4560000
2 ABN 01/02/2017 1205 4561022
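The filter in the question's loop never drops anything because `v` holds a column name as a string, so `!is.na(v)` is always `TRUE`. Below is a minimal sketch of one way to filter on the column itself inside a loop, assuming dplyr 0.7 or later for the `.data` pronoun. The names `summarise_widget`, `widget_cols` and `results` are made up here for illustration, and the widget columns are assumed to be everything except Group, Date and ViewedPageResult.

```r
library(dplyr)

# Sample data copied from the question.
mixpanelData <- read.table(text = "Group Date WidgetClick Widget2Click ViewedPageResult
ABC 01/01/2017 123456 NA 1450544
ABN 01/01/2017 NA 1245 4560000
ABN 01/02/2017 NA 1205 4561022
BNN 01/02/2017 1044 NA 4561021", header = TRUE, stringsAsFactors = FALSE)

# Assumption: widget columns are all columns except these three.
widget_cols <- setdiff(colnames(mixpanelData),
                       c("Group", "Date", "ViewedPageResult"))

# Hypothetical helper: per-date clicks, views and proportion for one widget.
summarise_widget <- function(data, v) {
  data %>%
    filter(!is.na(.data[[v]])) %>%  # tests the column's values, not the string v
    group_by(Date) %>%
    summarise(
      clicksum = sum(.data[[v]]),
      viewsum = sum(ViewedPageResult),
      proportion = clicksum / viewsum * 100
    )
}

# One summary table per widget column, named after the column.
results <- lapply(setNames(widget_cols, widget_cols),
                  function(v) summarise_widget(mixpanelData, v))
```

Each element of `results` can then be written out with something like `write.csv(results[[v]], file = paste0(v, ".csv"), row.names = FALSE, na = "")` inside a loop over `names(results)`. Because clicks and views are summarised from the same filtered rows, no merge step is needed.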
Single table
To get all the results in a single table, first gather the Widget*Click columns, then filter:
df%>%
gather(Widget_number,Click,starts_with("Widget"))%>%
filter(!is.na(Click))
Group Date ViewedPageResult Widget_number Click
1 ABC 01/01/2017 1450544 WidgetClick 123456
2 BNN 01/02/2017 4561021 WidgetClick 1044
3 ABN 01/01/2017 4560000 Widget2Click 1245
4 ABN 01/02/2017 4561022 Widget2Click 1205
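For reference, tidyr 1.0.0 and later mark gather as superseded in favour of pivot_longer. A rough equivalent of the reshaping step is sketched below, assuming a recent tidyr is installed; the object name `long` is arbitrary, and the row order may differ from the gather output shown above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer.
df <- read.table(text = "Group Date WidgetClick Widget2Click ViewedPageResult
ABC 01/01/2017 123456 NA 1450544
ABN 01/01/2017 NA 1245 4560000
ABN 01/02/2017 NA 1205 4561022
BNN 01/02/2017 1044 NA 4561021", header = TRUE, stringsAsFactors = FALSE)

# Reshape the widget columns to long form and drop the NA clicks in one step.
long <- df %>%
  pivot_longer(
    cols = starts_with("Widget"),
    names_to = "Widget_number",
    values_to = "Click",
    values_drop_na = TRUE  # replaces the separate filter(!is.na(Click))
  )
```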
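Edit
To summarise the clicks per widget per month, you can use mutate to add a Year_month column with as.yearmon from the zoo package. Then group_by Widget_number and Year_month, and summarise to get the total clicks per month. You can perform other calculations, such as the proportions, in the same summarise statement. I am assuming the dates are in "%m/%d/%Y" format; make sure that is actually the case.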
library(zoo)
df%>%
gather(Widget_number,Click,starts_with("Widget"))%>%
filter(!is.na(Click)) %>%
mutate(Year_month=as.yearmon(as.Date(Date,"%m/%d/%Y"))) %>%
group_by(Widget_number,Year_month) %>%
summarise(Sum_clicks=sum(Click,na.rm=TRUE))
Widget_number Year_month Sum_clicks
<chr> <S3: yearmon> <int>
1 Widget2Click Jan 2017 2450
2 WidgetClick Jan 2017 124500
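As an illustration of doing the proportion in the same summarise call, here is a sketch that also sums the page views of the rows where each widget was active. The column names `Sum_views` and `Proportion` and the object name `monthly` are made up for this example, and the date format is again assumed to be "%m/%d/%Y".

```r
library(dplyr)
library(tidyr)
library(zoo)

# Same sample data as in the answer.
df <- read.table(text = "Group Date WidgetClick Widget2Click ViewedPageResult
ABC 01/01/2017 123456 NA 1450544
ABN 01/01/2017 NA 1245 4560000
ABN 01/02/2017 NA 1205 4561022
BNN 01/02/2017 1044 NA 4561021", header = TRUE, stringsAsFactors = FALSE)

monthly <- df %>%
  gather(Widget_number, Click, starts_with("Widget")) %>%
  filter(!is.na(Click)) %>%  # keep only rows where the widget was active
  mutate(Year_month = as.yearmon(as.Date(Date, "%m/%d/%Y"))) %>%
  group_by(Widget_number, Year_month) %>%
  summarise(
    Sum_clicks = sum(Click),
    Sum_views = sum(ViewedPageResult),  # views on the widget's active rows only
    Proportion = Sum_clicks / Sum_views * 100
  )
```

Since the NA rows are removed before grouping, `Sum_views` only counts page views from rows where that widget has a click value, which is the filtering the question was after.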