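I have a dataset of observations spanning several days and need to find the average of the observations for each day of the week. The data comes from a tab-delimited text file with the following columns: Day, Date, Views, Engagements, Sales. I am trying to find the average views, engagements, and sales for each of the 7 days of the week. In SAS, I would use proc tabulate with Day as the class and Views, Engagements, and Sales as the variables, but I am not sure how to translate this into R code.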
Monday 21JUL03 7206 32 $6.73
Tuesday 22JUL03 9333 51 $4.99
Wednesday 23JUL03 8321 61 $8.87
Thursday 24JUL03 8378 35 $3.69
Friday 25JUL03 12202 45 $4.34
Saturday 26JUL03 6161 34 $3.12
Sunday 27JUL03 9115 29 $2.77
Monday 28JUL03 17112 51 $10.36
Tuesday 29JUL03 12690 51 $10.24
Wednesday 30JUL03 10822 30 $3.96
Thursday 31JUL03 10395 41 $5.45
Friday 01AUG03 6979 31 $2.95
Saturday 02AUG03 3810 19 $1.78
Sunday 03AUG03 4554 30 $5.71
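Answer 0: (score: 1)
The OP wants to compute the mean of 3 columns of his data.frame, so dplyr::summarise_at should be a good option.
The solution has two steps: read the data from the tab-separated file, then process it with dplyr.
Solution: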
# Read from file. "sales.txt" has been created using OP's data.
df <- read.delim("sales.txt", header = FALSE, stringsAsFactors = FALSE)
names(df) <- c("Day", "Date", "Views", "Engagement", "Sales")

library(dplyr)

# Strip the leading "$" so Sales is numeric, then average each column per weekday.
# list(Mean = mean) replaces funs(), which is deprecated since dplyr 0.8.0.
df %>%
  mutate(Sales = as.numeric(sub("\\$", "", Sales))) %>%
  group_by(Day) %>%
  summarise_at(vars(Views, Engagement, Sales), list(Mean = mean))
# Result
# # A tibble: 7 x 4
#   Day       Views_Mean Engagement_Mean Sales_Mean
#   <chr>          <dbl>           <dbl>      <dbl>
# 1 Friday          9590            38.0       3.64
# 2 Monday         12159            41.5       8.54
# 3 Saturday        4986            26.5       2.45
# 4 Sunday          6834            29.5       4.24
# 5 Thursday        9386            38.0       4.57
# 6 Tuesday        11012            51.0       7.62
# 7 Wednesday       9572            45.5       6.41
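For readers who prefer to avoid a package dependency, the same per-weekday means can probably be obtained with base R's aggregate(), which is loosely analogous to a class/var setup in proc tabulate. This is only a sketch under stated assumptions: the OP's sample rows are inlined as whitespace-separated text so the block runs on its own (the real file is tab-delimited), and the names txt, week_order, and res are hypothetical and not part of the original answer.

```r
# Sketch: inline copy of the OP's sample rows (assumption: whitespace-separated here).
txt <- "Monday 21JUL03 7206 32 $6.73
Tuesday 22JUL03 9333 51 $4.99
Wednesday 23JUL03 8321 61 $8.87
Thursday 24JUL03 8378 35 $3.69
Friday 25JUL03 12202 45 $4.34
Saturday 26JUL03 6161 34 $3.12
Sunday 27JUL03 9115 29 $2.77
Monday 28JUL03 17112 51 $10.36
Tuesday 29JUL03 12690 51 $10.24
Wednesday 30JUL03 10822 30 $3.96
Thursday 31JUL03 10395 41 $5.45
Friday 01AUG03 6979 31 $2.95
Saturday 02AUG03 3810 19 $1.78
Sunday 03AUG03 4554 30 $5.71"

df <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE,
                 col.names = c("Day", "Date", "Views", "Engagement", "Sales"))

# Remove the literal "$" and convert Sales to numeric.
df$Sales <- as.numeric(sub("$", "", df$Sales, fixed = TRUE))

# Optional: make Day a factor in calendar order so the result is not sorted alphabetically.
week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
df$Day <- factor(df$Day, levels = week_order)

# Mean of each measure within each weekday.
res <- aggregate(cbind(Views, Engagement, Sales) ~ Day, data = df, FUN = mean)
res
```

With the real file, the read.table(text = ...) call would be replaced by the read.delim() call shown in the answer; the rest should carry over unchanged.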
Answer 1: (score: 0)
Maybe something like this?
library(tidyverse)

# Simulated data for illustration; this does not read the OP's file.
Date <- seq(lubridate::ymd("2012-07-03"), lubridate::ymd("2012-07-20"), by = "days")
Day <- lubridate::wday(Date, label = TRUE)
Views <- sample(4000:20000, length(Date))
Engagement <- sample(20:50, length(Date))
# sample(), not sample.int(): sample.int() expects a single number n, not a range.
Sales <- sample(300:1000, length(Date)) / 100

# Average each measure per weekday.
df <- data.frame(Day, Date, Views, Engagement, Sales) %>%
  group_by(Day) %>%
  summarise(mean_engagement = mean(Engagement),
            mean_views = mean(Views),
            mean_sales = mean(Sales))
df