Finding means after grouping by day

Time: 2018-04-07 23:53:40

Tags: r

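I have a dataset of observations covering several days and need to find the mean of the observations for each day of the week. The data come from a tab-delimited text file with the following columns: Day, Date, Views, Engagements, Sales. I am trying to find the average Views, Engagements, and Sales for each of the 7 days of the week. In SAS I would use proc tabulate with Day as the class and Views, Engagements, and Sales as the variables, but I am not sure how to translate that into R code.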

Monday  21JUL03 7206    32  $6.73
Tuesday 22JUL03 9333    51  $4.99
Wednesday   23JUL03 8321    61  $8.87
Thursday    24JUL03 8378    35  $3.69
Friday  25JUL03 12202   45  $4.34
Saturday    26JUL03 6161    34  $3.12
Sunday  27JUL03 9115    29  $2.77
Monday  28JUL03 17112   51  $10.36
Tuesday 29JUL03 12690   51  $10.24
Wednesday   30JUL03 10822   30  $3.96
Thursday    31JUL03 10395   41  $5.45
Friday  01AUG03 6979    31  $2.95
Saturday    02AUG03 3810    19  $1.78
Sunday  03AUG03 4554    30  $5.71

2 answers:

Answer 0 (score: 1)

The OP wants to calculate the mean of 3 columns of a data.frame, so dplyr::summarise_at should be a good option.

The solution has two steps:

  1. Read the data from the tab-separated file
  2. Process the data using dplyr

Solution:

    # Read from file. "sales.txt" has been created using OP's data.
    df <- read.delim("sales.txt", header = FALSE, stringsAsFactors = FALSE)
    names(df) <- c("Day", "Date", "Views", "Engagement", "Sales")
    
    library(dplyr)
    
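    # Strip the "$" so Sales is numeric, then average each measure per Day.
    # list() is used in place of funs(), which is deprecated in current dplyr releases.
    df %>% mutate(Sales = as.numeric(sub("\\$","", Sales))) %>%
      group_by(Day) %>%
      summarise_at(vars(c("Views", "Engagement", "Sales")), list(Mean = mean))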
    
    
    # Result
    # # A tibble: 7 x 4
    #   Day       Views_Mean Engagement_Mean Sales_Mean
    #   <chr>          <dbl>           <dbl>      <dbl>
    # 1 Friday          9590            38.0       3.64
    # 2 Monday         12159            41.5       8.54
    # 3 Saturday        4986            26.5       2.45
    # 4 Sunday          6834            29.5       4.24
    # 5 Thursday        9386            38.0       4.57
    # 6 Tuesday        11012            51.0       7.62
    # 7 Wednesday       9572            45.5       6.41
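
For readers who would rather not depend on dplyr, the same per-day means can be sketched in base R with `aggregate()`. This is only an illustration, not part of the original answer: the OP's rows are inlined as tab-separated strings (the variable name `rows` is an assumption) so the snippet runs without a `sales.txt` file.

```r
# The OP's sample rows, written as tab-separated strings so that the
# sketch is self-contained (no "sales.txt" needed).
rows <- c(
  "Monday\t21JUL03\t7206\t32\t$6.73",
  "Tuesday\t22JUL03\t9333\t51\t$4.99",
  "Wednesday\t23JUL03\t8321\t61\t$8.87",
  "Thursday\t24JUL03\t8378\t35\t$3.69",
  "Friday\t25JUL03\t12202\t45\t$4.34",
  "Saturday\t26JUL03\t6161\t34\t$3.12",
  "Sunday\t27JUL03\t9115\t29\t$2.77",
  "Monday\t28JUL03\t17112\t51\t$10.36",
  "Tuesday\t29JUL03\t12690\t51\t$10.24",
  "Wednesday\t30JUL03\t10822\t30\t$3.96",
  "Thursday\t31JUL03\t10395\t41\t$5.45",
  "Friday\t01AUG03\t6979\t31\t$2.95",
  "Saturday\t02AUG03\t3810\t19\t$1.78",
  "Sunday\t03AUG03\t4554\t30\t$5.71"
)

# Parse the tab-separated text; the column names follow the answer above.
df <- read.delim(text = rows, header = FALSE, stringsAsFactors = FALSE,
                 col.names = c("Day", "Date", "Views", "Engagement", "Sales"))

# Drop the literal "$" so that Sales can be averaged as a number.
df$Sales <- as.numeric(sub("$", "", df$Sales, fixed = TRUE))

# One row per Day, with the mean of each of the three measures.
day_means <- aggregate(cbind(Views, Engagement, Sales) ~ Day,
                       data = df, FUN = mean)
day_means
```

The formula interface keeps the grouping variable on the right of `~` and the measured columns on the left, which is roughly the class/var split the OP describes for proc tabulate.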
    

Answer 1 (score: 0)

Maybe something like this?

    library(tidyverse)

    # Simulated data standing in for the OP's file
    Date <- seq(lubridate::ymd('2012-07-03'), lubridate::ymd('2012-07-20'), by = 'days')
    Day <- lubridate::wday(Date, label = TRUE)
    Views <- sample(4000:20000, length(Date))
    Engagement <- sample(20:50, length(Date))
    # sample() rather than sample.int(): sample.int() expects a single number, not a range
    Sales <- sample(300:1000, length(Date)) / 100

    # Mean of each measure per day of the week
    df <- data.frame(Day, Date, Views, Engagement, Sales) %>%
        group_by(Day) %>%
        summarise(mean_engagement = mean(Engagement),
                  mean_views = mean(Views),
                  mean_sales = mean(Sales))

    df
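
One difference between the two answers is group order: `lubridate::wday(..., label = TRUE)` returns an ordered factor, so the groups come out in calendar order, while a plain character Day column, as in the OP's file, sorts alphabetically. A minimal base R sketch of getting calendar order from character day names follows. It assumes English day names and a Monday week start, and it uses only the Day and Views values from the OP's sample; `week_order` and `views_by_day` are names chosen for illustration.

```r
# Day and Views columns taken from the OP's sample data.
week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
df <- data.frame(
  Day = rep(week_order, times = 2),
  Views = c(7206, 9333, 8321, 8378, 12202, 6161, 9115,
            17112, 12690, 10822, 10395, 6979, 3810, 4554),
  stringsAsFactors = FALSE
)

# Explicit factor levels make the groups sort Monday to Sunday
# (the week start is an assumption) instead of alphabetically.
df$Day <- factor(df$Day, levels = week_order)

# Mean Views per day; rows follow the factor level order.
views_by_day <- aggregate(Views ~ Day, data = df, FUN = mean)
views_by_day
```

The same factor conversion could be applied before `group_by(Day)` in a dplyr pipeline, since grouped summaries are sorted by the grouping variable.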