Question

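I have a data frame with a date column of integer type. I also want to divide the price into groups of 10,000 and then count the frequency within each month.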

df %>% 
  group_by(date) %>%
  count(values)

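This gives me the frequency per day, but it does not aggregate by month. When I try to aggregate the days into months with this code:

df %>%
  group_by(month = month(date)) %>% 
  count(values)

I get the following error:

Error in as.POSIXlt.character(as.character(x), ...) : character string is not in a standard unambiguous format

For the groups of 10,000 (in the price column), I am using the following code:

tally(group_by(df, values,
               price = cut(price, breaks = seq(10000, 200000, by = 10000)))) %>%
    ungroup() %>% 
    spread(price, n, fill = 0)

Problem:

I cannot combine this with the code that aggregates the days into months and then spread the data by price group.

Expected output:

date  values 10k-20k 20k-30k 30k-40k 40k-50k 50k-60k 60k-70k 70k-80k 80k-90k
11/18  a       1
11/18  b                        1
12/18  a                1       1       1      1                        1
12/18  b                1       1              1         1     
12/18  c                        1       1                               1
...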

Answer 1

We can extract the month-year from the date column, use cut to divide price into buckets, count the frequency with count, and then convert to wide format with spread.

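library(dplyr)
cut_group <- seq(10000, 200000, by = 10000)
# format(scientific = FALSE) keeps labels such as "90000-100000";
# a plain paste() would turn 100000 into "1e+05"
cut_lab <- format(cut_group, scientific = FALSE, trim = TRUE)

df %>%
  # parse the "mm/dd/yy" strings, then keep only month and year
  mutate(date = as.Date(date, "%m/%d/%y"),
         month_year = format(date, "%m-%y"),
         # bucket price into 10,000-wide ranges
         groups = cut(price, cut_group, include.lowest = TRUE,
                      labels = paste(cut_lab[-length(cut_lab)], cut_lab[-1], sep = "-"))) %>%
  # one row per values / month / price range, with the count in n
  count(values, month_year, groups) %>%
  # one column per price range; missing combinations become 0
  tidyr::spread(groups, n, fill = 0)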


#  values month_year `10000-20000` `20000-30000` `30000-40000` `40000-50000`
#   <fct>  <chr>            <dbl>         <dbl>         <dbl>         <dbl> 
# 1 a      01-19             0             0             0             1
# 2 a      02-19             1             0             0             0
# 3 a      05-19             0             0             0             0
# 4 a      11-18             1             0             0             0
#.....
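To try the pipeline above without the original data, here is a minimal sketch on a small hypothetical data frame (the dates, values and prices are made up, not taken from the question). It assumes the dates are "mm/dd/yy" strings and swaps spread() for pivot_wider(), which supersedes spread() in tidyr 1.0 and later; passing a single value to values_fill needs tidyr 1.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data (not from the question): dates as "mm/dd/yy" strings
df <- data.frame(
  date   = c("11/05/18", "11/20/18", "12/01/18", "12/15/18", "12/30/18"),
  values = c("a", "b", "a", "a", "b"),
  price  = c(15000, 35000, 15000, 42000, 15000),
  stringsAsFactors = FALSE
)

cut_group <- seq(10000, 200000, by = 10000)
# avoid scientific notation in the range labels
cut_lab <- format(cut_group, scientific = FALSE, trim = TRUE)

res <- df %>%
  mutate(date = as.Date(date, "%m/%d/%y"),
         month_year = format(date, "%m-%y"),
         groups = cut(price, cut_group, include.lowest = TRUE,
                      labels = paste(head(cut_lab, -1), cut_lab[-1], sep = "-"))) %>%
  count(values, month_year, groups) %>%
  # pivot_wider() used here in place of spread(); empty cells become 0
  pivot_wider(names_from = groups, values_from = n, values_fill = 0)

print(res)
```

Each row is one values/month pair, and each price-range column counts how many prices fell into that range during that month.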

Answer 2

If it helps, here is a data.table + lubridate solution:

library(data.table)
library(lubridate)

setDT(df)
df[,  .N, by = floor_date(date, "month")]

Edit: I missed the whole "groups of 10,000" part:

df2 <- df[, .N, by = .(date = floor_date(date, "month"), range = cut(price, seq(0, 100e3, 10e3))]

Then you can use dcast to convert it to wide format:

dcast(df2, date~range)
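The data.table steps above can be combined into one sketch that also keeps the values column and fills empty cells with 0, closer to the expected output in the question. The sample data is hypothetical, the dates are assumed to be "mm/dd/yy" strings (floor_date() needs a Date, so they are parsed first), and dig.lab = 6 is passed to cut() so the labels read "(10000,20000]" rather than "(1e+04,2e+04]".

```r
library(data.table)
library(lubridate)

# Hypothetical sample data (not from the question): dates as "mm/dd/yy" strings
DT <- data.table(
  date   = c("11/05/18", "11/20/18", "12/01/18", "12/15/18", "12/30/18"),
  values = c("a", "b", "a", "a", "b"),
  price  = c(15000, 35000, 15000, 42000, 15000)
)

# Parse the strings first; floor_date() works on Date, not character
DT[, date := as.Date(date, "%m/%d/%y")]

# Count rows per values / month / 10,000-wide price range
counts <- DT[, .N, by = .(values,
                          month = floor_date(date, "month"),
                          range = cut(price, seq(0, 100e3, 10e3), dig.lab = 6))]

# One column per price range; missing combinations become 0
wide <- dcast(counts, values + month ~ range, value.var = "N", fill = 0)
print(wide)
```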

Divide column values into ranges and aggregate dates by month to calculate the frequency of each range within a month
