Calculating subtotals for a specific range of values in a column of an R data frame

Time: 2018-07-01 19:20:54

Tags: r dataframe dplyr

I have a data frame df with the following structure:

NEW_UPC         IRI_KEY     WEEK      DOLLARS
13000016961     272568      1220      3.29
13000016961     272568      1221      3.29
13000016961     272568      1222      3.29
13000016961     272568      1223      9.87
13000016962     272568      1224      3.29
13000016961     272568      1224      9.87
13000016962     272568      1225      3.29
13000016961     272568      1225      9.87
13000016962     272568      1226      3.29
13000016961     272568      1226      9.87
13000016961     272568      1227      9.87
13000016961     272568      1228      3.29
13000016963     272568      1228      3.29
13000016963     272568      1229      3.29
13000016962     272568      1230      3.29
13000016961     272568      1230      3.29
13000016963     272568      1230      13.16
13000016962     272568      1231      3.29
13000016963     272568      1231      9.87
21600016430     272568      1231      17.43
13000016962     272568      1232      9.87

I am trying to get the sum of DOLLARS over the first 12 weeks for each NEW_UPC-IRI_KEY combination. I tried the following code:

df %>% 
  group_by(NEW_UPC,IRI_KEY) %>% 
  mutate(START = min(WEEK), END = max(WEEK)) %>% ungroup() %>%
  group_by(NEW_UPC,IRI_KEY) %>%
  summarise(Sales = case_when(WEEK<=(START+12) ~ sum(DOLLARS)))

However, I get the following error message:

Error in summarise_impl(.data, dots) : Column `Sales` must be length 1 (a summary value), not 8

What am I doing wrong here?

Edited: changed the values in the Sales column to the actual totals, to avoid confusion in the comments.

The final output I want to obtain is as follows:

NEW_UPC         IRI_KEY     Sales
13000016961     272568      65.8
13000016962     272568      26.3
13000016963     272568      29.6
21600016430     272568      17.4

Please note that the values in the Sales column above are just random numbers I used to show the structure of the output. Also, if a NEW_UPC has values of DOLLARS for more than 12 weeks from START, I only want the total of the first 12 weeks. So the Sales column should return the total for the first twelve weeks from START. Alternatively, if a NEW_UPC has values of DOLLARS for fewer than 12 weeks from START, Sales should return the total for that period.

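1 Answer:

Answer 0: (score: 1)

You are close to a solution. You can sort the data on WEEK and then take the top 12 values (head), which gives you the data for the first 12 weeks. You can try: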

library(dplyr)
df %>% 
  group_by(NEW_UPC,IRI_KEY) %>%
  arrange(WEEK) %>%
  summarise(Sales = sum(head(DOLLARS,12)))

# # A tibble: 4 x 3
# # Groups: NEW_UPC [?]
#       NEW_UPC IRI_KEY Sales
#         <dbl>   <int> <dbl>
# 1 13000016961  272568  65.8
# 2 13000016962  272568  26.3
# 3 13000016963  272568  29.6
# 4 21600016430  272568  17.4
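
As for the error itself: case_when() is vectorised, so inside summarise() it produced one value per row of the group, while summarise() expected a single value per group. That is what the "must be length 1" message is pointing at. Note also that WEEK <= START + 12 would cover 13 weeks rather than 12. The following is only a sketch of a summary that stays close to the original attempt, using a few rows copied from the sample data; df_small, per_row and fixed are names made up for this illustration.

```r
library(dplyr)

# A few rows copied from the sample data (two products only)
df_small <- read.table(text = "
NEW_UPC      IRI_KEY  WEEK  DOLLARS
13000016963  272568   1228  3.29
13000016963  272568   1229  3.29
13000016963  272568   1230  13.16
13000016963  272568   1231  9.87
21600016430  272568   1231  17.43",
header = TRUE)

# case_when() returns one element per row of its input,
# so it cannot serve directly as a per-group summary value
per_row <- with(df_small, case_when(WEEK <= min(WEEK) + 12 ~ sum(DOLLARS)))
length(per_row)

# Subset DOLLARS to the 12-week window first, then collapse with sum():
# this yields exactly one value per NEW_UPC-IRI_KEY group
fixed <- df_small %>%
  group_by(NEW_UPC, IRI_KEY) %>%
  summarise(Sales = sum(DOLLARS[WEEK < min(WEEK) + 12])) %>%
  ungroup()

fixed
```

Because min(WEEK) is evaluated per group inside summarise(), the separate mutate() step that created START and END is not needed in this variant.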

Data:

df <- read.table(text="
NEW_UPC         IRI_KEY     WEEK      DOLLARS
13000016961     272568      1220      3.29
13000016961     272568      1221      3.29
13000016961     272568      1222      3.29
13000016961     272568      1223      9.87
13000016962     272568      1224      3.29
13000016961     272568      1224      9.87
13000016962     272568      1225      3.29
13000016961     272568      1225      9.87
13000016962     272568      1226      3.29
13000016961     272568      1226      9.87
13000016961     272568      1227      9.87
13000016961     272568      1228      3.29
13000016963     272568      1228      3.29
13000016963     272568      1229      3.29
13000016962     272568      1230      3.29
13000016961     272568      1230      3.29
13000016963     272568      1230      13.16
13000016962     272568      1231      3.29
13000016963     272568      1231      9.87
21600016430     272568      1231      17.43
13000016962     272568      1232      9.87",
header = TRUE)
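
One caveat with head(DOLLARS, 12): it takes the first 12 rows of each group, which matches the first 12 weeks only when every week in the window has exactly one row. On the sample data both readings give the same totals, because no group spans more than 12 weeks. If weeks can be missing, filtering on the week number may be closer to what the question asks. A sketch with made-up toy data follows; toy, by_rows and by_weeks are hypothetical names, not part of the original post.

```r
library(dplyr)

# Hypothetical toy data: one product, 13 rows, weeks 1203 and 1207 missing
toy <- data.frame(
  NEW_UPC = 1,
  IRI_KEY = 272568,
  WEEK    = c(1200:1202, 1204:1206, 1208:1214),
  DOLLARS = 1
)

# Row-based: sums the first 12 rows per group, regardless of week gaps
by_rows <- toy %>%
  group_by(NEW_UPC, IRI_KEY) %>%
  arrange(WEEK, .by_group = TRUE) %>%
  summarise(Sales = sum(head(DOLLARS, 12))) %>%
  ungroup()

# Week-based: keeps rows whose WEEK falls in the 12 weeks starting at
# the group's first week (START .. START + 11), then sums them
by_weeks <- toy %>%
  group_by(NEW_UPC, IRI_KEY) %>%
  filter(WEEK < min(WEEK) + 12) %>%
  summarise(Sales = sum(DOLLARS)) %>%
  ungroup()

by_rows
by_weeks
```

With these toy rows the row-based version counts 12 rows, while the week-based version counts only the 10 rows that fall in weeks 1200 to 1211.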