R: bin a column's values by interval and average the results for each interval

Time: 2017-10-11 09:00:00

Tags: r dataframe grouping

I have two tables.

Table 1:

Dates_only <- data.frame(ID=c('1118','1118','1118','1118','1118',
                                 '1118','1118','1118','1119','1119',
                                 '1119','1119','1119','1119','1119',
                                 '1119','13PP','13PP','13PP','13PP',
                                 '13PP','13PP','13PP','13PP'),
                            Quart_y=c('2017Q3','2017Q4','2018Q1','2018Q2',
                                      '2018Q3','2018Q4','2019Q1','2019Q2',
                                      '2017Q3','2017Q4','2018Q1','2018Q2',
                                      '2018Q3','2018Q4','2019Q1','2019Q2',
                                      '2017Q3','2017Q4','2018Q1','2018Q2',
                                      '2018Q3','2018Q4','2019Q1','2019Q2'),
                            Quart=c(0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00,
                                    0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00,
                                    0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00))

and Table 2:

Values <- data.frame(ID=c('1118','1119','13PP','1118','1119','13PP',
                          '1118','1119','13PP','1118','1119','13PP',
                          '1118','1119','13PP','1118','1119','13PP',
                          '1118','1119','13PP','1118','1119','13PP',
                          '1118','1119','13PP','1118','1119','13PP'),
                     Day=c(0,0,0,0.14,0.13,0.13,0.2,0.23,0.24,0.27,0.28,
                           0.32,0.32,0.32,0.44,0.47,0.49,0.49,0.59,0.64,
                           0.61,0.72,0.71,0.73,0.95,0.86,0.78,1.1,0.93,1.15),
                     Value=c(7.6,6.2,6.8,7.1,6.2,5.9,6.8,5.8,4.6,6.5,5.4,
                             4.2,6.3,4.8,4,6,4.3,3.8,5.9,4,3.6,5.6,3.8,
                             3.4,5.4,3.2,3,5,2.9,2.9))

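What I want to do is find a way to change the values in Dates_only$Quart based on Values$Day. Specifically, Dates_only$Quart represents the quarters expressed as numbers (2017Q3 - 0.25, 2017Q4 - 0.50, ..., 2018Q4 - 1.50), and so on, while Values$Day represents the days expressed as numbers. I want to recode Values$Day into quarter bins, for example: 0 <= Values$Day <= 0.25 becomes Values$Day == 0.25, 0.25 < Values$Day <= 0.50 becomes Values$Day == 0.50, and so on.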
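The binning rule described here can be sketched with plain arithmetic. This is only an illustration on a few made-up Day values, not the data from the tables; the helper name `to_quarter` is hypothetical, and it assumes every bin is 0.25 wide.

```r
# Hypothetical helper: round a Day value up to the top of its 0.25-wide bin.
# pmax() keeps Day == 0 in the first bin, matching 0 <= Day <= 0.25 -> 0.25.
to_quarter <- function(day, width = 0.25) {
  pmax(ceiling(day / width), 1) * width
}

# Made-up sample values, including the bin edges 0.25 and 0.50
sample_days <- c(0, 0.14, 0.25, 0.27, 0.50, 1.15)
binned <- to_quarter(sample_days)
print(data.frame(Day = sample_days, Quarter = binned))
```

Values that sit exactly on an edge (0.25, 0.50) stay in the lower bin, which is what the "<=" in the rule asks for.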

What I tried is the following, but it produces an error message:

unique_quarters <- unique(Dates_only$Quart)
unique_quarters <- append(unique_quarters, 0, after=0)
df3 <- transform(Dates_only, 
                 Transf_Day=Values$Quart[findInterval(Values$Day, unique_quarters)])

My guess is that the problem is that findInterval(Values$Day, unique_quarters) returns

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 4 5

while Values$Quart has the values

0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
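Two details of the attempt are worth checking against the table definitions: Values is created with only ID, Day and Value, so Values$Quart is NULL, and findInterval() returns 30 indices while Dates_only has 24 rows. A minimal sketch of one possible repair follows. It assumes the recoded quarter should be attached to Values rather than to Dates_only; the names `quarters`, `breaks` and `idx` are introduced here for illustration.

```r
# Values table from the question (IDs repeat in the same order as there)
Values <- data.frame(
  ID = rep(c('1118', '1119', '13PP'), times = 10),
  Day = c(0, 0, 0, 0.14, 0.13, 0.13, 0.2, 0.23, 0.24, 0.27, 0.28,
          0.32, 0.32, 0.32, 0.44, 0.47, 0.49, 0.49, 0.59, 0.64,
          0.61, 0.72, 0.71, 0.73, 0.95, 0.86, 0.78, 1.1, 0.93, 1.15),
  Value = c(7.6, 6.2, 6.8, 7.1, 6.2, 5.9, 6.8, 5.8, 4.6, 6.5, 5.4,
            4.2, 6.3, 4.8, 4, 6, 4.3, 3.8, 5.9, 4, 3.6, 5.6, 3.8,
            3.4, 5.4, 3.2, 3, 5, 2.9, 2.9)
)

# Upper edges of the quarter bins; same values as unique(Dates_only$Quart)
quarters <- seq(0.25, 2, by = 0.25)
breaks <- c(0, quarters)

# left.open = TRUE turns the bins into (lower, upper]; combined with it,
# rightmost.closed = TRUE closes the leftmost bin, so Day == 0 gets index 1
idx <- findInterval(Values$Day, breaks, left.open = TRUE, rightmost.closed = TRUE)

# Look the index up in the vector of quarters, not in a column of Values
Values$Transf_Day <- quarters[idx]
print(head(Values, 12))
```

Without left.open = TRUE, a Day of exactly 0.25 would land in the second bin instead of the first.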

1 answer:

Answer 0 (score: 0)

Try this:

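library(tidyverse)
# as_tibble() replaces the deprecated as.tbl()
as_tibble(Values) %>% 
  # Int: the interval each Day falls into; include.lowest = TRUE keeps Day == 0
  mutate(Int = cut(Day, seq(0, 3, 0.25), include.lowest = TRUE)) %>% 
  # Int2: relabel the five intervals that occur with their upper bounds
  mutate(Int2 = factor(Int, labels = seq(0.25, 1.25, 0.25)))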
# A tibble: 30 x 5
       ID   Day Value        Int   Int2
   <fctr> <dbl> <dbl>     <fctr> <fctr>
 1   1118  0.00   7.6   [0,0.25]   0.25
 2   1119  0.00   6.2   [0,0.25]   0.25
 3   13PP  0.00   6.8   [0,0.25]   0.25
 4   1118  0.14   7.1   [0,0.25]   0.25
 5   1119  0.13   6.2   [0,0.25]   0.25
 6   13PP  0.13   5.9   [0,0.25]   0.25
 7   1118  0.20   6.8   [0,0.25]   0.25
 8   1119  0.23   5.8   [0,0.25]   0.25
 9   13PP  0.24   4.6   [0,0.25]   0.25
10   1118  0.27   6.5 (0.25,0.5]    0.5
# ... with 20 more rows
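The title also asks for an average per interval. A minimal base R sketch of that step follows, under the assumption that the mean of Value is wanted for each ID within each quarter bin (the question does not say whether IDs should be kept separate); the names `edges` and `avg` are introduced here for illustration.

```r
# Values table from the question (IDs repeat in the same order as there)
Values <- data.frame(
  ID = rep(c('1118', '1119', '13PP'), times = 10),
  Day = c(0, 0, 0, 0.14, 0.13, 0.13, 0.2, 0.23, 0.24, 0.27, 0.28,
          0.32, 0.32, 0.32, 0.44, 0.47, 0.49, 0.49, 0.59, 0.64,
          0.61, 0.72, 0.71, 0.73, 0.95, 0.86, 0.78, 1.1, 0.93, 1.15),
  Value = c(7.6, 6.2, 6.8, 7.1, 6.2, 5.9, 6.8, 5.8, 4.6, 6.5, 5.4,
            4.2, 6.3, 4.8, 4, 6, 4.3, 3.8, 5.9, 4, 3.6, 5.6, 3.8,
            3.4, 5.4, 3.2, 3, 5, 2.9, 2.9)
)

# Bin Day into (lower, upper] intervals labelled by their upper edge;
# include.lowest = TRUE puts Day == 0 into the first bin
edges <- seq(0, 2, by = 0.25)
Values$Quart <- cut(Values$Day, breaks = edges, labels = edges[-1],
                    include.lowest = TRUE)
# Convert the factor labels back to numbers
Values$Quart <- as.numeric(as.character(Values$Quart))

# Mean Value for every ID / quarter combination that actually occurs
avg <- aggregate(Value ~ ID + Quart, data = Values, FUN = mean)
print(avg[order(avg$ID, avg$Quart), ])
```

Dropping ID from the formula (Value ~ Quart) would give one average per quarter across all IDs instead.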