Duplicate a subset of rows, with different values in the copies?

Asked: 2018-06-09 17:36:30

Tags: r dplyr data-cleaning data-munging

Sample data:

   Date    Value GeographyName                               Newdate
   <chr>   <dbl> <chr>                                         <int>
 1 2011/12 0.698 NHS Wigan Borough CCG                          2012
 2 2011/12 0.674 NHS Gateshead CCG                              2012
 3 2012/13 0.775 NHS North Hampshire CCG                        2013
 4 2012/13 0.686 NHS St Helens CCG                              2013
 5 2012/13 0.716 NHS Wakefield CCG                              2013
 6 2012/13 0.750 NHS West Lancashire CCG                        2013
 7 2012/13 0.722 NHS Hull CCG                                   2013
 8 2013/14 0.746 NHS Brent CCG                                  2014
 9 2013/14 0.776 NHS Hambleton, Richmondshire and Whitby CCG    2014
10 2013/14 0.686 NHS Barnsley CCG                               2014

I want to replicate the rows whose Newdate is 2012 three times, for a total of six new duplicated rows. However, I want two of the new rows to have a Newdate value of 2011, another two to have 2010, and the last two to have 2009. Is there a way to accomplish this during the replication itself?

1 answer:

Answer 0: (score: 1)

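dplyr::bind_rows provides the flexibility to bind the rows of multiple data frames. First filter df down to the rows with Newdate == 2012, then use bind_rows to stack that subset on itself three times. Modify Newdate as the OP describes, and finally bind the result to the original df.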

library(dplyr)

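df %>% filter(Newdate == 2012) %>%
  bind_rows(., ., .) %>%  # stack three copies of the filtered rows (2 rows -> 6)
  mutate(Newdate = Newdate - (row_number()+1) %/% 2) %>%  # subtract 1, 1, 2, 2, 3, 3
  bind_rows(df, .)  # append the six new rows below the original df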

#       Date Value                               GeographyName Newdate
# 1  2011/12 0.698                       NHS Wigan Borough CCG    2012
# 2  2011/12 0.674                           NHS Gateshead CCG    2012
# 3  2012/13 0.775                     NHS North Hampshire CCG    2013
# 4  2012/13 0.686                           NHS St Helens CCG    2013
# 5  2012/13 0.716                           NHS Wakefield CCG    2013
# 6  2012/13 0.750                     NHS West Lancashire CCG    2013
# 7  2012/13 0.722                                NHS Hull CCG    2013
# 8  2013/14 0.746                               NHS Brent CCG    2014
# 9  2013/14 0.776 NHS Hambleton, Richmondshire and Whitby CCG    2014
# 10 2013/14 0.686                            NHS Barnsley CCG    2014
# 11 2011/12 0.698                       NHS Wigan Borough CCG    2011
# 12 2011/12 0.674                           NHS Gateshead CCG    2011
# 13 2011/12 0.698                       NHS Wigan Borough CCG    2010
# 14 2011/12 0.674                           NHS Gateshead CCG    2010
# 15 2011/12 0.698                       NHS Wigan Borough CCG    2009
# 16 2011/12 0.674                           NHS Gateshead CCG    2009
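
To see why the mutate() step yields 2011, 2010 and 2009 in pairs, the integer-division offset can be checked on its own. This is only an illustrative sketch in base R; the names row_idx, offset and newdate are made up here and do not appear in the answer. Note that the pairing relies on the filtered group having exactly two rows.

```r
# Positions of the six stacked rows (two source rows bound three times).
row_idx <- 1:6

# Integer division maps positions 1,2 -> 1; 3,4 -> 2; 5,6 -> 3.
offset <- (row_idx + 1) %/% 2

# Subtracting the offset from 2012 gives the years requested in the question.
newdate <- 2012 - offset

print(offset)   # 1 1 2 2 3 3
print(newdate)  # 2011 2011 2010 2010 2009 2009
```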

Data:

df <- read.table(text = 
"Date    Value GeographyName                               Newdate
1 2011/12 0.698 'NHS Wigan Borough CCG'                          2012
2 2011/12 0.674 'NHS Gateshead CCG'                              2012
3 2012/13 0.775 'NHS North Hampshire CCG'                        2013
4 2012/13 0.686 'NHS St Helens CCG'                              2013
5 2012/13 0.716 'NHS Wakefield CCG'                              2013
6 2012/13 0.750 'NHS West Lancashire CCG'                        2013
7 2012/13 0.722 'NHS Hull CCG'                                   2013
8 2013/14 0.746 'NHS Brent CCG'                                  2014
9 2013/14 0.776 'NHS Hambleton, Richmondshire and Whitby CCG'    2014
10 2013/14 0.686 'NHS Barnsley CCG'                               2014",
stringsAsFactors = FALSE, header = TRUE)
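
If the 2012 group might not contain exactly two rows, one possible variation is to build each copy with an explicit year offset instead of deriving it from row_number(). The sketch below is an assumption-laden illustration, not the answer's method: toy is a hypothetical stand-in for df with made-up values, and offsets, src, copies and result are names introduced only for this example.

```r
library(dplyr)

# Hypothetical stand-in for df, with three rows in the 2012 group.
toy <- data.frame(
  GeographyName = c("A", "B", "C", "D"),
  Newdate = c(2012L, 2012L, 2012L, 2013L),
  stringsAsFactors = FALSE
)

offsets <- 1:3                       # years to subtract, one per copy
src <- filter(toy, Newdate == 2012)  # rows to duplicate

# Make one shifted copy of src per offset, then stack the copies.
copies <- bind_rows(lapply(offsets, function(k) mutate(src, Newdate = Newdate - k)))

# Append the copies below the original rows.
result <- bind_rows(toy, copies)
print(result)
```

Because each copy gets its offset directly, every row in the group is duplicated once per year regardless of the group size.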