R: reordering groups with multiple variables

Date: 2017-01-25 02:44:24

Tags: r dplyr

This is similar to the question "reordering groups with dataframe", but differs in that more than two variables are involved. Example data:

raw <- "Date          Response     ZNumber     Latency    ZPV
        2016-05-04    1            1           445.562    59.666
        2016-05-04    2            1           433.890    97.285
        2016-05-04    3            1           372.073    53.994
        2016-05-04    4            1           282.337    89.686
        2016-05-04    4            2           333.186    57.471
        2016-05-04    5            1           320.500    71.968
        2016-05-04    5            2           280.818    49.187
        2016-07-14    1            1           411.849    65.539
        2016-07-14    2            1           346.814    50.626"
data <- read.table(text=raw, header = TRUE)

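The individual 'Date-Response-ZNumber' and 'Latency-ZPV' values are always correctly associated. Within each Date-Response, the order of ZNumber should be defined by ascending Latency.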

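The problem in my data is that when a Date-Response has more than one ZNumber, the Latency order sometimes does not match the ZNumber order. For example, Date = 2016-05-04, Response = 4 is ascending in both ZNumber and Latency, whereas for Date = 2016-05-04, Response = 5, ZNumber ascends while Latency descends.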
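One way to see which Date-Response groups are affected is to compare, per group, the rank of Latency with ZNumber. This is only a minimal base R sketch. It assumes ZNumber runs 1, 2, ... within each group, and the names `latency_rank` and `mismatch` are introduced here for illustration; they are not part of the original data.

```r
# A subset of the example data, rebuilt so the snippet runs on its own
data <- read.table(header = TRUE, text = "
Date        Response  ZNumber  Latency  ZPV
2016-05-04  4         1        282.337  89.686
2016-05-04  4         2        333.186  57.471
2016-05-04  5         1        320.500  71.968
2016-05-04  5         2        280.818  49.187
")

# Rank the rows by ascending Latency within each Date-Response group
latency_rank <- ave(data$Latency, data$Date, data$Response, FUN = rank)

# Flag rows whose ZNumber disagrees with that rank (hypothetical helper column)
data$mismatch <- latency_rank != data$ZNumber

# List the affected Date-Response groups
unique(data[data$mismatch, c("Date", "Response")])
```

On this subset only the Response = 5 group is flagged, which matches the description above.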

I have not been able to work out the correct split-apply-combine operation.

Output

What I want is for ZNumber and Latency to ascend together within each 'Date-Response' group, e.g. for Date = 2016-05-04, Response = 5:

"Date          Response     ZNumber     Latency    ZPV
2016-05-04    1            1           445.562    59.666
2016-05-04    2            1           433.890    97.285
2016-05-04    3            1           372.073    53.994
2016-05-04    4            1           282.337    89.686
2016-05-04    4            2           333.186    57.471
2016-05-04    5            1           280.818    49.187
2016-05-04    5            2           320.500    71.968
2016-07-14    1            1           411.849    65.539
2016-07-14    2            1           346.814    50.626"

dplyr

Many attempts at a solution, such as the following, have not worked...

library(dplyr)

data <- data %>%
group_by(Date, Response) %>%
arrange(Latency, ZNumber) %>% 
arrange(Date, Response)

Or, as suggested in the related question mentioned above...

data <- data %>%
arrange(df, group, desc(value))

Various 'mutating joins' did not succeed either. For example:

data <- data %>%
  group_by(Date,Response) %>%
  select(Latency) %>%
  arrange(Latency) %>% 
  arrange(Response) %>%
  full_join(data,by=c("Date","Response"))

But now there are two Latency columns.
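The duplicated column appears to come from the join itself: both tables contain Latency, it is not one of the `by` keys, so dplyr keeps both copies and disambiguates them with its default `.x` / `.y` suffixes. A minimal sketch, using hypothetical toy tables `left` and `right` that stand in for the two sides of the join above:

```r
library(dplyr)

# Hypothetical stand-ins for the two sides of the join
left  <- data.frame(Date = "2016-05-04", Response = 5,
                    Latency = c(280.818, 320.500))
right <- data.frame(Date = "2016-05-04", Response = 5, ZNumber = 1:2,
                    Latency = c(320.500, 280.818))

# Newer dplyr versions warn about many-to-many matches; silenced here
joined <- suppressWarnings(
  full_join(left, right, by = c("Date", "Response"))
)

names(joined)  # includes Latency.x and Latency.y
nrow(joined)   # every left row pairs with every right row in the group
```

Because the keys are not unique, each group with two rows also expands to four rows, so the join cannot line up the sorted Latency values with ZNumber one-to-one.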

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
other attached packages:
[1] dplyr_0.5.0
loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 magrittr_1.5   R6_2.2.0       assertthat_0.1 DBI_0.5-1     
[6] tools_3.3.2    tibble_1.2     Rcpp_0.12.8 

1 Answer:

Answer 0: (score: 0)

I have a solution using data.table, which is very simple and needs minimal scripting.

raw <- "Date          Response     ZNumber     Latency    ZPV
        2016-05-04    1            1           445.562    59.666
2016-05-04    2            1           433.890    97.285
2016-05-04    3            1           372.073    53.994
2016-05-04    4            1           282.337    89.686
2016-05-04    4            2           333.186    57.471
2016-05-04    5            1           320.500    71.968
2016-05-04    5            2           280.818    49.187
2016-07-14    1            1           411.849    65.539
2016-07-14    2            1           346.814    50.626"
data <- read.table(text=raw, header = TRUE)
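library(data.table)
data <- data.table(data)
# Sort by Latency so rows are in ascending Latency order within every group
data <- data[order(Latency)]
# Number the rows within each Date-Response group: this is the rebuilt ZNumber
data[, new_ZNumber := seq_len(.N), by = .(Date, Response)]
# Restore the Date/Response order, keeping ascending Latency within groups
data <- data[order(Date, Response, Latency)]
data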

Output:

         Date Response ZNumber Latency    ZPV new_ZNumber
1: 2016-05-04        1       1 445.562 59.666           1
2: 2016-05-04        2       1 433.890 97.285           1
3: 2016-05-04        3       1 372.073 53.994           1
4: 2016-05-04        4       1 282.337 89.686           1
5: 2016-05-04        4       2 333.186 57.471           2
6: 2016-05-04        5       2 280.818 49.187           1
7: 2016-05-04        5       1 320.500 71.968           2
8: 2016-07-14        1       1 411.849 65.539           1
9: 2016-07-14        2       1 346.814 50.626           1

Not sure why ddply isn't doing what you want, but let me know whether this is what you had in mind.

Edit: at the OP's request, added the rebuilt ZNumber, named new_ZNumber.
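For comparison, the same idea can be sketched in dplyr, which is what the question asked about: sort by Latency within each Date-Response group, then renumber ZNumber from 1. This assumes ZNumber should simply be the within-group rank of Latency, which matches the desired output in the question; `result` is a name introduced here for illustration.

```r
library(dplyr)

raw <- "Date          Response     ZNumber     Latency    ZPV
        2016-05-04    1            1           445.562    59.666
        2016-05-04    2            1           433.890    97.285
        2016-05-04    3            1           372.073    53.994
        2016-05-04    4            1           282.337    89.686
        2016-05-04    4            2           333.186    57.471
        2016-05-04    5            1           320.500    71.968
        2016-05-04    5            2           280.818    49.187
        2016-07-14    1            1           411.849    65.539
        2016-07-14    2            1           346.814    50.626"
data <- read.table(text = raw, header = TRUE)

result <- data %>%
  arrange(Date, Response, Latency) %>%  # ascending Latency inside each group
  group_by(Date, Response) %>%
  mutate(ZNumber = row_number()) %>%    # renumber 1, 2, ... within the group
  ungroup()

result
```

The first attempt in the question sorted the rows but never reassigned ZNumber, which may be why the column still looked out of order afterwards. Here the Latency-ZPV pairs stay on the same row and only ZNumber is rewritten.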