Set column order based on the values in a row

Time: 2018-02-07 21:08:00

Tags: r data.table

My data looks like this:

  library(data.table)
  set.seed(1)
  dt <- data.table(stock = c(rep("a",24),rep("b",24),rep("c",24),rep("d",24)),
  hour = rep(1:24,4), day1 = sample(-5:5,96,replace = TRUE), 
  day2 = sample(-10:-1,96,replace = TRUE), day3 = sample(0:10,96,replace = TRUE),
  day4 = 0)

I create a column that totals each row across the days, and a row that totals each day across all stocks, as follows:

  dt[,Total_by_hour := rowSums(.SD), .SDcols = c("day1","day2","day3","day4")] 
  totals_row <- data.table(stock = "Total",hour = NA, t(colSums(dt[,!1:2])))
  dt <- rbind(dt,totals_row)

The result looks like this:

  stock hour    day1    day2    day3    day4    Total_by_hour
  a      1      -3       -6      1       0      -8
  a      2      -1       -6      10      0      3
  a      3       1       -2      3       0      2
  ...                   
  d      22      4       -5      1       0      0
  d      23      3       -3      3       0      3
  d      24      3       -7      1       0      -3
  Total         18       -507    426     0      -63

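I would like to sort the rows by the "Total_by_hour" column in descending order. I would also like to set the column order of day1, day2, day3, day4 according to the last row, "Total", i.e. in descending order of the daily totals. The columns would be reordered to day3 (total 426), day1 (total 18), day4 (total 0), day2 (total -507).

I welcome any ideas. Many thanks.

3 answers:

Answer 0: (score: 3)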

You can use the setorder function to reorder the rows of a data.table, and the setcolorder function to reorder the columns:

# Order by Total_by_hour descending
setorder(dt, -Total_by_hour)

Output:

> head(dt)
   stock hour day1 day2 day3 day4 Total_by_hour
1:     a   21    5   -3    8    0            10
2:     c   20    3   -3   10    0            10
3:     d    4    4   -2    8    0            10
4:     a    8    2   -1    8    0             9
5:     a   15    3   -1    6    0             8
6:     d    5    4   -2    6    0             8

Then reorder the day columns:

# Create a vector of the column names to reorder
cols_to_order <- paste0("day", 1:4)

# Get the descending order of the Total row for just these columns
# (unlist() turns the one-row table into a plain named vector for order())
totals <- unlist(dt[stock == "Total", cols_to_order, with = FALSE])
reorder <- order(totals, decreasing = TRUE)

# Set the new column order
setcolorder(dt, neworder = c("stock", "hour", cols_to_order[reorder], "Total_by_hour"))

Output:

> head(dt)
   stock hour day3 day1 day4 day2 Total_by_hour
1:     a   21    8    5    0   -3            10
2:     c   20   10    3    0   -3            10
3:     d    4    8    4    0   -2            10
4:     a    8    8    2    0   -1             9
5:     a   15    6    3    0   -1             8
6:     d    5    6    4    0   -2             8
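
The two steps in this answer can be combined into one function. The following is only a minimal sketch: `sort_by_totals` is a hypothetical helper name (not part of data.table), the toy table is made up for illustration, and the code assumes the table has a `stock` column containing a "Total" row, as in the question.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sorts rows by `total_col`
# in descending order and reorders `day_cols` by the values in the "Total" row.
sort_by_totals <- function(dt, day_cols, total_col = "Total_by_hour") {
  dt <- copy(dt)                         # work on a copy, not by reference
  setorderv(dt, total_col, order = -1L)  # rows: descending by the total column
  totals <- unlist(dt[stock == "Total", day_cols, with = FALSE])
  other  <- setdiff(names(dt), c(day_cols, total_col))
  setcolorder(dt, c(other, day_cols[order(totals, decreasing = TRUE)], total_col))
  dt[]
}

# Toy data, made up for illustration
toy <- data.table(stock = c("a", "b", "Total"),
                  hour  = c(1L, 1L, NA),
                  day1  = c(1, 2, 3),
                  day2  = c(-4, -1, -5),
                  day3  = c(5, 4, 9),
                  Total_by_hour = c(2, 5, 7))

res <- sort_by_totals(toy, c("day1", "day2", "day3"))
print(res)
```

Because the function works on a copy, the original table keeps its row and column order. Drop the `copy()` call if modifying in place is preferred.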

Answer 1: (score: 2)

Another way using data.table:

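library(data.table)
setorder(dt, Total_by_hour)  # ascending; use -Total_by_hour for descending
# Take the Total row by name: after sorting it is no longer the last row
day_cols <- grep("day", colnames(dt), value = TRUE)
day_totals <- unlist(dt[stock == "Total", day_cols, with = FALSE])
setcolorder(dt, c(setdiff(colnames(dt), day_cols),
                  names(sort(day_totals, decreasing = TRUE))))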

head(dt)
#    stock hour Total_by_hour day3 day1 day4 day2
# 1: Total   NA           -63  426   18    0 -507
# 2:     a   10           -11    2   -5    0   -8
# 3:     d   14           -11    1   -3    0   -9
# 4:     b   23            -9    4   -5    0   -8
# 5:     c   16            -9    1   -2    0   -8
# 6:     c   23            -9    3   -2    0  -10
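
In the output above the "Total" row is sorted along with the ordinary rows. If the "Total" row should stay at the bottom instead, one possible approach is to sort on a temporary flag first. This is a sketch under that assumption: the `is_total` column name and the toy data are made up for illustration.

```r
library(data.table)

# Toy data, made up for illustration
toy <- data.table(stock = c("a", "Total", "b", "c"),
                  Total_by_hour = c(-2, 6, 5, 3))

# Temporary flag: FALSE for ordinary rows, TRUE for the "Total" row.
# FALSE sorts before TRUE, so the "Total" row ends up last.
toy[, is_total := stock == "Total"]
setorder(toy, is_total, -Total_by_hour)
toy[, is_total := NULL]   # drop the helper column again

print(toy)
```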

Answer 2: (score: 1)

Using dplyr: first, arrange the rows by the last column.

library(dplyr)
dt_1 <- dt %>% arrange(Total_by_hour)  # use desc(Total_by_hour) for descending order

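Now, compute the totals and order the columns accordingly:

dt_cols <- dt %>% select(contains("day")) %>% summarise_all(sum)
# order() gives the column positions from largest to smallest total
# (indexing by rank() only matches for the data in the question by coincidence)
col_order <- order(unlist(dt_cols[1, ]), decreasing = TRUE)
columns_ordered <- c("stock", "hour", 
                     names(dt_cols)[col_order],
                     "Total_by_hour")
dt_2 <- dt_1 %>% select(all_of(columns_ordered))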
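
When rearranging column names by their totals, indexing by `order()` and indexing by `rank()` are not interchangeable: `rank()` returns the inverse permutation. The sketch below uses made-up totals where the two disagree, and then checks the totals stated in the question (18, -507, 426, 0), where they happen to agree.

```r
# Made-up totals, chosen so that rank() and order() disagree
totals <- c(day1 = 20, day2 = 30, day3 = 10)

by_order <- names(totals)[order(totals, decreasing = TRUE)]
by_rank  <- names(totals)[rank(totals)]

print(by_order)  # day2, day1, day3: largest total first
print(by_rank)   # day2, day3, day1: not sorted by total

# Totals from the question: here both indexings give the same result
q <- c(day1 = 18, day2 = -507, day3 = 426, day4 = 0)
same_for_question <- identical(names(q)[order(q, decreasing = TRUE)],
                               names(q)[rank(q)])
print(same_for_question)
```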

Finally, add the "Total" row again:

totals_row <- data.table(stock = "Total",hour = NA, t(colSums(dt_2[,3:7])))
dt_2 <- rbind(dt_2,totals_row)