Sorting and ordering a data frame of unique values

Time: 2018-10-25 21:39:26

Tags: r data.table tidyverse

Here is some toy data:

library(dplyr)      # %>% and mutate()
library(lubridate)  # mdy()

df <- tibble::tribble( ~var2, ~var1, ~var3,   ~var4,
                      2L,   "A",   1.2,  "1/6/2018",
                      4L,   "A",  1.34,  "1/3/2018",
                      7L,   "B",  2.43,  "1/7/2018",
                      3L,   "C",     4,  "1/4/2018",
                      7L,   "A",   3.2,  "1/9/2018",
                      3L,   "D",   2.3, "1/10/2018",
                      4L,   "A",  0.34,  "1/9/2018",
                      5L,   "C",   4.2,  "1/7/2018",
                      5L,   "D",   6.5, "1/10/2018") %>% 
      mutate(var4 = mdy(var4))

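I would like to create a data frame of the unique values of each variable in df, with each variable's values sorted from largest (at the top) to smallest (at the bottom), or vice versa (the dates below run from earliest to latest). The variables themselves should be ordered from the fewest unique values to the most (left to right). The expected output is: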

 df_of_unique_values <- tibble::tribble(~var1, ~var2,    ~var4,  ~var3,
                                        "D",    7L,  "1/3/2018",   6.5,
                                        "C",    5L,  "1/4/2018",   4.2,
                                        "B",    4L,  "1/6/2018",     4,
                                        "A",    3L,  "1/7/2018",   3.2,
                                         NA,    2L,  "1/9/2018",  2.43,
                                         NA,    NA, "1/10/2018",   2.3,
                                         NA,    NA,          NA,  1.34,
                                         NA,    NA,          NA,   1.2,
                                         NA,    NA,          NA,  0.34) %>% 
  mutate(var4 = mdy(var4))
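The left-to-right column order in the expected output follows the number of distinct values in each column. A quick base R check of that, assuming the toy data is rebuilt without tibble/lubridate (the m/d/Y strings are converted to ISO dates by hand):

```r
# Toy data from the question, rebuilt in base R (dates converted by hand from m/d/Y)
df <- data.frame(
  var2 = c(2L, 4L, 7L, 3L, 7L, 3L, 4L, 5L, 5L),
  var1 = c("A", "A", "B", "C", "A", "D", "A", "C", "D"),
  var3 = c(1.2, 1.34, 2.43, 4, 3.2, 2.3, 0.34, 4.2, 6.5),
  var4 = as.Date(c("2018-01-06", "2018-01-03", "2018-01-07", "2018-01-04",
                   "2018-01-09", "2018-01-10", "2018-01-09", "2018-01-07",
                   "2018-01-10")),
  stringsAsFactors = FALSE
)

# Number of distinct values in each column
n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
n_unique
# var2 var1 var3 var4
#    5    4    9    6

# Fewest distinct values first gives the column order of the expected output
names(sort(n_unique))
# [1] "var1" "var2" "var4" "var3"
```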

What is the best way to do this with the tidyverse?

2 answers:

Answer 0 (score: 3)

I suppose one could use the tidyverse, but base R's `order` looks simple enough:

df[order(df$var1, df$var2, df$var3, -as.numeric(df$var4)),]
# A tibble: 9 x 4
   var2 var1   var3 var4      
  <int> <chr> <dbl> <date>    
1     2 A      1.2  2018-01-06
2     4 A      0.34 2018-01-09
3     4 A      1.34 2018-01-03
4     7 A      3.2  2018-01-09
5     7 B      2.43 2018-01-07
6     3 C      4    2018-01-04
7     5 C      4.2  2018-01-07
8     3 D      2.3  2018-01-10
9     5 D      6.5  2018-01-10

Here's the tidyverse equivalent. I had to look up the `?arrange` help page, which suggests `desc()` for reverse sorting (the equivalent of the `-` prefix when using `order`):
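Why `as.numeric()` appears in the `order` call: the unary `-` only works on numeric keys and is not defined for Date objects. A small base R sketch with three of the toy dates, comparing the negated numeric key with `xtfrm()` (the base generic that turns a sortable vector into a numeric key) and with `order(decreasing = TRUE)`:

```r
d <- as.Date(c("2018-01-06", "2018-01-03", "2018-01-10"))

# Unary minus on a Date raises an error, so -d cannot serve as a sort key
res <- tryCatch(-d, error = function(e) conditionMessage(e))
res

# Three ways to get the same descending order
idx_num   <- order(-as.numeric(d))        # what the answer does
idx_xtfrm <- order(-xtfrm(d))             # numeric key for any sortable type
idx_dec   <- order(d, decreasing = TRUE)  # no negation needed

d[idx_num]
# [1] "2018-01-10" "2018-01-06" "2018-01-03"
```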

df %>% arrange(var1, var2, var3, desc(as.numeric(var4)))
# A tibble: 9 x 4
   var2 var1   var3 var4      
  <int> <chr> <dbl> <date>    
1     2 A      1.2  2018-01-06
2     4 A      0.34 2018-01-09
3     4 A      1.34 2018-01-03
4     7 A      3.2  2018-01-09
5     7 B      2.43 2018-01-07
6     3 C      4    2018-01-04
7     5 C      4.2  2018-01-07
8     3 D      2.3  2018-01-10
9     5 D      6.5  2018-01-10

A list would be the way to return values that are of unequal length and unrelated to each other.
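A minimal base R sketch of that list format, assuming the ordering rules from the question (values descending, dates ascending, elements from fewest to most distinct values). No NA padding is needed because each element keeps its own length:

```r
# Toy data from the question, rebuilt in base R (dates converted by hand from m/d/Y)
df <- data.frame(
  var2 = c(2L, 4L, 7L, 3L, 7L, 3L, 4L, 5L, 5L),
  var1 = c("A", "A", "B", "C", "A", "D", "A", "C", "D"),
  var3 = c(1.2, 1.34, 2.43, 4, 3.2, 2.3, 0.34, 4.2, 6.5),
  var4 = as.Date(c("2018-01-06", "2018-01-03", "2018-01-07", "2018-01-04",
                   "2018-01-09", "2018-01-10", "2018-01-09", "2018-01-07",
                   "2018-01-10")),
  stringsAsFactors = FALSE
)

# One element per column: its distinct values, largest first, except Dates (earliest first)
uniq <- lapply(df, function(x) sort(unique(x), decreasing = !inherits(x, "Date")))

# Reorder the elements from fewest to most distinct values
uniq <- uniq[order(lengths(uniq))]

lengths(uniq)
# var1 var2 var4 var3
#    4    5    6    9
```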

Answer 1 (score: 2)

Borrowing from "Combining lists of different lengths into data frame", in base R:

str(lists <- lapply(df, function(a) sort(unique(a), decreasing=!inherits(a,"Date"))))
# List of 4
#  $ var2: int [1:5] 7 5 4 3 2
#  $ var1: chr [1:4] "D" "C" "B" "A"
#  $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
#  $ var4: Date[1:6], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
str(lists <- lists[order(lengths(lists))])
# List of 4
#  $ var1: chr [1:4] "D" "C" "B" "A"
#  $ var2: int [1:5] 7 5 4 3 2
#  $ var4: Date[1:6], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
#  $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
(maxlen <- max(lengths(lists)))
# [1] 9
str(lists <- lapply(lists, function(l) c(l, rep(NA, maxlen-length(l)))))
# List of 4
#  $ var1: chr [1:9] "D" "C" "B" "A" ...
#  $ var2: int [1:9] 7 5 4 3 2 NA NA NA NA
#  $ var4: Date[1:9], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
#  $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
as.data.frame(lists)
#   var1 var2       var4 var3
# 1    D    7 2018-01-03 6.50
# 2    C    5 2018-01-04 4.20
# 3    B    4 2018-01-06 4.00
# 4    A    3 2018-01-07 3.20
# 5 <NA>    2 2018-01-09 2.43
# 6 <NA>   NA 2018-01-10 2.30
# 7 <NA>   NA       <NA> 1.34
# 8 <NA>   NA       <NA> 1.20
# 9 <NA>   NA       <NA> 0.34
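The same steps can be wrapped into one function. This is a sketch: the name unique_sorted_frame is made up for illustration, and padding by out-of-range indexing (`x[seq_len(maxlen)]`) is swapped in for `c(l, rep(NA, ...))`; both fill with NA, and indexing a Date vector keeps its Date class.

```r
# Hypothetical helper: distinct values per column, padded with NA into one data frame
unique_sorted_frame <- function(df) {
  # Distinct values: descending, except Date columns, which go ascending
  vals <- lapply(df, function(x) sort(unique(x), decreasing = !inherits(x, "Date")))
  # Columns from fewest to most distinct values
  vals <- vals[order(lengths(vals))]
  # Indexing past the end yields NA, so every column reaches the longest length
  maxlen <- max(lengths(vals))
  vals <- lapply(vals, function(x) x[seq_len(maxlen)])
  as.data.frame(vals, stringsAsFactors = FALSE)
}

# Toy data from the question, rebuilt in base R (dates converted by hand from m/d/Y)
df <- data.frame(
  var2 = c(2L, 4L, 7L, 3L, 7L, 3L, 4L, 5L, 5L),
  var1 = c("A", "A", "B", "C", "A", "D", "A", "C", "D"),
  var3 = c(1.2, 1.34, 2.43, 4, 3.2, 2.3, 0.34, 4.2, 6.5),
  var4 = as.Date(c("2018-01-06", "2018-01-03", "2018-01-07", "2018-01-04",
                   "2018-01-09", "2018-01-10", "2018-01-09", "2018-01-07",
                   "2018-01-10")),
  stringsAsFactors = FALSE
)

out <- unique_sorted_frame(df)
out
```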

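Tidyverse equivalent:

library(dplyr)
library(purrr)
maxlen <- max(lengths(map(df, unique)))
df %>%
  map(~ sort(unique(.), decreasing = !inherits(., "Date"))) %>%
  .[order(lengths(.))] %>%
  map(`length<-`, maxlen) %>%                     # alternative 1
  # map(~ c(., rep(NA, maxlen - length(.)))) %>%  # alternative 2
  as_tibble()                                     # tbl_df() in older dplyr versions

Most importantly: I agree with @42- and @thelatemail that this really isn't an optimal storage format. One interpretation of a data.frame is that everything on one row is related. For example, in a survey, each column is a question and each row is a respondent. Reordering each column independently throws that association away completely. The only justification I can think of for not using the simpler list format (the one @42-'s answer ends with) is report presentation, and for that I think you'd do something similar.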
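If the padded frame is only meant for a report, the NA padding can be blanked out at display time. A hedged base R sketch, using the first six rows of var1 and var2 from the expected output typed in by hand; turning every column into text this way is for printing only, since it discards the column types:

```r
# First six rows of var1 and var2 from the expected output (NA marks the padding)
padded <- data.frame(
  var1 = c("D", "C", "B", "A", NA, NA),
  var2 = c(7L, 5L, 4L, 3L, 2L, NA),
  stringsAsFactors = FALSE
)

# Convert each column to text and replace the padding NAs with empty strings
display <- as.data.frame(
  lapply(padded, function(x) ifelse(is.na(x), "", as.character(x))),
  stringsAsFactors = FALSE
)
print(display, row.names = FALSE)
```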