Here is some toy data:
library(dplyr)      # %>% and mutate()
library(lubridate)  # mdy()
df <- tibble::tribble( ~var2, ~var1, ~var3, ~var4,
2L, "A", 1.2, "1/6/2018",
4L, "A", 1.34, "1/3/2018",
7L, "B", 2.43, "1/7/2018",
3L, "C", 4, "1/4/2018",
7L, "A", 3.2, "1/9/2018",
3L, "D", 2.3, "1/10/2018",
4L, "A", 0.34, "1/9/2018",
5L, "C", 4.2, "1/7/2018",
5L, "D", 6.5, "1/10/2018") %>%
mutate(var4 = mdy(var4))
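I want to create a data frame of the unique values of each variable in df. Each variable's values should be sorted from largest (at the top) to smallest (at the bottom), or vice versa (the dates in var4 run from earliest to latest). The variables themselves should be ordered from the fewest unique values to the most (left to right). The expected output is: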
df_of_unique_values <- tibble::tribble(~var1, ~var2, ~var4, ~var3,
"D", 7L, "1/3/2018", 6.5,
"C", 5L, "1/4/2018", 4.2,
"B", 4L, "1/6/2018", 4,
"A", 3L, "1/7/2018", 3.2,
NA, 2L, "1/9/2018", 2.43,
NA, NA, "1/10/2018", 2.3,
NA, NA, NA, 1.34,
NA, NA, NA, 1.2,
NA, NA, NA, 0.34) %>%
mutate(var4 = mdy(var4))
What is the best way to do this with the tidyverse?
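To make the column-ordering rule concrete, here is a minimal base R sketch. It assumes a plain data.frame rebuilt with as.Date() in place of the question's tibble::tribble() and lubridate::mdy(). It counts the distinct values in each column and sorts the column names by that count, which reproduces the left-to-right order of the desired output.

```r
# Base R stand-in for the question's toy data (an assumption: the question
# builds it with tibble::tribble() and lubridate::mdy()).
df <- data.frame(
  var2 = c(2L, 4L, 7L, 3L, 7L, 3L, 4L, 5L, 5L),
  var1 = c("A", "A", "B", "C", "A", "D", "A", "C", "D"),
  var3 = c(1.2, 1.34, 2.43, 4, 3.2, 2.3, 0.34, 4.2, 6.5),
  var4 = as.Date(c("1/6/2018", "1/3/2018", "1/7/2018", "1/4/2018", "1/9/2018",
                   "1/10/2018", "1/9/2018", "1/7/2018", "1/10/2018"),
                 format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Number of distinct values in each column (var2 = 5, var1 = 4, var3 = 9, var4 = 6).
n_unique <- vapply(df, function(x) length(unique(x)), integer(1))

# Sorting the columns by that count, fewest first, gives the desired column order.
col_order <- names(sort(n_unique))
col_order
# [1] "var1" "var2" "var4" "var3"
```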
Answer 0 (score: 3)
I suppose one could use the tidyverse, but order() looks simple enough:
df[order(df$var1, df$var2, df$var3, -as.numeric(df$var4)),]
# A tibble: 9 x 4
var2 var1 var3 var4
<int> <chr> <dbl> <date>
1 2 A 1.2 2018-01-06
2 4 A 0.34 2018-01-09
3 4 A 1.34 2018-01-03
4 7 A 3.2 2018-01-09
5 7 B 2.43 2018-01-07
6 3 C 4 2018-01-04
7 5 C 4.2 2018-01-07
8 3 D 2.3 2018-01-10
9 5 D 6.5 2018-01-10
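Why -as.numeric(df$var4) rather than -df$var4: by default order() takes a single decreasing flag for all keys, so one key is reversed by negating it. Negation is not defined for Date objects, though. The small sketch below uses a made-up three-date vector to show that negating a Date errors, while negating its underlying day count sorts the dates in descending order.

```r
d <- as.Date(c("2018-01-06", "2018-01-03", "2018-01-09"))

# Unary minus is not defined for Date objects, so negating d directly errors.
neg_attempt <- tryCatch(-d, error = function(e) e)
inherits(neg_attempt, "error")
# [1] TRUE

# Converting to the underlying day count first makes negation possible,
# which gives a descending order inside order().
ord_desc <- order(-as.numeric(d))
d[ord_desc]
# [1] "2018-01-09" "2018-01-06" "2018-01-03"

# Same permutation as asking order() for a decreasing sort of this one key.
identical(ord_desc, order(d, decreasing = TRUE))
# [1] TRUE
```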
Here is the tidyverse equivalent. The trick is finding the ?arrange help page, which suggests using desc() for reverse sorting (the equivalent of the - prefix when using order()):
df %>% arrange(var1, var2, var3, desc(as.numeric(var4)))
# A tibble: 9 x 4
var2 var1 var3 var4
<int> <chr> <dbl> <date>
1 2 A 1.2 2018-01-06
2 4 A 0.34 2018-01-09
3 4 A 1.34 2018-01-03
4 7 A 3.2 2018-01-09
5 7 B 2.43 2018-01-07
6 3 C 4 2018-01-04
7 5 C 4.2 2018-01-07
8 3 D 2.3 2018-01-10
9 5 D 6.5 2018-01-10
A list would be the way to return values of unequal length that are unrelated to each other.
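The point that a list is the natural container for unequal-length results can be seen directly. This base R sketch uses two columns copied from the question's desired output. A named list holds a 4-element and a 5-element vector without complaint, but turning that list into a data frame fails because the column lengths differ and cannot be recycled. That failure is why the padding with NA in the other answer is needed.

```r
# Sorted distinct values of two columns, copied from the question's desired output.
uniq <- list(
  var1 = c("D", "C", "B", "A"),
  var2 = c(7L, 5L, 4L, 3L, 2L)
)

# A list holds vectors of different lengths without any padding.
lengths(uniq)
# var1 var2
#    4    5

# A data frame needs equal-length (or recyclable) columns, so this errors.
as_df <- tryCatch(as.data.frame(uniq), error = function(e) e)
inherits(as_df, "error")
# [1] TRUE
```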
Answer 1 (score: 2)
Borrowing from Combining lists of different lengths into data frame (base R):
str(lists <- lapply(df, function(a) sort(unique(a), decreasing=!inherits(a,"Date"))))
# List of 4
# $ var2: int [1:5] 7 5 4 3 2
# $ var1: chr [1:4] "D" "C" "B" "A"
# $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
# $ var4: Date[1:6], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
str(lists <- lists[order(lengths(lists))])
# List of 4
# $ var1: chr [1:4] "D" "C" "B" "A"
# $ var2: int [1:5] 7 5 4 3 2
# $ var4: Date[1:6], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
# $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
(maxlen <- max(lengths(lists)))
# [1] 9
str(lists <- lapply(lists, function(l) c(l, rep(NA, maxlen-length(l)))))
# List of 4
# $ var1: chr [1:9] "D" "C" "B" "A" ...
# $ var2: int [1:9] 7 5 4 3 2 NA NA NA NA
# $ var4: Date[1:9], format: "2018-01-03" "2018-01-04" "2018-01-06" "2018-01-07" ...
# $ var3: num [1:9] 6.5 4.2 4 3.2 2.43 2.3 1.34 1.2 0.34
as.data.frame(lists)
# var1 var2 var4 var3
# 1 D 7 2018-01-03 6.50
# 2 C 5 2018-01-04 4.20
# 3 B 4 2018-01-06 4.00
# 4 A 3 2018-01-07 3.20
# 5 <NA> 2 2018-01-09 2.43
# 6 <NA> NA 2018-01-10 2.30
# 7 <NA> NA <NA> 1.34
# 8 <NA> NA <NA> 1.20
# 9 <NA> NA <NA> 0.34
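The padding step matters for the Date column: it has to stay a Date after NAs are appended. Here is a minimal sketch with a made-up two-date vector. It compares the c(l, rep(NA, ...)) padding used above with the `length<-` replacement function that the tidyverse version uses as its alternative. Both keep the Date class and produce the same values.

```r
d <- as.Date(c("2018-01-03", "2018-01-04"))
maxlen <- 4

# Alternative 2: append the right number of NAs with c().
pad_c <- c(d, rep(NA, maxlen - length(d)))

# Alternative 1: grow the vector with `length<-`, which fills with NA.
pad_length <- d
length(pad_length) <- maxlen

class(pad_c)       # "Date"
class(pad_length)  # "Date"

# Same underlying day counts, including the two trailing NAs.
identical(as.numeric(pad_c), as.numeric(pad_length))
```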
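Tidyverse equivalent:
library(dplyr)
library(purrr)
maxlen <- max(lengths(map(df, unique)))
df %>%
  map(~ sort(unique(.), decreasing = !inherits(., "Date"))) %>%  # Dates ascending, others descending
  .[order(lengths(.))] %>%                                        # fewest unique values first
  map(`length<-`, maxlen) %>%                                     # alternative 1
  # map(~ c(., rep(NA, maxlen - length(.)))) %>%                  # alternative 2
  as_tibble()  # the original used tbl_df(), which is deprecated in recent dplyr
Most importantly: I agree with @42- and @thelatemail that this really isn't an optimal storage format. One interpretation of a data.frame is that everything on a row is related. For example, in a survey each column is a question and each row is one respondent. Reordering each column independently discards that association completely. The only justification I can think of for not using a list is the simpler data.frame format (as @42- ended with) for report presentation, in which case I think you would do something similar.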
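The loss of row association can be checked on the toy data. This base R sketch rebuilds df with as.Date() (an assumption; the question uses tibble and lubridate) and sorts each column independently, as the answer's code does. It then asks whether the first row of the reshaped result (D, 7, 2018-01-03, 6.5) matches any original record. It does not, since the record with var3 == 6.5 actually has var2 == 5.

```r
# Base R stand-in for the question's toy data.
df <- data.frame(
  var2 = c(2L, 4L, 7L, 3L, 7L, 3L, 4L, 5L, 5L),
  var1 = c("A", "A", "B", "C", "A", "D", "A", "C", "D"),
  var3 = c(1.2, 1.34, 2.43, 4, 3.2, 2.3, 0.34, 4.2, 6.5),
  var4 = as.Date(c("1/6/2018", "1/3/2018", "1/7/2018", "1/4/2018", "1/9/2018",
                   "1/10/2018", "1/9/2018", "1/7/2018", "1/10/2018"),
                 format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Sort each column's distinct values on its own (Dates ascending, others descending).
sorted <- lapply(df, function(a) sort(unique(a), decreasing = !inherits(a, "Date")))

# First "row" of the reshaped result: the first element of every column.
first <- lapply(sorted, `[`, 1)

# Does any original record contain all four of those values at once?
is_original_row <- with(df, var1 == first$var1 & var2 == first$var2 &
                            var3 == first$var3 & var4 == first$var4)
any(is_original_row)
# [1] FALSE

# The record that really has var3 == 6.5 has var2 == 5, not 7.
df$var2[df$var3 == 6.5]
# [1] 5
```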