Avoiding NA in columns after pivot_wider()

Time: 2020-10-09 10:22:57

Tags: r tidyr

Is it possible to "shift up" the observations and remove the NA's above them in each column after pivot_wider()? I tried to lag() the columns, but that seems cumbersome. Obviously, I'm not bound to this method, but I would like to stay within the tidyverse.

My current code is as follows:

library(tidyverse)

set.seed(1111)

df <- data.frame(
  item = as.numeric(sample(1:20)),
  clust = as.numeric(sample(1:3, 20, replace = TRUE))
)

df %>%
  arrange(clust, item) %>%
  rowid_to_column() %>%
  pivot_wider(names_from = clust, values_from = item, names_prefix = "Cluster_") %>%
  select(-rowid)

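This currently produces the output below, where each cluster's values sit underneath a run of NA's. The desired output would have the values shifted to the top of each column: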

# A tibble: 20 x 3
   Cluster_1 Cluster_2 Cluster_3
       <dbl>     <dbl>     <dbl>
 1         3        NA        NA
 2        13        NA        NA
 3        14        NA        NA
 4        15        NA        NA
 5        16        NA        NA
 6        17        NA        NA
 7        19        NA        NA
 8        20        NA        NA
 9        NA         1        NA
10        NA         4        NA
11        NA         6        NA
12        NA         7        NA
13        NA         8        NA
14        NA         9        NA
15        NA        12        NA
16        NA        18        NA
17        NA        NA         2
18        NA        NA         5
19        NA        NA        10
20        NA        NA        11

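I know this approach compromises the dataset, but it is purely for aesthetic reasons: the tibble is subsequently exported to a LaTeX document and only serves to visualize the cluster grouping.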

2 answers:

Answer 0: (score: 1)

You can achieve the desired output like this:

library(tidyverse)

set.seed(1111)

df <- data.frame(
  item = as.numeric(sample(1:20)),
  clust = as.numeric(sample(1:3, 20, replace = TRUE))
)

df %>%
  arrange(clust, item) %>%
  group_by(clust) %>%
  # number the rows within each cluster, so the id restarts at 1 per cluster
  mutate(id = row_number()) %>%
  pivot_wider(names_from = clust, values_from = item, names_prefix = "Cluster_") %>%
  select(-id)
#> # A tibble: 8 x 3
#>   Cluster_1 Cluster_2 Cluster_3
#>       <dbl>     <dbl>     <dbl>
#> 1         3         1         2
#> 2        13         4         5
#> 3        14         6        10
#> 4        15         7        11
#> 5        16         8        NA
#> 6        17         9        NA
#> 7        19        12        NA
#> 8        20        18        NA
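
The difference from the question's code appears to be the id column. rowid_to_column() numbers the rows across the whole data frame, so every value gets its own row after pivoting. row_number() inside group_by(clust) restarts at 1 for each cluster, so the n-th value of every cluster lands in the same row. A minimal sketch of that idea on a small hand-made data frame (`toy` and `wide` are illustrative names, and the data is not the question's `df`):

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): cluster 1 has three items, cluster 2 has two
toy <- data.frame(
  item  = c(5, 1, 4, 2, 3),
  clust = c(1, 2, 1, 1, 2)
)

wide <- toy %>%
  arrange(clust, item) %>%
  group_by(clust) %>%
  mutate(id = row_number()) %>%  # 1, 2, 3 for cluster 1; 1, 2 for cluster 2
  ungroup() %>%                  # drop the grouping before reshaping
  pivot_wider(names_from = clust, values_from = item, names_prefix = "Cluster_") %>%
  select(-id)

wide
# Cluster_1 holds 2, 4, 5 and Cluster_2 holds 1, 3, NA
```

The shorter cluster is padded with NA at the bottom only, which matches the layout asked for in the question.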

Answer 1: (score: 0)

Here is an approach that uses split and adjusts the lengths.

s <- split(df$item, df$clust)
as.data.frame(lapply(s, function(x) `length<-`(sort(x), max(lengths(s)))))
#   X1 X2 X3
# 1  3  1  2
# 2 13  4  5
# 3 14  6 10
# 4 15  7 11
# 5 16  8 NA
# 6 17  9 NA
# 7 19 12 NA
# 8 20 18 NA
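
The one-liner above relies on the fact that assigning a larger length to a vector pads it with NA. A step-by-step sketch of the same idea in base R, on hypothetical toy data rather than the question's `df`, which also renames the list elements so the columns come out as Cluster_1, Cluster_2 instead of X1, X2 (`items`, `clust`, `padded`, and `out` are illustrative names):

```r
# Toy data (hypothetical): five items in two clusters of unequal size
items <- c(3, 1, 2, 5, 4)
clust <- c(1, 1, 1, 2, 2)

s <- split(items, clust)                      # list with elements "1" and "2"
names(s) <- paste0("Cluster_", names(s))      # readable column names
n <- max(lengths(s))                          # length of the longest cluster

padded <- lapply(s, function(x) {
  x <- sort(x)
  length(x) <- n                              # extending a vector pads it with NA
  x
})

out <- as.data.frame(padded)
out
# Cluster_1 holds 1, 2, 3 and Cluster_2 holds 4, 5, NA
```

This stays outside the tidyverse, so it may fit best when the result only needs to be exported afterwards.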