Pivot_wider function from multiple variables (tidyr R package)

Time: 2021-05-13 19:23:20

Tags: r pivot tidy

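I want to reshape a data frame into wide format, using two variables as the identifying columns (which may not even be necessary). I mention this because the original df has 480 rows and several sub-levels.

The code below returns a warning instead of the result I expect!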

library(tidyr)
library(dplyr)
                                                                
df <- structure(list(ID = c(1, 2, 3, 4), Gender = c("Men", "Women", "Men", 
"Women"), Country = c("Austria", "Austria", "Austria", "Austria"
), Season_ID = c("2011", "2012", "2011", "2012"), Region_UN = c("A", 
"B", "A", "B")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID))

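Warning message: Values are not uniquely identified; output will contain list-cols.

  • Use values_fn = list to suppress this warning.
  • Use values_fn = length to identify where the duplicates arise
  • Use values_fn = {summary_fun} to summarise duplicates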

I don't know which argument I can pass to values_fn!
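As a starting point for the values_fn question, the warning's second suggestion can be tried directly. The following is a minimal sketch, assuming a tidyr version in which values_fn accepts a single function. It rebuilds the question's data with tibble(), and the name dup_counts is only illustrative. Any cell with a count above 1 marks a Country/Season_ID/Gender combination that occurs more than once.

```r
library(tidyr)
library(dplyr)

# Same toy data as in the question, rebuilt with tibble()
df <- tibble(
  ID        = c(1, 2, 3, 4),
  Gender    = c("Men", "Women", "Men", "Women"),
  Country   = rep("Austria", 4),
  Season_ID = c("2011", "2012", "2011", "2012"),
  Region_UN = c("A", "B", "A", "B")
)

# Count how many values fall into each output cell
# (dup_counts is a hypothetical name used only for this sketch)
dup_counts <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              values_fn = length)

dup_counts
```

Once the duplicated combinations are known, the choice is between keeping every row (by adding another identifying column) and summarising the duplicates into one value per cell.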

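2 Answers:

Answer 0: (score: 3)

We can create a sequence column, so that each row is uniquely identified within its Country/Season_ID group: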

library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

df %>% 
  # drop ID and number the rows within each Country/Season_ID group
  mutate(ID = NULL, rn = rowid(Country, Season_ID)) %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(rn, Country, Season_ID))
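If loading data.table only for rowid() is undesirable, the same sequence column can probably be built with dplyr alone. This is a sketch of that variant, not the answerer's code: it assumes group_by() plus row_number() is an acceptable substitute for rowid(), and it rebuilds the question's data with tibble().

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question, rebuilt with tibble()
df <- tibble(
  ID        = c(1, 2, 3, 4),
  Gender    = c("Men", "Women", "Men", "Women"),
  Country   = rep("Austria", 4),
  Season_ID = c("2011", "2012", "2011", "2012"),
  Region_UN = c("A", "B", "A", "B")
)

df_wide <- df %>%
  select(-ID) %>%
  # number the rows within each Country/Season_ID group
  group_by(Country, Season_ID) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(rn, Country, Season_ID))

df_wide
```

Because rn separates the repeated rows, every output cell receives at most one value, so no list-columns are created and values_fn is not needed.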

Answer 1: (score: 3)

You can also paste the values together, since both values in each group are the same anyway:

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              values_fn = function(x) paste(x, collapse=","))

df_wide
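With the question's data, pasting every value yields strings such as "A,A", because each group holds the same value twice. A possible variant, sketched here under the assumption that repeated values carry no extra information, is to drop repeats with unique() before collapsing. The data is rebuilt with tibble() so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question, rebuilt with tibble()
df <- tibble(
  ID        = c(1, 2, 3, 4),
  Gender    = c("Men", "Women", "Men", "Women"),
  Country   = rep("Austria", 4),
  Season_ID = c("2011", "2012", "2011", "2012"),
  Region_UN = c("A", "B", "A", "B")
)

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              # keep each distinct value once, then join with commas
              values_fn = function(x) paste(unique(x), collapse = ","))

df_wide
```

If a group ever contains different values, they still appear side by side in one comma-separated string, so nothing is silently discarded.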