Merge rows with duplicate identifiers while adding additional columns

Time: 2019-12-02 05:17:57

Tags: r dataframe excel-formula data-cleaning

Here is a simple example of what I am looking for:

Before:

data.frame(
  Name = c("pusheen", "pusheen", "puppy"),
  Species = c("feline", "feline", "doggie"),
  Activity = c("snacking", "napping", "playing"),
  Start = c(1, 2, 3),
  End = c(11, 12, 13)
)

After:

data.frame(
  Name = c("pusheen", "puppy"),
  Species = c("feline", "doggie"),
  Activity1 = c("snacking", "playing"),
  Start1 = c(1, 3),
  End1 = c(11, 13),
  Activity2 = c("napping", NA),
  Start2 = c(2, NA),
  End2 = c(12, NA)
)

How can I do this in R or Excel? Thanks!

1 answer:

Answer 0: (score: 1)

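This can be done with pivot_wider from the tidyr package. The code below assumes that df holds the "Before" data frame from the question.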

library(tidyr)
library(dplyr) # also provides the %>% pipe, so magrittr need not be loaded separately

df <- df %>% 
  group_by(Name) %>% 
  mutate(num = row_number()) %>% # Create a counter by group
  ungroup() %>%
  pivot_wider(
    id_cols = c("Name", "Species"), 
    names_from = num, 
    values_from = c("Activity", "Start", "End"), 
    names_sep = "")
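To check the approach against the question's example, here is a minimal self-contained sketch. It assumes the "Before" data is stored in a variable named df and writes the result to a new variable, wide (a name chosen here for illustration only), so that the input is not overwritten.

```r
library(dplyr)
library(tidyr)

# "Before" data from the question
df <- data.frame(
  Name = c("pusheen", "pusheen", "puppy"),
  Species = c("feline", "feline", "doggie"),
  Activity = c("snacking", "napping", "playing"),
  Start = c(1, 2, 3),
  End = c(11, 12, 13)
)

wide <- df %>%
  group_by(Name) %>%
  mutate(num = row_number()) %>% # 1, 2, ... within each Name
  ungroup() %>%
  pivot_wider(
    id_cols = c(Name, Species),
    names_from = num,
    values_from = c(Activity, Start, End),
    names_sep = ""
  )

# One row per Name; missing second activities are filled with NA
print(wide)
print(names(wide))
```

Note that the value columns come out grouped by variable (Activity1, Activity2, Start1, Start2, End1, End2), not by activity number as in the "After" example.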

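If you want the columns in the same order as the example output, we can add a select statement. I used str_sub from the stringr package to extract the last character of each column name and then ordered the names by that character. Because only the last character is used, this works as long as no Name has more than nine activities; with ten or more, suffixes such as 10 and 20 would be sorted by their final digit alone.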

library(stringr)

df %>% 
  select(Name, Species, names(df)[order(str_sub(names(df), -1))])
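If a Name can have ten or more activities, one possible variation is to sort by the whole trailing number instead of the last character. The sketch below is an illustration under stated assumptions, not part of the original answer: the names wide, value_cols, suffix, and ordered are hypothetical, and it assumes every value column ends in digits.

```r
library(dplyr)
library(tidyr)

# "Before" data from the question
df <- data.frame(
  Name = c("pusheen", "pusheen", "puppy"),
  Species = c("feline", "feline", "doggie"),
  Activity = c("snacking", "napping", "playing"),
  Start = c(1, 2, 3),
  End = c(11, 12, 13)
)

wide <- df %>%
  group_by(Name) %>%
  mutate(num = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = c(Name, Species),
    names_from = num,
    values_from = c(Activity, Start, End),
    names_sep = ""
  )

# Every column except the identifiers carries a numeric suffix
value_cols <- setdiff(names(wide), c("Name", "Species"))

# Extract the full trailing number, so "Activity10" sorts after "Activity9"
suffix <- as.integer(regmatches(value_cols, regexpr("[0-9]+$", value_cols)))

# order() keeps ties in their original order, so Activity/Start/End stay together
ordered <- wide %>%
  select(Name, Species, all_of(value_cols[order(suffix)]))

print(names(ordered))
```

With a recent tidyr (1.2.0 or later, as far as I know), passing names_vary = "slowest" to pivot_wider should produce this column order directly, without a separate select step.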