Question

Update:

The data above did not really reflect my data, so here is an update:

tag <- c("\\ID", "\\a", "\\b", "\\ID", "\\b", "\\ID", "\\a", "\\b", "\\ID", "\\ID")
content <- c("ID_x", "text2", "text3", "ID_y", "text6", "ID_z", "text7", "text8", "ID_f", "ID_g")
df <- as.data.frame(cbind(tag, content))

I need:

\ID   \a     \b
ID_x  text2  text3
ID_y         text6
ID_z  text7  text8
ID_f
ID_g

So not every unique ID_ has both variables \a and \b filled in.

I tried unstack and also aggregate, but without success.

Answer 1

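If we need 'tag' as the column names, create a sequence column by 'tag' (rowid from data.table) and then reshape to 'wide' with dcast using that column.

library(data.table)
dcast(setDT(df), rowid(tag) ~ tag, value.var = 'content')

For the example shown, unstack would also work, as there are no duplicates.

Update

Based on the update, we may need to create a new column 'ind' that marks each occurrence of "\\ID", and then reshape on that column:

setDT(df)[, ind := cumsum(tag == "\\ID")]
dcast(df, ind ~ tag, value.var = 'content')
#   ind   \\a   \\b \\ID
#1:   1 text2 text3 ID_x
#2:   2  <NA> text6 ID_y
#3:   3 text7 text8 ID_z
#4:   4  <NA>  <NA> ID_f
#5:   5  <NA>  <NA> ID_g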
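The grouping idea behind the 'ind' column can also be tried without data.table. The following is only a sketch in base R, not part of the answer above: it assumes the same example data, builds the index with cumsum, and swaps in stats::reshape for dcast. The object name wide is made up for illustration, and reshape prefixes the new columns with "content.".

```r
# Example data from the updated question
df <- data.frame(
  tag = c("\\ID", "\\a", "\\b", "\\ID", "\\b", "\\ID", "\\a", "\\b", "\\ID", "\\ID"),
  content = c("ID_x", "text2", "text3", "ID_y", "text6", "ID_z", "text7", "text8", "ID_f", "ID_g"),
  stringsAsFactors = FALSE
)

# Every "\\ID" row starts a new record, so the running count of
# "\\ID" rows works as a record index
df$ind <- cumsum(df$tag == "\\ID")

# Base R long-to-wide reshape (used here instead of data.table::dcast);
# tag/content pairs that are missing for a record become NA
wide <- reshape(df, idvar = "ind", timevar = "tag", direction = "wide")
wide
```

Because the index is built from the row order, this only works if each record's rows directly follow its "\\ID" row, as they do in the example.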

Answer 2

Using the data from the revised question:

df <- data.frame(tag = c("\\ID", "\\a", "\\b", "\\ID",  "\\b", "\\ID", "\\a", "\\b", "\\ID", "\\ID"), 
                 content = c("ID_x", "text2", "text3", "ID_y",  "text6", "ID_z", "text7", "text8", "ID_f", "ID_g"),
                 stringsAsFactors = FALSE)

The trickiest part is grouping the rows by ID in some way. My solution uses fill from the tidyr package to propagate a value downward in the data.frame.

library("dplyr")

df %>%
  # Create a proper id column
  mutate(id = ifelse(tag == "\\ID", content, NA)) %>%
  # fill all ids based on the last id observed
  tidyr::fill(id) %>%
  # format the data in the desired shape
  tidyr::spread(tag, content) %>%
  # discarding our now redundant id column and re-arranging columns
  select(-id) %>%
  select(`\\ID`, everything())

Result:

#   \\ID   \\a   \\b
# 1 ID_f  <NA>  <NA>
# 2 ID_g  <NA>  <NA>
# 3 ID_x text2 text3
# 4 ID_y  <NA> text6
# 5 ID_z text7 text8

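I think NA makes the most sense, but if you want something else, you can simply pass fill = "" to tidyr::spread to use a different default value (for example, the empty string "").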

#   \\ID   \\a   \\b
# 1 ID_f            
# 2 ID_g            
# 3 ID_x text2 text3
# 4 ID_y       text6
# 5 ID_z text7 text8
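To make the fill-down step in the pipeline above more concrete, here is a small sketch of the same idea in base R. It is not the tidyr implementation, just an equivalent for this example, and it assumes the first row is an "\\ID" row. The names is_id and id are made up for illustration.

```r
tag <- c("\\ID", "\\a", "\\b", "\\ID", "\\b", "\\ID", "\\a", "\\b", "\\ID", "\\ID")
content <- c("ID_x", "text2", "text3", "ID_y", "text6", "ID_z", "text7", "text8", "ID_f", "ID_g")

# TRUE on the rows that start a new record
is_id <- tag == "\\ID"

# cumsum(is_id) gives, for each row, the position of the most recent
# ID row among all ID rows, so indexing the ID values with it carries
# each ID down to the rows that follow it
id <- content[is_id][cumsum(is_id)]
id
```

Each row now carries the ID of the record it belongs to, which is the column that mutate and tidyr::fill build before the spread step.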

Reshape a list based on tags
