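How to fill in empty rows and combine split lines in R

Time: 2018-06-06 03:59:56

Tags: r dataframe

I have a data frame extracted from a PDF. Some text that should sit on a single line is instead spread across a varying number of rows, like this: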

df_missing = data.frame(group = c("East","","","West","","",""), 
                        order = c("this","is supposed to be","one line","this","is supposed to be","one line","too"))

How can I fix the data frame so that the split lines are collapsed, like this:

df_correct = data.frame(group = c("East","West"), order = c("this is supposed to be one line", "this is supposed to be one line too"))

2 answers:

Answer 0 (score: 1)

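We can do this in several ways. One way is to create a grouping variable from the cumulative sum of a logical vector that marks the non-blank elements of 'group', and then summarise 'order' by pasting its elements together:

library(dplyr)
df_missing %>%
   group_by(group1 = cumsum(group != "")) %>%
   summarise(group = first(group), order = paste(order, collapse= ' ')) %>%
   select(-group1)
# A tibble: 2 x 2
#  group order
#  <fct> <chr>
#1 East  this is supposed to be one line
#2 West  this is supposed to be one line too

Or, instead of creating a new grouping column, use the cumsum as an index to populate 'group' with its unique non-blank elements:

df_missing %>%
   group_by(group = unique(group[group!=""])[cumsum(group != "")]) %>%
   summarise(order = paste(order, collapse=' '))

Another option is to change the blanks to NA, fill each NA with the preceding non-NA value, group by 'group', and paste 'order' as above.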
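A minimal sketch of the NA-then-fill option, assuming dplyr::na_if to turn blanks into NA and tidyr::fill to carry the last non-NA value downward. Neither call is quoted from the answer, and the data frame is rebuilt here with stringsAsFactors = FALSE (an assumption) so that 'group' is a plain character column:

```r
library(dplyr)
library(tidyr)

# Example data from the question; character columns are an assumption here
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

result <- df_missing %>%
  mutate(group = na_if(group, "")) %>%  # blank strings become NA
  fill(group) %>%                       # carry the last non-NA value down
  group_by(group) %>%
  summarise(order = paste(order, collapse = " "))
```

Because this groups by the filled label itself, two separate blocks that happen to share the same label would be merged into one row, which the cumsum-based grouping avoids.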

Answer 1 (score: 0)

A similar idea to @akrun's answer.

A data.table solution:

library(data.table)
setDT(df_missing)[,.(group=group[1], order = paste(order, collapse= ' ')),by=cumsum(group != "")][,-1]

#   group                               order
#1:  East     this is supposed to be one line
#2:  West this is supposed to be one line too
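Both answers rely on the same grouping key, cumsum(group != ""). As a rough illustration in base R only, the key can be inspected and used directly. The names key and collapsed are hypothetical, and the data frame is rebuilt with character columns as an assumption:

```r
# Example data from the question; character columns are an assumption here
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# Each non-blank 'group' value starts a new block, so the running count
# of non-blank values labels every row with its block number
key <- cumsum(df_missing$group != "")

# Paste the 'order' fragments of each block into a single string
collapsed <- data.frame(
  group = df_missing$group[df_missing$group != ""],
  order = unname(vapply(split(df_missing$order, key), paste,
                        character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
```

Here key labels the first three rows as block 1 and the remaining four as block 2, which is why the rows can be collapsed without filling in 'group' first.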