Question

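I have a .csv file with two columns. The first is an ID and the second is a text field. However, the text in the text field is split into sentences that can run onto the following rows, so the file looks like this: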

ID TEXT
TXT_1 This is the first sentence
NA This is the second sentence
NA This is the third sentence
TXT_2 This is the first sentence of the second text
NA This is the second sentence of the second text

What I want to do is merge the text field so that it looks like this:

ID TEXT
TXT_1 This is the first sentence This is the second sentence This is the third sentence
TXT_2 This is the first sentence of the second text This is the second sentence of the second text

Is there a simple solution in R?
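To reproduce the example, the sample above can be loaded as a data frame named df1, the name the answer uses. This is a sketch that assumes the real file is comma-separated with a header row; the inline csv_text string is a hypothetical stand-in for the actual file. read.csv keeps blank character fields as "" rather than NA, so na.strings = c("", "NA") is set here to turn both blank and literal NA IDs into missing values.

```r
# Hypothetical inline stand-in for the real .csv file
# (assumed comma-separated with a header row).
csv_text <- "ID,TEXT
TXT_1,This is the first sentence
NA,This is the second sentence
NA,This is the third sentence
TXT_2,This is the first sentence of the second text
,This is the second sentence of the second text"

# Treat both blank fields and the literal string "NA" as missing IDs.
df1 <- read.csv(text = csv_text, na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

str(df1)
```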

Answer 1

We create a grouping variable based on the non-NA elements in 'ID', then paste the 'TEXT' values of each group together.
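The idea above can be sketched in base R without dplyr. This is an illustrative version, not the answer's original code: it assumes df1 has the columns ID and TEXT and that the first row carries an ID. cumsum(!is.na(ID)) increases by one at every non-NA ID, so each continuation row shares the group number of the ID above it.

```r
# Sample data as in the question; continuation rows have NA in ID.
df1 <- data.frame(
  ID   = c("TXT_1", NA, NA, "TXT_2", NA),
  TEXT = c("This is the first sentence",
           "This is the second sentence",
           "This is the third sentence",
           "This is the first sentence of the second text",
           "This is the second sentence of the second text"),
  stringsAsFactors = FALSE
)

# Grouping variable: increments at each non-NA ID (1 1 1 2 2 here).
grp <- cumsum(!is.na(df1$ID))

# Join the TEXT rows of each group with single spaces.
merged_text <- vapply(split(df1$TEXT, grp), paste, character(1),
                      collapse = " ")

merged <- data.frame(
  ID   = df1$ID[!is.na(df1$ID)],  # one ID per group, in original order
  TEXT = unname(merged_text),
  stringsAsFactors = FALSE
)
merged
```

split() orders the groups by their numeric group number, which matches the order of the non-NA IDs, so the two columns line up.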

Or, as @Jaap suggested:

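library(dplyr)
df1 %>%
    # a new group starts at every row whose ID is not NA
    group_by(Grp = cumsum(!is.na(ID))) %>%
    # keep the group's single ID and join its TEXT rows with spaces
    summarise(ID = ID[!is.na(ID)], TEXT = paste(TEXT, collapse = ' ')) %>%
    ungroup() %>%
    # drop the helper grouping column
    select(-Grp)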
# A tibble: 2 x 2
#     ID                                                                                         TEXT
#    <chr>                                                                                        <chr>
#1 TXT_1            This is the first sentence This is the second sentence This is the third sentence
#2 TXT_2 This is the first sentence of the second text This is the second sentence of the second text
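If the continuation rows hold an empty string instead of NA (which is what read.csv produces for blank character fields with its default na.strings), testing only is.na() would put every row in its own group. A hedged base R variant, assuming the same two columns and that the first row carries an ID, treats both NA and "" as a continuation marker:

```r
# Sample data where continuation rows have "" instead of NA in ID.
df1 <- data.frame(
  ID   = c("TXT_1", "", "", "TXT_2", ""),
  TEXT = c("This is the first sentence",
           "This is the second sentence",
           "This is the third sentence",
           "This is the first sentence of the second text",
           "This is the second sentence of the second text"),
  stringsAsFactors = FALSE
)

# A row starts a new text when its ID is neither NA nor an empty string.
is_start <- !is.na(df1$ID) & nzchar(df1$ID)
grp <- cumsum(is_start)

merged <- data.frame(
  ID   = df1$ID[is_start],
  TEXT = unname(vapply(split(df1$TEXT, grp), paste, character(1),
                       collapse = " ")),
  stringsAsFactors = FALSE
)
merged
```

In the dplyr pipeline the corresponding change would be to group by cumsum(!is.na(ID) & nzchar(ID)) and take ID[1] in summarise, since the first row of each group is the one that carries the ID.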

How do I merge the rows of one column to match the non-empty rows of another column?
