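I have a .csv file with two columns. The first is an ID and the second is a text field. However, the text in the text field is split into sentences that can run onto the following lines, so the file looks like this: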
ID TEXT
TXT_1 This is the first sentence
NA This is the second sentence
NA This is the third sentence
TXT_2 This is the first sentence of the second text
NA This is the second sentence of the second text
What I want to do is merge the text field so that it looks like this:
ID TEXT
TXT_1 This is the first sentence This is the second sentence This is the third sentence
TXT_2 This is the first sentence of the second text This is the second sentence of the second text
Is there a simple solution for this in R?
Answer 0 (score: 1)
We create a grouping variable based on the non-NA elements in 'ID' and paste() the 'TEXT' values together within each group.
Or, as @Jaap suggested:
library(dplyr)
df1 %>%
group_by(Grp = cumsum(!is.na(ID))) %>%
summarise(ID = ID[!is.na(ID)], TEXT = paste(TEXT, collapse = ' ')) %>%
ungroup() %>%
select(-Grp)
# A tibble: 2 x 2
# ID TEXT
# <chr> <chr>
#1 TXT_1 This is the first sentence This is the second sentence This is the third sentence
#2 TXT_2 This is the first sentence of the second text This is the second sentence of the second text
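To make the grouping step easier to follow, here is a minimal base R sketch of the same idea. It is an illustration, not part of the original answer: the data frame is a hypothetical reconstruction of the question's sample (the name df1 is taken from the answer), and it assumes the first row always carries a non-NA ID.

```r
# Hypothetical reconstruction of the sample data from the question.
df1 <- data.frame(
  ID = c("TXT_1", NA, NA, "TXT_2", NA),
  TEXT = c("This is the first sentence",
           "This is the second sentence",
           "This is the third sentence",
           "This is the first sentence of the second text",
           "This is the second sentence of the second text"),
  stringsAsFactors = FALSE
)

# The running count of non-NA IDs increases by one at each new ID,
# so continuation rows (ID is NA) share the group of the ID above them.
grp <- cumsum(!is.na(df1$ID))
grp
# [1] 1 1 1 2 2

# Collapse the TEXT values of each group into one string.
# Assumes row 1 has an ID; otherwise a group 0 without an ID would appear.
res <- data.frame(
  ID = df1$ID[!is.na(df1$ID)],
  TEXT = unname(vapply(split(df1$TEXT, grp), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
res
```

Here res should have one row per ID, with the sentences joined by single spaces, which is the same shape as the dplyr result above.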