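I have the following problem: I have a text that is split into chapters and stored as a vector, one element per chapter. Say something like: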
text <- c("Here are information about topic1.",
"Here are some information about topic2 or topic3.",
"Chapter number 4 is really annoying.",
"Topic4 is discussed in this chapter.")
I want to extract the topics mentioned in each chapter. My output should look like this:
output
    [1]      [2]
[1] "topic1"
[2] "topic2" "topic3"
[3]
[4] "Topic4"
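So some rows have more than one match and some have none.
I tried str_extract_all and then unlisting the resulting list, but I ran into problems because the rows end up with different numbers of elements.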
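The difficulty can be seen directly: extracting the matches gives a list whose elements have different lengths (1, 2, 0 and 1 here), and unlist() throws away the chapter boundaries. A minimal base-R sketch, assuming only base R (regmatches() and gregexpr() stand in for str_extract_all(); the names matches, width and topics are chosen for illustration), keeps one row per chapter by padding every element with NA up to the longest one:

```r
# Chapters as in the question, one element per chapter
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector of matches per chapter; no match gives character(0)
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))
lengths(matches)  # 1 2 0 1: the uneven lengths that break a plain unlist()

# unlist() flattens everything and loses which chapter each topic came from
unlist(matches)

# Pad every chapter to the widest one: indexing past the end of a vector yields NA
width <- max(lengths(matches))
topics <- do.call(rbind, lapply(matches, function(x) x[seq_len(width)]))
topics
```

do.call(rbind, ...) is used instead of sapply() so that the result stays one row per chapter even when every chapter happens to have the same number of matches.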
Thanks, everyone!
Answer 0 (score: 4)
You can use rbind.fill.matrix from the plyr package.
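text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")
library(stringr)
library(plyr)
# one character vector of matches per chapter (lengths 1, 2, 0 and 1 here)
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# make each vector a one-row matrix; lapply() always returns a list, whereas
# sapply() would simplify to a vector or matrix if all lengths were equal
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
rbind.fill.matrix(xy) # from plyr: columns a row lacks are filled with NA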
     1        2
[1,] "topic1" NA
[2,] "topic2" "topic3"
[3,] NA       NA
[4,] "Topic4" NA
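If you would rather not load plyr, stringr can build the padded matrix on its own: str_extract_all() has a simplify argument, and with simplify = TRUE it returns a character matrix with one row per input string, padded with empty strings. A sketch, assuming a stringr version that has this argument; the final replacement of "" by NA is only there to mirror the rbind.fill.matrix() result above:

```r
library(stringr)

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# simplify = TRUE: character matrix, one row per chapter, "" where a chapter has fewer matches
topics <- str_extract_all(text, "[Tt]opic\\d+", simplify = TRUE)

# Turn the "" padding into NA
topics[topics == ""] <- NA
topics
```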