Detect synonyms and save them to a data frame

Time: 2017-03-19 21:35:36

Tags: r

I have a data frame with predefined words/phrases.

Example: df$term

stock
revenue
continuous improvement

And another data frame (df2) with a single column of many rows, where each row contains a text. Example: df2$sentence

I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock
Stock market is my first interest
I always try to continuous improvement

Using the terms in df, I want to detect which terms appear in each row and produce output like this:

row_number, stock,  continuous improvement, revenue
1,1,0,1
2,1,0,0
3,0,1,0
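In the expected output, row 2 gets a 1 for stock even though that sentence starts with a capitalized "Stock", so the matching has to ignore case. A quick base-R check of that requirement (the vector name `sentences` is only for illustration):

```r
# The three example texts from df2 (vector name chosen for illustration)
sentences <- c(
  "I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock",
  "Stock market is my first interest",
  "I always try to continuous improvement"
)

# grepl() is case-sensitive by default, so "Stock" in row 2 is missed
case_sensitive <- grepl("stock", sentences)
case_sensitive
#> [1]  TRUE FALSE FALSE

# ignore.case = TRUE gives the 1/1/0 pattern the expected output asks for
case_insensitive <- grepl("stock", sentences, ignore.case = TRUE)
case_insensitive
#> [1]  TRUE  TRUE FALSE
```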

Is there a simple way to do this?

1 Answer:

Answer (score: 2)

You can do it like this:

# Create some fake data
words <- c("stock", "revenue", "continuous improvement")
phrases <- c("blah blah stock and revenue", "yada yada revenue yada", 
             "continuous improvement is an unrealistic goal", 
             "phrase with no match")

# Apply the 'grepl' function along the list of words, and convert the result to numeric
df <- data.frame(lapply(words, function(word) {as.numeric(grepl(word, phrases))}))
# Name the columns the words that were searched
names(df) <- words
df
#>   stock revenue continuous improvement
#> 1     1       1                      0
#> 2     0       1                      0
#> 3     0       0                      1
#> 4     0       0                      0

I didn't create a separate row-number variable here, but if you need one you can always add it with df$row.number <- 1:nrow(df).
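A sketch applying the same grepl approach to the question's own data, assuming the terms sit in one character vector (df$term in the question) and the texts in another (df2$sentence). It adds ignore.case = TRUE so that row 2's "Stock" matches, puts a row_number column first, and, as an optional refinement, wraps each term in \b word boundaries (with perl = TRUE) so that e.g. "stock" would not match inside "livestock". The word-boundary part assumes the terms contain no regex metacharacters.

```r
# Terms to look for and texts to search (taken from the question)
terms <- c("stock", "revenue", "continuous improvement")
sentences <- c(
  "I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock",
  "Stock market is my first interest",
  "I always try to continuous improvement"
)

# For each term, a 0/1 vector saying whether it occurs in each sentence.
# vapply() returns a matrix: one row per sentence, one column per term,
# with the terms themselves as column names.
hits <- vapply(
  terms,
  function(term) {
    pattern <- paste0("\\b", term, "\\b")  # whole-word match (optional)
    as.numeric(grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE))
  },
  numeric(length(sentences))
)

# Prepend a row number; check.names = FALSE keeps the space in
# "continuous improvement" instead of turning it into a dot
result <- data.frame(row_number = seq_along(sentences), hits,
                     check.names = FALSE)
result
```

With the question's data frames, pass df$term and df2$sentence in place of terms and sentences (column names assumed). Presence is recorded as 0/1, matching the expected output, which shows 1 for stock in row 1 even though the word appears twice there.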