I have a data frame with predefined words/phrases.
Example: df$terms
stock
revenue
continuous improvement
and another data frame (df2) with a column that has many rows, each containing a piece of text. Example: df2$sentence
I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock
Stock market is my first interest
I always try to continuous improvement
Using the terms in df, I want to detect which terms occur in each row and get output like this:
row_number, stock, continuous improvement, revenue
1,1,0,1
2,1,0,0
3,0,1,0
Is there a simple way to do this?
Answer 0 (score: 2)
You can do it as follows:
# Create some fake data
words <- c("stock", "revenue", "continuous improvement")
phrases <- c("blah blah stock and revenue", "yada yada revenue yada",
"continuous improvement is an unrealistic goal",
"phrase with no match")
# Apply the 'grepl' function along the list of words, and convert the result to numeric
df <- data.frame(lapply(words, function(word) {as.numeric(grepl(word, phrases))}))
# Name the columns the words that were searched
names(df) <- words
df
  stock revenue continuous improvement
1     1       1                      0
2     0       1                      0
3     0       0                      1
4     0       0                      0
I didn't create a separate row-number variable here, but you can always add one if you need it: df$row.number <- 1:nrow(df)
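Note that grepl() matches case-sensitively by default, so on the question's own data the second sentence ("Stock market is my first interest") would get 0 for stock, while the desired output has 1. Below is a minimal sketch that applies the same grepl() idea directly to the question's two data frames. It assumes the columns are named terms and sentence (as in the question) and that the terms contain no regex metacharacters, because each term is pasted into a regular expression. The \b word-boundary anchors are an added choice so that, for example, "stock" does not match inside "stockholder".

```r
# Assumed reconstruction of the question's data; column names `terms` and
# `sentence` are taken from the question, the rest is illustrative.
df <- data.frame(terms = c("stock", "revenue", "continuous improvement"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(sentence = c(
  "I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock",
  "Stock market is my first interest",
  "I always try to continuous improvement"
), stringsAsFactors = FALSE)

# For each term, flag every sentence that contains it as a whole word.
# ignore.case = TRUE lets "Stock" match "stock"; \\b anchors word boundaries.
# Assumes the terms contain no regex metacharacters.
hits <- sapply(df$terms, function(term) {
  as.integer(grepl(paste0("\\b", term, "\\b"), df2$sentence,
                   ignore.case = TRUE, perl = TRUE))
})

# Add the row number as the first column; check.names = FALSE keeps the
# space in "continuous improvement" instead of converting it to a dot.
result <- data.frame(row_number = seq_len(nrow(df2)), hits, check.names = FALSE)
result
```

With this data the flags should agree with the table in the question, although the columns follow the order of df$terms. If df2 has only one row, sapply() returns a vector rather than a matrix, so that case would need extra handling.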