My data:
Topic Content
Sunny "Today is a sunny day."
John He listened and walked away."Should I visit Dr.Mary today?"
May May is playing alone.
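I want to extract everything inside the quotation marks.
I also want to create another column that tags each sentence. For example, if the content contains the keyword "sunny", the new column should be "Sunny"; if the content contains the keyword "visit", the new column for that row should be "Hospital".
I would like to get the following output: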
Topic Content Tag
Sunny "Today is a sunny day!" Sunny
John "Should I visit Dr.Mary today?" Hospital
dput:
structure(list(Topic = structure(c(3L, 1L, 2L), .Label = c("John",
"May", "Sunny"), class = "factor"), Content = structure(c(3L,
1L, 2L), .Label = c("He listened and walked away.\"Should I visit Dr.Mary today?\"",
"May is playing alone.", "Today is a sunny day. "), class = "factor")), .Names = c("Topic",
"Content"), class = "data.frame", row.names = c(NA, -3L))
Answer 0: (score: 1)
You can try this:
df <-structure(list(Topic = structure(c(3L, 1L, 2L), .Label = c("John",
"May", "Sunny"), class = "factor"), Content = structure(c(3L,
1L, 2L), .Label = c("He listened and walked away.\"Should I visit Dr.Mary today?\"",
"May is playing alone.", "Today is a sunny day. "), class = "factor")), .Names = c("Topic",
"Content"), class = "data.frame", row.names = c(NA, -3L))
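# Keep only the rows whose Content contains a double quote
x <- df[grepl('"', df$Content),]
# Replace Content with the text between the quotes (greedy match, so only the last quoted segment is kept)
x$Content <- sub('.*"(.*)".*', "\\1", x$Content)
# Tag by keyword: "visit" gives "Hospital", "sunny" gives "Sunny", anything else gets an empty string
x$Tag <- ifelse(grepl("visit",x$Content), "Hospital", ifelse(grepl("sunny",x$Content), "Sunny", ""))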
x
# Topic Content Tag
# 2 John Should I visit Dr.Mary today? Hospital
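Note that the dput data has no quotation marks around the "Sunny" row, which is why only the "John" row comes back. If the real data does quote that row as in the table at the top of the question, and a row can contain more than one quoted segment, something along these lines might be more robust. This is only a sketch: the data frame is retyped by hand from the question's table (an assumption, it is not the dput), and `tag_rules` is a hypothetical lookup name introduced here for illustration.

```r
# Example data retyped from the question's table (assumption: the first row is quoted);
# strings are kept as character rather than factor
df <- data.frame(
  Topic = c("Sunny", "John", "May"),
  Content = c("\"Today is a sunny day.\"",
              "He listened and walked away.\"Should I visit Dr.Mary today?\"",
              "May is playing alone."),
  stringsAsFactors = FALSE
)

# Hypothetical keyword-to-tag lookup; the first matching keyword wins
tag_rules <- c(visit = "Hospital", sunny = "Sunny")

# Pull out every "..." segment per row, then strip the surrounding quotes
quoted <- regmatches(df$Content, gregexpr('"[^"]*"', df$Content))
quoted <- lapply(quoted, function(q) gsub('"', "", q, fixed = TRUE))

# Keep only rows with at least one quoted segment; join multiple segments with a space
keep <- lengths(quoted) > 0
res <- df[keep, , drop = FALSE]
res$Content <- vapply(quoted[keep], paste, character(1), collapse = " ")

# Assign the tag of the first rule whose keyword appears (case-insensitive), else NA
res$Tag <- vapply(res$Content, function(txt) {
  hits <- vapply(names(tag_rules),
                 function(k) grepl(k, txt, ignore.case = TRUE),
                 logical(1))
  if (any(hits)) tag_rules[[names(tag_rules)[hits][1]]] else NA_character_
}, character(1), USE.NAMES = FALSE)

res
```

Adding a new tag then only means adding an entry to `tag_rules`, instead of nesting another `ifelse`.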