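How can I add a comma after the username in a string, so that I can then remove the words before the comma and get a uniform string that can be used for exact matching?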
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)
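Answer 0 (score: 1)
If you know that the username is always the second word of the sentence, you can extract the sentences from the data frame and work on them like this: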
text=c("hi john what are you doing",
"hi sunil what are you doing",
"hello sanjay what are you doing")
text2 <- character(0)
for (sentence in text) {
  # split the sentence into words
  words <- strsplit(sentence, " ")[[1]]
  # convert the name (second word) to uppercase and put a comma after it
  words[2] <- paste0(toupper(words[2]), ",")
  # rebuild the sentence and append it to the new vector (text2)
  text2 <- c(text2, paste(words, collapse = " "))
}
Result:
#text2
#[1] "hi JOHN, what are you doing" "hi SUNIL, what are you doing"
#[3] "hello SANJAY, what are you doing"
Then recreate the data frame.
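A minimal sketch of that last step, assuming the modified sentences are in `text2` as produced by the loop above and reusing the column name `text` from the question. The name `a2` is only illustrative and does not come from the original post.

```r
# text2 as produced by the loop above (repeated here so the snippet runs on its own)
text2 <- c("hi JOHN, what are you doing",
           "hi SUNIL, what are you doing",
           "hello SANJAY, what are you doing")

# rebuild a data frame with the same column name as in the question
# (a2 is an illustrative name, not from the original post)
a2 <- data.frame(text = text2, stringsAsFactors = FALSE)
str(a2)
```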
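Otherwise, if the position of the username in the sentence can vary but you have a list of usernames to look for, you can check whether at least one username is present, find its position in the sentence, replace it with the version followed by a comma, and rebuild the sentence. If no username is found, print an error.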
usernames <- c("john", "sunil", "sanjay")
text=c("hi john what are you doing",
"hi sunil what are you doing",
"hello sanjay what are you doing",
"hello ciao how are you"
)
text2 <- character(0)
for (sentence in text) {
  user_present <- NA
  # split the sentence into words
  words <- strsplit(sentence, " ")[[1]]
  # check whether a user is present in the sentence
  for (user in usernames) {
    if (user %in% words) {
      user_present <- user
      break
    }
  }
  if (!is.na(user_present)) {
    # position(s) of the username in the sentence
    pos <- which(words == user_present)
    # convert the name to uppercase and put a comma after it
    words[pos] <- paste0(toupper(words[pos]), ",")
    # rebuild the sentence and append it to the new vector (text2)
    text2 <- c(text2, paste(words, collapse = " "))
  } else {
    # no username in the sentence: print an error message with the sentence
    err.msg <- paste("NO username found in sentence:", sentence)
    print(err.msg)
  }
}
Result:
#[1] "NO username found in sentence: hello ciao how are you"
text2
#[1] "hi JOHN, what are you doing" "hi SUNIL, what are you doing"
#[3] "hello SANJAY, what are you doing"
Hope this helps!
Answer 1 (score: 0)
Two solutions.
First, if you can get a list containing the usernames:
usernames <- c("john", "sunil", "sanjay")
diag(sapply(usernames, function(x) gsub(x, paste0(x, ","), a$text)))
# [1] "hi john, what are you doing" "hi sunil, what are you doing" "hello sanjay, what are you doing"
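Note that the `diag(sapply(...))` call above takes, for row i, the result of replacing username i, so it relies on `usernames` being in the same order as the rows of `a$text`. A possible alternative that does not depend on that order, as a minimal sketch: build one regular expression from all usernames and add the comma in a single `gsub` call. It assumes the usernames contain no regex metacharacters; `pattern` and `result` are illustrative names.

```r
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)
usernames <- c("sanjay", "john", "sunil")  # deliberately not in row order

# \\b(sanjay|john|sunil)\\b matches any of the usernames as a whole word
pattern <- paste0("\\b(", paste(usernames, collapse = "|"), ")\\b")

# \\1 is the matched username; append a comma to it
result <- gsub(pattern, "\\1,", a$text, perl = TRUE)
print(result)
```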
Or, if the username is always the second word:
gsub("(^\\w*\\s)(\\w*)", "\\1\\2,", a$text)
# [1] "hi john, what are you doing" "hi sunil, what are you doing" "hello sanjay, what are you doing"
Data
a <- structure(list(text = c("hi john what are you doing", "hi sunil what are you doing",
"hello sanjay what are you doing")), class = "data.frame", row.names = c(NA,
-3L))
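The question's stated goal is to drop everything before the comma once it has been added. A minimal sketch of that last step, assuming the username is the second word as in the sample data; `with_comma` and `uniform` are illustrative names, not from the original post.

```r
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)

# add a comma after the second word (same pattern as the second solution above)
with_comma <- gsub("(^\\w*\\s)(\\w*)", "\\1\\2,", a$text)

# remove everything up to and including the first comma and any spaces after it
uniform <- sub("^[^,]*,\\s*", "", with_comma)
print(uniform)
# all three sentences reduce to "what are you doing"
```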