R - Replace words/phrases in text

Time: 2018-04-25 11:05:54

Tags: r

I have a .csv file in which I created a custom dictionary for replacing words, and I loaded it into R as a data frame, for example:

word         replacement
Hello        Hi
Good         Best
Good Night   Sweet Morning

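What I want to do is scan the text in a .csv file cell by cell and, if a cell contains any word or phrase from my custom dictionary, replace that word or phrase with its replacement.

Please help me with the code; I am new to R.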

1 answer:

Answer 0 (score: 0)

# Dictionary data frame
dict <- data.frame(original = c("word", "Hello", "Good", "Good Night"),
                   replace = c("replacement", "Hi", "Best", "Sweet Morning"),
                   stringsAsFactors = FALSE)

dict
#     original       replace
# 1       word   replacement
# 2      Hello            Hi
# 3       Good          Best
# 4 Good Night Sweet Morning

# Data frame where the words need to be replaced
df <- data.frame(col1 = c("Hello", "World", "Good", "coffee"),
                 col2 = c("Good Night", "To all my friends", "I have no", "word"),
                 stringsAsFactors = FALSE)

df
#    col1              col2
#1  Hello        Good Night
#2  World To all my friends
#3   Good         I have no
#4 coffee              word

# Replace a cell only when its whole content equals an entry in dict$original
apply(df,
      MARGIN = c(1, 2),
      FUN = function(x) {
        pos <- which(dict[, 1] == x)
        if (length(pos) > 0) dict[pos[1], 2] else x
      })

#        col1     col2               
#[1,] "Hi"     "Sweet Morning"    
#[2,] "World"  "To all my friends"
#[3,] "Best"   "I have no"        
#[4,] "coffee" "replacement"      
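
Note that the code above replaces a cell only when its whole content equals a dictionary entry. If, as the question describes, the words or phrases can appear inside longer text, something along the following lines might work. This is only a rough sketch under a few assumptions: `replace_terms` is a hypothetical helper name, the sample `df` is invented for illustration, and the dictionary leaves out the `word`/`replacement` header row. It uses base R `gsub` with `fixed = TRUE`, which matches literal substrings, so "Good" inside a longer word such as "Goodness" would also be replaced.

```r
# Dictionary without the header row; column names follow the answer above.
dict <- data.frame(original = c("Hello", "Good", "Good Night"),
                   replace  = c("Hi", "Best", "Sweet Morning"),
                   stringsAsFactors = FALSE)

# Hypothetical sample data in which the entries appear inside longer text.
df <- data.frame(col1 = c("Hello World", "Good Night to all"),
                 col2 = c("A Good day", "coffee"),
                 stringsAsFactors = FALSE)

# Process longer entries first so "Good Night" is replaced before "Good".
dict <- dict[order(nchar(dict$original), decreasing = TRUE), ]

# replace_terms() is a hypothetical helper: it applies every dictionary
# entry in turn to a character vector, matching literal substrings.
replace_terms <- function(x, dict) {
  for (i in seq_len(nrow(dict))) {
    x <- gsub(dict$original[i], dict$replace[i], x, fixed = TRUE)
  }
  x
}

# Apply the helper to every column, keeping the data frame structure.
df[] <- lapply(df, replace_terms, dict = dict)
df
```

Sorting the dictionary by length is the main design choice here: without it, "Good Night" would first become "Best Night" and the phrase entry would never match. Because the entries are applied one after another, a replacement text that itself contains a later dictionary entry would be replaced again, so the dictionary should be checked for that case.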