I have a large data frame, and I want to count the occurrences of every word in a specific column. For example:
Column
Hi my name is Corey!
Hi my name is John
Desired output:
Hi 2
my 2
name 2
is 2
Corey 1
John 1
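I would also like to exclude special characters, such as the ! in Corey! in this example, as well as question marks, periods, and so on. Any help would be greatly appreciated, thanks!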
Answer 0 (score: 1)
df <- data.frame(column = c('Hi my name is Corey!',
'Hi my name is John'))
df
#column
#1 Hi my name is Corey!
#2 Hi my name is John
all_words <- unlist( # flatten the per-string word lists into one vector
  regmatches(df$column, gregexpr('\\w+', df$column))) # extract all words
# count frequencies
freq_count <- table(all_words)
freq_count
#Corey Hi is John my name
#1 2 2 1 2 2
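The table above is ordered alphabetically, while the question lists the most frequent words first. Below is a minimal sketch of one way to get there, assuming the same example data; `freq_sorted` and `word_counts` are illustrative names, not part of the original answer. Note that `\\w+` also matches digits and underscores, so it may need adjusting for other data.

```r
# Same example data as in the answer above
df <- data.frame(column = c('Hi my name is Corey!',
                            'Hi my name is John'),
                 stringsAsFactors = FALSE)

# Extract runs of word characters; punctuation such as "!" is dropped
all_words <- unlist(regmatches(df$column, gregexpr('\\w+', df$column)))

# Sort the counts from most to least frequent
freq_sorted <- sort(table(all_words), decreasing = TRUE)

# Convert to a two-column data frame (column names are an assumption)
word_counts <- as.data.frame(freq_sorted, stringsAsFactors = FALSE)
names(word_counts) <- c('word', 'n')
word_counts
```

If "Hi" and "hi" should be counted as the same word, wrapping the column in `tolower()` before extracting the words is one option.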