How to remove all English words except special punctuation in R

Asked: 2016-02-15 19:43:27

Tags: regex r text-mining data-cleansing

I have some text data in R.

I want to remove all the words from it so that only the smileys remain. The input string is:

data <- "conflict need resolved :<  turned conversation exchange ideas richer environment one tricky concepts :D    conflict  always top business agendas :>  maybe different ideas opinions different :)" 

Is there a library or function in R that makes this easy?

1 Answer:

Answer 0 (score: 1)

You can use [[:alnum:]] as a regular-expression pattern that matches every letter and digit in the string:

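# Remove every run of letters and digits, leaving punctuation and spaces
s <- gsub("[[:alnum:]]*", "", "conflict need resolved :<  turned conversation exchange ideas richer environment one tricky concepts :D    conflict  always top business agendas :>  maybe different ideas opinions different :) ")
# Collapse the leftover runs of spaces into single spaces
gsub(" +", " ", s)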

[1] " :< : :> :) "
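Note that in the output above the smiley :D is reduced to a bare colon, because D is itself an alphanumeric character and gets stripped with the words. A minimal sketch of one possible alternative follows. It assumes that words and smileys are always separated by whitespace and that a smiley is any token containing at least one punctuation character. Both are assumptions made for illustration, not something the question states, and the names tokens, smileys and result are hypothetical.

```r
data <- "conflict need resolved :<  turned conversation exchange ideas richer environment one tricky concepts :D    conflict  always top business agendas :>  maybe different ideas opinions different :)"

# Split the text into tokens on runs of whitespace
tokens <- strsplit(trimws(data), "[[:space:]]+")[[1]]

# Assumption: a smiley is any token that contains a punctuation character,
# so purely alphanumeric tokens (ordinary words) are dropped
smileys <- tokens[grepl("[[:punct:]]", tokens)]

# Join the surviving tokens back into a single string
result <- paste(smileys, collapse = " ")
print(result)
# [1] ":< :D :> :)"
```

Under this rule a token such as don't or a word with a trailing full stop would also be kept, so real text would probably need a stricter pattern or an explicit list of smileys.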