How to remove POS tags from a CSV file

Time: 2019-03-16 22:39:50

Tags: python regex

I am doing some natural language processing, which produced the following output:

connect^NN - appears^VBZ cant^JJ lose^JJ make^VBP pretty^JJ pro^JJ make^JJ compared^VBN made^VBD tracked^VBD navigate^JJ click^JJ kept^VBD trail^JJ downloaded^VBD
gps^NN - hope^VBP happy^JJ appears^VBZ entire^JJ reading^VBG good^VB start^VBP eg^JJ negative^JJ crashed^VBD happens^VBZ save^JJ expect^VBP certain^JJ drain^VBP
app^NN - nt^VB go^VBP see^VB relate^JJ pervious^JJ

I need to write a script that removes all the POS tags, such as ^NN, ^VBZ, ^JJ and ^VBP, so that I get the following output:

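connect - appears cant lose make pretty pro make compared made tracked navigate click kept trail downloaded
gps - hope happy appears entire reading good start eg negative crashed happens save expect certain drain
app - nt go see relate pervious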
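Since every tagged token in the sample has the shape word^TAG, and the " - " separator carries no tag, the transformation can also be sketched without a regular expression by keeping only the part of each token before the first '^'. This is a minimal illustration, assuming tokens are separated by spaces and '^' never appears inside a word; the function name drop_tags is hypothetical.

```python
def drop_tags(line: str) -> str:
    """Keep the text before the first '^' in each space-separated token."""
    # 'connect^NN' -> 'connect'; tokens without '^' (such as '-') are unchanged
    return " ".join(token.split("^", 1)[0] for token in line.split(" "))


print(drop_tags("app^NN - nt^VB go^VBP see^VB relate^JJ pervious^JJ"))
# prints: app - nt go see relate pervious
```

Splitting on " " (rather than on any whitespace) keeps runs of spaces as they were, so only the tags change.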

1 answer:

Answer 0 (score: 2)

Assuming every POS tag starts with a '^' character and is followed by whitespace, you can use the following regular expression:

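import re
# Delete each '^' plus the tag after it, stopping before the next whitespace,
# so the spaces and line breaks between words are kept
cleaned = re.sub(r'\^\S+', '', string)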
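Applied to a whole file, the idea might look like the sketch below. It uses the pattern r'\^\S+' (a '^' followed by one or more non-whitespace characters) with an empty replacement, so the tag at the very end of a line is also removed and the newline after it is not turned into a space. It assumes the file is plain space-separated text like the sample above and that '^' only ever marks a tag; if tags were immediately followed by a comma (e.g. word^NN,next), \S+ would also swallow the comma and the text after it. The names strip_pos_tags, strip_file and the file paths are hypothetical.

```python
import re
import sys

# A '^' followed by one or more non-whitespace characters, e.g. '^NN', '^VBZ'
POS_TAG = re.compile(r"\^\S+")


def strip_pos_tags(text: str) -> str:
    """Delete every ^TAG suffix while keeping spaces and newlines as they are."""
    return POS_TAG.sub("", text)


def strip_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path line by line with the POS tags removed."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_pos_tags(line))


if __name__ == "__main__":
    # Usage (hypothetical file names): python strip_tags.py tagged.csv clean.csv
    if len(sys.argv) == 3:
        strip_file(sys.argv[1], sys.argv[2])
```

Processing the file line by line keeps memory use flat for large inputs; for small files, strip_pos_tags(open(path).read()) gives the same result.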
