How do I remove text patterns ending with a colon in R?

Time: 2019-07-12 13:34:44

Tags: r regex gsub

I have the following text:

review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")

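I want to remove everything before each colon (:), i.e. the question text, and keep only the answers.

I tried the following code: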

gsub("^[^:]+:","",review)

However, it only removes the first sentence that ends with a colon.
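A small sketch of why this happens, using a shortened hypothetical string (x) rather than the full review: the ^ anchor only allows a match at the very start of the string, so gsub has nothing left to replace after the first hit. Simply dropping the anchor is probably not a fix either, because [^:]+ then runs across the answers up to the next colon.

```r
# Hypothetical shortened input, same "question: answer" layout as review
x <- "1a. Q one?: A one. 2f. Q two.: A two."

# Anchored: only the text up to the first colon is removed
anchored <- gsub("^[^:]+:", "", x)
print(anchored)    # " A one. 2f. Q two.: A two."

# Unanchored: every run up to a colon is removed, answers included
unanchored <- gsub("[^:]+:", "", x)
print(unanchored)  # " A two."
```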

Expected result:

Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me. Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)!

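Any help or suggestions would be appreciated. Thanks.

1 answer:

Answer 0: (score: 2)

If the sentences are not complex and contain no abbreviations, you can use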

gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)

See the regex demo.

Note that you can generalize this a bit by changing \\d+[a-zA-Z] to [0-9a-zA-Z]+ / [[:alnum:]]+ so that the label matches 1 or more digits or letters.
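As a rough illustration of that generalization, the sketch below compares the two label patterns on a made-up string. The input x and its labels "12b." and "Q3." are hypothetical, and perl = TRUE is an assumption added here to pin the regex flavor to PCRE; the answer's own calls use R's default engine.

```r
# Hypothetical input: the second label "Q3." is letter-then-digit,
# which \\d+[a-zA-Z] does not cover
x <- "12b. Was it fast?: Yes. Q3. Was it fair?: Mostly."

strict  <- gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", x, perl = TRUE)
general <- gsub("(?:[[:alnum:]]+\\.)?[^.?!:]*[?!.]:\\s*", "", x, perl = TRUE)

print(strict)   # "Yes. Q3.Mostly."  (the label "Q3." is left behind)
print(general)  # "Yes. Mostly."
```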

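Details

  • (?:\d+[a-zA-Z]\.)? - an optional sequence of
    • \d+ - 1 or more digits
    • [a-zA-Z] - an ASCII letter
    • \. - a dot
  • [^.?!:]* - 0 or more characters other than ., ?, ! and :
  • [?!.] - a ?, ! or .
  • : - a colon
  • \s* - 0 or more whitespace characters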
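To see which substrings the pattern components listed above actually pick up, one option is to extract the matches instead of deleting them. This is only a sketch on a hypothetical shortened string (x); regmatches and gregexpr are base R functions, and perl = TRUE is an assumption added here.

```r
# Hypothetical shortened input with two labelled questions
x <- "1a. How fast was the reply?: Very fast. 2f. Respect for your time.: Good."
pattern <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"

# Each element is one span that gsub() would remove
matched <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(matched)
# "1a. How fast was the reply?: "  "2f. Respect for your time.: "
```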

R test:

> gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
[1] "Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me.Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! "
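In this output, "me.Abel" has lost its separating space: the last question has no leading label, so the match starts at the space after "me.". A trailing space is also left at the end. One possible tweak, which is an assumption of this note and not part of the original answer, is to consume leading whitespace as well, replace each match with a single space, and trim the result with trimws. The input x below is a hypothetical shortened string.

```r
# Hypothetical shortened input; the last question has no "1a."-style label
x <- "1a. Q one?: A one. 2f. Q two.: A two. Any comments.: Great! "

# A leading \\s* is added to the answer's pattern; each match becomes one space
pattern <- "\\s*(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
cleaned <- trimws(gsub(pattern, " ", x, perl = TRUE))
print(cleaned)  # "A one. A two. Great!"
```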

Extending to abbreviations

You can enumerate the exceptions by adding an alternation:

gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", review)     
                          ^^^^^^^^^^^^^^^^^^^^^^ 

Here, (?:i\.?e\.|[^.?!:])* matches 0 or more occurrences of either an ie. / i.e. substring or any character other than ., ?, ! and :.

See this demo.
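A short sketch of the difference the alternation makes, using a hypothetical question that contains "i.e." (the string x is made up for illustration, and perl = TRUE is an assumption added here):

```r
# Hypothetical input: the question itself contains the abbreviation "i.e."
x <- "1a. Was the price, i.e. the final offer, fair?: Yes."

basic    <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
extended <- "(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*"

without_alt <- gsub(basic, "", x, perl = TRUE)
with_alt    <- gsub(extended, "", x, perl = TRUE)

print(without_alt)  # "1a. Was the price, i.e.Yes."  (stops at the dots in "i.e.")
print(with_alt)     # "Yes."
```

Other abbreviations would need their own branch in the alternation, so this approach suits a short, known list of exceptions.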