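I want to strip the period that follows a single letter, so that "t." becomes "t" and "p.m." becomes "pm". If this occurs more than once in a row, I also want to remove the space in between, so that "e. g." becomes "eg". If the single letter + period pattern occurs more than once in a row and is then followed by 1-2 spaces, keep that period, except when a capital letter is followed by a period.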
x <- "Mr. Brown comes! I met at 7:30 p. m. I will go at 5 a.m. eastern time or @ 2 p. m. I live in the U. S. A. I met John P. Jones later."
# my attempts
gsub("(?<=(\\b[A-Za-z]))(\\.)(?! {1,2}[A-Z])", "", x, perl = TRUE)
gsub("(?<=(\\b[A-Za-z]))(\\. )(?! ??[A-Z])", "", x, perl = TRUE)
# desired output
"Mr. Brown comes! I met at 7:30 pm. I will go at 5 am eastern time or @ 2 pm. I live in the USA. I met John P Jones later."
Answer 0 (score: 3)
Try this regex:
s = old_s.gsub /[ \n]/, ''
For R, use (with each backslash doubled inside the string literal and perl = TRUE set):
(?:(?<=[a-z])\.\s(?=[a-z]\.))|(?:(?<=[a-z])\.)(?!(?:\s[A-Z]|$)|(?:\s\s))|(?:(?<=[A-Z])\.\s(?=[A-Z]\.))|(?:(?<=[A-Z])\.(?=\s[A-Z][A-Za-z]))
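A minimal sketch of how this pattern could be applied with gsub, assuming the sample string x from the question with single spaces between tokens. The names pattern and result are illustrative and not part of the original answer; the per-branch comments are my reading of the pattern.

```r
# Sample input from the question.
x <- "Mr. Brown comes! I met at 7:30 p. m. I will go at 5 a.m. eastern time or @ 2 p. m. I live in the U. S. A. I met John P. Jones later."

# The answer's pattern, split per alternative, backslashes doubled for R.
pattern <- paste0(
  # lowercase letter, then ". " before another lowercase letter + period ("p. m.")
  "(?:(?<=[a-z])\\.\\s(?=[a-z]\\.))",
  # period after a lowercase letter, unless followed by space + capital,
  # end of string, or two whitespace characters
  "|(?:(?<=[a-z])\\.)(?!(?:\\s[A-Z]|$)|(?:\\s\\s))",
  # capital letter, then ". " before another capital + period ("U. S.")
  "|(?:(?<=[A-Z])\\.\\s(?=[A-Z]\\.))",
  # period after a capital when a capitalized word follows ("P. Jones")
  "|(?:(?<=[A-Z])\\.(?=\\s[A-Z][A-Za-z]))"
)

# perl = TRUE is required: lookarounds are not supported by the default engine.
result <- gsub(pattern, "", x, perl = TRUE)
print(result)
```

On the sample string this should reproduce the desired output given in the question. Note that the pattern has no word-boundary check, so it also inspects periods after multi-letter words such as "Mr." and "later."; both are kept here only because of what follows them.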