Remove the space before a period unless the period is followed by a digit

Date: 2014-08-22 12:26:43

Tags: regex r

How can I use R's regular expressions to remove the space before a period, unless the period is followed by a digit?

Here is what I have and what I've tried:

x <- c("I have .32 dollars AKA 32 cents . ", 
    "I have .32 dollars AKA 32 cents .  Hello World .")

gsub("(\\s+)(?=\\.+)", "", x, perl=TRUE)
gsub("(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)

This gives the following (note that the space before .32 is gone):

## [1] "I have.32 dollars AKA 32 cents. "             
## [2] "I have.32 dollars AKA 32 cents.  Hello World."

What I want:

## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."

I'm focused on gsub here, but other solutions are welcome, to make this question more useful to future searchers.

4 answers:

Answer 0 (score: 4)

You don't need a complicated expression; you can use a positive lookahead here:

> gsub(' +(?=\\.(?:\\D|$))', '', x, perl=T)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."

Explanation:

 +        # ' ' (1 or more times)
(?=       # look ahead to see if there is:
  \.      #   '.'
  (?:     #   group, but do not capture:
    \D    #      non-digits (all but 0-9)
   |      #     OR
    $     #      before an optional \n, and the end of the string
  )       #   end of grouping
)         # end of look-ahead

Note: if these space characters can be any kind of whitespace, just replace the ' +' with '\\s+'.
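A quick check of the whitespace-general variant from the note, as a sketch: the vector `y` is made-up input (not from the question) with a tab before one period, two spaces before a final period, and a space before `.32` that must survive. The pattern is Answer 0's with ` +` swapped for `\\s+`.

```r
# Hypothetical input: tab before a period, two spaces before a final
# period, and a space before ".32" that should be kept.
y <- c("cents\t. ", "World  .", "have .32")

# Answer 0's lookahead with " +" replaced by "\\s+", so any whitespace
# run (spaces, tabs, newlines) before a qualifying period is removed.
out <- gsub("\\s+(?=\\.(?:\\D|$))", "", y, perl = TRUE)
out
# out is c("cents. ", "World.", "have .32")
```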


If you're comfortable using the (*SKIP)(*F) backtracking verbs, here is an equivalent way to write it:

> gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '', x, perl=T)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."
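One caveat worth checking, sketched under the assumption that inputs may contain more than one space before a decimal like `.32`: the protected branch ` \\.\\d` covers exactly one space, so a longer run falls through to the second branch and gets removed. Putting ` +` in the protected branch as well (an adjustment, not part of the answer) keeps such runs. The input `y` is hypothetical.

```r
# Hypothetical input: two spaces before ".32"
y <- "I have  .32 dollars AKA 32 cents . "

# Pattern exactly as posted: the protected branch allows a single space
orig <- gsub(" \\.\\d(*SKIP)(*F)| +(?=\\.)", "", y, perl = TRUE)

# Adjusted pattern (assumption, not from the answer): " +" in the protected
# branch, so the whole run of spaces before ".<digit>" is skipped
adj <- gsub(" +\\.\\d(*SKIP)(*F)| +(?=\\.)", "", y, perl = TRUE)

orig  # "I have.32 dollars AKA 32 cents. "   (double space removed)
adj   # "I have  .32 dollars AKA 32 cents. " (double space kept)
```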

Answer 1 (score: 3)

Well, I don't know R, but I do know regex. Hopefully this answer works in R.

gsub("\\s+\\.(?!\\d)", ".", x, perl=TRUE)

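It uses a negative lookahead to make sure the whitespace and period are not followed by a digit; it then replaces the whole match with just a period.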
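A minimal sketch of this answer's call wrapped in a reusable function and run on a few edge cases. The function name `strip_space_before_period` and the test strings are hypothetical; the regex and replacement are the ones from the answer.

```r
# Hypothetical helper; the body is Answer 1's gsub() call.
strip_space_before_period <- function(s) {
  # Whitespace followed by a period that is NOT followed by a digit is
  # replaced by a lone period; whitespace before ".5"-style numbers stays.
  gsub("\\s+\\.(?!\\d)", ".", s, perl = TRUE)
}

# Hypothetical edge cases: several spaces, a decimal preceded by two
# spaces, a period at the end of the string, and a tab.
res <- strip_space_before_period(c("cents   .", "pay  .5 now", "end .", "tab\t.x"))
res
# res is c("cents.", "pay  .5 now", "end.", "tab.x")
```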

Answer 2 (score: 2)

This seems to work for this example:

  gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
  #[1] "I have .32 dollars AKA 32 cents. "             
  #[2] "I have .32 dollars AKA 32 cents.  Hello World."
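The trailing `(?<=[^\\d])` lookbehind (also used in the question's second attempt) is evaluated right after `\\s+` has matched, so the character it inspects is the last whitespace character just consumed, which can never be a digit. A small sketch on the question's `x` suggests the lookbehind is redundant: dropping it leaves the result unchanged.

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents .  Hello World .")

# Answer 2's pattern as posted
with_lb <- gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)(?<=[^\\d])",
                "", x, perl = TRUE)

# Same pattern without the lookbehind: it only ever looked at the
# whitespace character matched by \s+, which is never a digit
without_lb <- gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)",
                   "", x, perl = TRUE)

identical(with_lb, without_lb)  # TRUE
```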

Answer 3 (score: 2)

Try this regex:

x <- c("I have .32 dollars AKA 32 cents . ", 
       "I have .32 dollars AKA 32 cents .  Hello World .",
       "I have .32 dollars AKA 32 cents .  Hello World .xyz")

gsub(" *\\.($|\\D)", "\\.\\1", x)
[1] "I have .32 dollars AKA 32 cents. "                
[2] "I have .32 dollars AKA 32 cents.  Hello World."   
[3] "I have .32 dollars AKA 32 cents.  Hello World.xyz"

What it does:

  • " *\\." matches any number of spaces followed by a period.
  • "($|\\D)" matches either of the following:
    • the end of the line ($),
    • or "not a digit" (\\D).
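Because "($|\\D)" actually consumes the character after the period (and \\1 puts it back), a period's trailing space is already used up when the next period is examined. A sketch contrasting this with Answer 0's zero-width lookahead, on a hypothetical input with spaced-out periods. The replacement here is written ".\\1"; in R replacement strings a backslash before a non-digit appears to just yield that character, so this should match the answer's "\\.\\1".

```r
# Answer 3's pattern (default regex engine, no perl = TRUE)
ans3 <- function(s) gsub(" *\\.($|\\D)", ".\\1", s)
# Answer 0's lookahead pattern, for comparison
ans0 <- function(s) gsub(" +(?=\\.(?:\\D|$))", "", s, perl = TRUE)

z <- "wait . . . ok"  # hypothetical input

ans3(z)  # "wait. . . ok": each match swallows the space after its period
ans0(z)  # "wait... ok":   the lookahead consumes nothing after the period
```

Both agree on the question's inputs; they only diverge when one period's trailing space is also the space before the next period.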