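How can I use R's regular expressions to remove the space(s) before a period, unless the period is followed by a digit?
Here is what I have and what I have tried: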
x <- c("I have .32 dollars AKA 32 cents . ",
"I have .32 dollars AKA 32 cents . Hello World .")
gsub("(\\s+)(?=\\.+)", "", x, perl=TRUE)
gsub("(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
Both attempts give the following (note that the space before .32 is removed as well):
## [1] "I have.32 dollars AKA 32 cents. "
## [2] "I have.32 dollars AKA 32 cents. Hello World."
I would like:
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
I'm saddled with gsub here, but other solutions are welcome, to make this question more useful to future searchers.
Answer 0 (score: 4)
You don't need a complicated expression; a positive lookahead is enough here.
> gsub(' +(?=\\.(?:\\D|$))', '', x, perl=T)
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
Explanation:
 +           # ' ' (1 or more times)
(?=          # look ahead to see if there is:
  \.         #   '.'
  (?:        #   group, but do not capture:
    \D       #     non-digits (all but 0-9)
   |         #    OR
    $        #     before an optional \n, and the end of the string
  )          #   end of grouping
)            # end of look-ahead
Note: if these space characters can be any kind of whitespace, just replace ' +' with \s+ (written as "\\s+" inside an R string).
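As a rough illustration of the note above, the sketch below applies the \s+ variant to a made-up string that mixes tabs and spaces. The input `y` and the variable name `res_ws` are hypothetical and not part of the original question.

```r
# Hypothetical input: a tab before ".50", a space plus a tab before the
# first sentence-ending period, and two spaces before the last one.
y <- "Total:\t.50 due \t. Thanks  ."

# \s+ matches any run of whitespace; the lookahead only lets the match
# succeed when the period is followed by a non-digit or the end of the string.
res_ws <- gsub("\\s+(?=\\.(?:\\D|$))", "", y, perl = TRUE)
print(res_ws)
```

The tab before ".50" should survive because the period there is followed by a digit, while the mixed whitespace before the other two periods is removed.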
If you are comfortable using the (*SKIP)(*F) backtracking verbs, here is an equivalent way to write it:
> gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '', x, perl=T)
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
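One caveat worth checking before relying on the (*SKIP)(*F) form: its first alternative matches exactly one space before the decimal point, so a run of several spaces before a number seems to fall through to the second alternative and get removed. The sketch below compares the two patterns from this answer on a hypothetical input with double spaces; `skip_multi` is an assumed variation (a ' +' in the first alternative), not something the answer proposed.

```r
# Hypothetical input with runs of two spaces before ".5" and before the final "."
y <- "a  .5 b  ."

# Pattern from the answer: only a single space is protected before ".<digit>"
skip_orig <- gsub(" \\.\\d(*SKIP)(*F)| +(?=\\.)", "", y, perl = TRUE)

# Lookahead pattern from the answer, for comparison
lookahead <- gsub(" +(?=\\.(?:\\D|$))", "", y, perl = TRUE)

# Assumed variation: let the skipped alternative consume the whole run of spaces
skip_multi <- gsub(" +\\.\\d(*SKIP)(*F)| +(?=\\.)", "", y, perl = TRUE)

print(c(skip_orig = skip_orig, lookahead = lookahead, skip_multi = skip_multi))
```

On the single-space examples from the question, all three patterns should agree; the difference only shows up when more than one space precedes the number.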
Answer 1 (score: 3)
Well, I don't know R, but I do know regex. Hopefully this answer works in R.
gsub("\\s+\\.(?!\\d)", ".", x, perl=TRUE)
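It uses a negative lookahead to make sure the whitespace and period are not followed by a digit; it then simply replaces the whole match with a period.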
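To see that consuming the period and putting it back gives the same result as the lookahead-only approach, here is a small sketch on the question's sample data. The variable names `pos_la` and `neg_la` are made up for this comparison.

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents . Hello World .")

# Positive lookahead: match only the spaces, replace them with nothing
pos_la <- gsub(" +(?=\\.(?:\\D|$))", "", x, perl = TRUE)

# Negative lookahead: match the whitespace AND the period, write the period back
neg_la <- gsub("\\s+\\.(?!\\d)", ".", x, perl = TRUE)

print(neg_la)
print(identical(pos_la, neg_la))
```

The practical difference is what the match consumes: the first pattern leaves the period untouched, while the second one has to restore it in the replacement string.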
Answer 2 (score: 2)
This seems to work for this example.
gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
#[1] "I have .32 dollars AKA 32 cents. "
#[2] "I have .32 dollars AKA 32 cents. Hello World."
Answer 3 (score: 2)
Try this regex:
x <- c("I have .32 dollars AKA 32 cents . ",
"I have .32 dollars AKA 32 cents . Hello World .",
"I have .32 dollars AKA 32 cents . Hello World .xyz")
gsub(" *\\.($|\\D)", "\\.\\1", x)
[1] "I have .32 dollars AKA 32 cents. "
[2] "I have .32 dollars AKA 32 cents. Hello World."
[3] "I have .32 dollars AKA 32 cents. Hello World.xyz"
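What it does:
" *\\." searches for any number of spaces followed by a period.
"($|\\D)" searches for either of the following:
the end of the string ($),
a non-digit character (\\D).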
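Because the group "($|\\D)" is part of the match, the character after the period is consumed and has to be written back through the \\1 backreference in the replacement. The sketch below, on a hypothetical shortened input, contrasts the replacement with and without the backreference; it uses a plain "." in the replacement, which should behave the same as the escaped "\\." in the answer.

```r
# Hypothetical shortened input based on the third test string
y <- "cents . Hello World .xyz"

# With the backreference: the captured character is restored after the period
with_ref <- gsub(" *\\.($|\\D)", ".\\1", y)

# Without it: the character that followed each period is lost
without_ref <- gsub(" *\\.($|\\D)", ".", y)

print(with_ref)
print(without_ref)
```

This is the main trade-off against the lookahead answers: no perl = TRUE is needed, but the replacement string must reconstruct everything the pattern consumed.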