Python - How do I separate punctuation from words with spaces, leaving only one space between punctuation and words?

Asked: 2015-01-07 03:11:52

Tags: python regex

I have the following string:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

All punctuation should be separated from the words, except for "/", "'", "-", "+" and "$".

So the output should be:

"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10. It's free-to-use , no $$$ involved !"

I used the following code:

import string

for x in string.punctuation:
    if x == "/":
        continue
    if x == "'":
        continue
    if x == "-":
        continue
    if x == "+":
        continue
    if x == "$":
        continue
    input = input.replace(x," %s " % x)

I get the following output:

I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved ! 

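It works, but the problem is that it sometimes leaves two spaces between a punctuation mark and a word, for example between the first exclamation mark and the word "Do". This happens because there is already a space between them, and the replacement adds another one.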
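The doubling is easy to reproduce in isolation. A minimal sketch, using a shortened sample string (the names `sample` and `spaced` are only for illustration):

```python
# Reproduce the double space: the replacement appends its own space
# in front of the space that was already in the text.
sample = "Python! Do you?"
spaced = sample.replace("!", " %s " % "!")
print(repr(spaced))  # 'Python !  Do you?'
```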

The same problem occurs with input = "Hello. (hi)". The output will be:

"Hello .  ( hi ) "

Note the two spaces before the opening parenthesis.

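I need output with exactly one space between any punctuation mark and a word, except for the 5 punctuation marks mentioned above, which should not be separated from words. How can I fix this? Or is there a better way to do it with a regex?

Thanks in advance.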

5 Answers:

Answer 0: (score: 6)

It looks like re can do this for you...

>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "

>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
'Hello . ( hi ) '

If the trailing space is a problem, theresult.rstrip(' ') should fix it for you :-)
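A minimal sketch of that clean-up, wrapping the substitution above in a helper and stripping the trailing space with `str.rstrip`. The function name `space_out` is hypothetical and not part of the answer:

```python
import re

def space_out(text):
    # Same pattern as above: a run of "kept" characters or a run of
    # punctuation, plus any whitespace that follows it.
    spaced = re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", text)
    # The substitution always appends a space, so drop the final one.
    return spaced.rstrip(" ")

print(space_out("Hello. (hi)"))  # Hello . ( hi )
```

Because the first alternative also matches whitespace, input where a space comes directly before punctuation (for example "a (b)") can still end up with a doubled space; this sketch does not cover that case.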

Answer 1: (score: 0)

You could try it this way:

>>> import string
>>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
>>> ls = []
>>> for x in input:
...     if x in string.punctuation:
...         ls.append(' %s' % x)
...     else:
...         ls.append(x)
...
>>> ''.join(ls)
"I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no  $ $ $ involved !"
>>>

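Answer 2: (score: 0)

I can't comment due to lack of reputation, but in this case

  between the first exclamation mark and the word "Do"

it looks like there are two spaces because there is already a space between "!" and "Do":

  ! Do

So, if there is already a space after the punctuation mark, don't add another one.

There is also a similar question here: python regex inserting a space between punctuation and letters

So maybe consider using re.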
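One way to sketch that idea with re is to insert a space only at positions where none exists yet, using zero-width lookarounds. This is an illustration, not the solution from the linked question; the names `PUNCT`, `GAP` and `separate_punctuation` are assumptions, and the character class is built from the five excluded characters in the question.

```python
import re

# Any character that is not a word character, whitespace, or one of / ' + $ -
PUNCT = r"[^\w\s/'+$-]"

# Match the empty position (a) before punctuation that follows a non-space,
# or (b) after punctuation that is followed by a non-space.
GAP = re.compile(r"(?<=\S)(?=%s)|(?<=%s)(?=\S)" % (PUNCT, PUNCT))

def separate_punctuation(text):
    # Insert one space at each such position; existing spaces are untouched.
    return GAP.sub(" ", text)

print(separate_punctuation("Hello. (hi)"))  # Hello . ( hi )
```

Since nothing is inserted where a space is already present, no doubled or trailing spaces are produced for these inputs.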

Answer 3: (score: 0)

In my opinion, a negated character class is simpler:

import re

input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))

Output:

I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! 
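One caveat: the output above ends with a trailing space, and when two punctuation runs are separated by a single space, as in the asker's "Hello. (hi)", this pattern appears to leave two spaces between them. A minimal sketch that normalizes the whitespace afterwards (the helper name `separate_punctuation` is hypothetical, and collapsing with `split`/`join` also flattens tabs and newlines, which is an assumption about what is acceptable):

```python
import re

PATTERN = r"\s?([^\w\s'/\-\+$]+)\s?"

def separate_punctuation(text):
    # Pad every punctuation run with a space on both sides.
    spaced = re.sub(PATTERN, r" \1 ", text)
    # Collapse runs of whitespace to one space and trim both ends.
    return " ".join(spaced.split())

print(separate_punctuation("Hello. (hi)"))  # Hello . ( hi )
```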

Answer 4: (score: 0)

# Approach 1

import re

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))

# Approach 2

import string

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

punctuation = string.punctuation.replace('/', '').replace("'", '') \
        .replace('-', '').replace('+', '').replace('$', '')

i = 0

while i < len(sample_input):
    if sample_input[i] not in punctuation:
        i += 1
        continue

    if i > 0 and sample_input[i-1] != ' ':
        sample_input = sample_input[:i] + ' ' + sample_input[i:]
        i += 1

    if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
        sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
        i += 1

    i += 1

print(sample_input)
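Approach 2 re-slices the string on every insertion. A single-pass variant that collects the pieces in a list might look like the sketch below; the names `KEEP`, `SPLIT` and `separate` are hypothetical. Like Approach 2, it treats every character in string.punctuation other than the five kept ones as punctuation.

```python
import string

KEEP = "/'-+$"                           # characters that stay attached to words
SPLIT = set(string.punctuation) - set(KEEP)

def separate(text):
    out = []
    for i, ch in enumerate(text):
        if ch in SPLIT:
            # Add a space before, unless one is already there or we are at the start.
            if out and out[-1] != " ":
                out.append(" ")
            out.append(ch)
            # Add a space after, unless one follows or we are at the end.
            if i + 1 < len(text) and text[i + 1] != " ":
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

print(separate("Hello. (hi)"))  # Hello . ( hi )
```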