Remove extra space between a word and an apostrophe

Time: 2017-05-04 06:11:42

Tags: python string text-mining

I have a list of strings that contain verb contractions. My list looks like this:

["What 's your name?", "Isn 't it beautiful?",...]

I want to remove the space between the word and the apostrophe, so that the new list becomes:

["What's your name?", "Isn't it beautiful?",...]

I used replace(), but the list contains 5500 strings with many different forms of contraction, and the following code only replaces one form.

s = s.replace("'s","is")

How can I remove the extra space between the word and the apostrophe?

3 answers:

Answer 0: (score: 0)

This should do it:

l = ["What 's your name?", "Isn 't it beautiful"]
lNew = [i.replace(" '","'") for i in l]

This gives:

lNew = ["What's your name?", "Isn't it beautiful"]

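You seem to be using the same symbol for the apostrophe and for the string quotes, but I am sure they are different in your program.

Does this help?

Answer 1: (score: 0)

You can try a regular expression like this. (It also handles more than one space before the apostrophe, but, as mentioned in the comments, it does not help with cases like do n't.)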
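As a rough illustration of where this plain replace() approach stops working, the sketch below runs it on three strings. The second string has two spaces before the apostrophe and the third uses apostrophes as quotation marks; both are made-up examples, not taken from the question.

```python
# Hypothetical sample data: only the first string comes from the question.
samples = ["What 's your name?", "Isn  't it beautiful?", "He said 'hello'"]

# str.replace removes exactly one space in front of every apostrophe,
# regardless of whether the apostrophe belongs to a contraction.
replaced = [s.replace(" '", "'") for s in samples]

for before, after in zip(samples, replaced):
    print(repr(before), "->", repr(after))
```

With this input, one of the two spaces in the second string is left behind, and the opening quote in the third string is glued to the preceding word, so the approach seems safe only when every apostrophe is a contraction preceded by a single space.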

import re
s = ["What 's your name?", "Isn 't it beautiful?"]
s = [re.sub(r'\s+\'', "'", i) for i in s]

The output will be:

>>> s
["What's your name?", "Isn't it beautiful?"]

Answer 2: (score: 0)
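One possible refinement, sketched here under assumptions, is to remove the whitespace only when the apostrophe is followed by a known contraction suffix, so that apostrophes used as quotation marks are left alone. The suffix list and the name fix_contractions are illustrative choices, not part of the original answer, and the list would need extending for real data.

```python
import re

# Assumed, non-exhaustive list of contraction suffixes ('s, 't, 're, ...).
CONTRACTION = re.compile(r"\s+'(?=(?:s|t|re|ve|ll|d|m)\b)", re.IGNORECASE)

def fix_contractions(strings):
    """Remove whitespace before an apostrophe that starts a known suffix."""
    return [CONTRACTION.sub("'", s) for s in strings]

# The third string is a made-up example of an apostrophe used as a quote.
fixed = fix_contractions(
    ["What 's your name?", "Isn  't it beautiful?", "He said 'hello'"]
)
print(fixed)
```

Like the regex above, this sketch does not handle splits inside the word itself, such as do n't.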

(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])

You can try this. See the demo.

https://regex101.com/r/18GHqw/1

import re

regex = r"(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])"

test_str = ("'What 's your name?','Isn 't it beautiful?'\n\n"
"Jesus ' cross\"\n"
"do n't\"\n"
"sdsda   sdsd'  sdsd")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
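The demo code above only lists the matches. To actually remove the matched whitespace, the same pattern can be passed to re.sub with an empty replacement, as in this minimal sketch. The names PATTERN, samples, and results are illustrative, and the last sample string is a made-up input that already contains a correct contraction.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])")

# The first three strings follow the answer's test data; the last is hypothetical.
samples = ["What 's your name?", "do n't", "Jesus ' cross", "I think it's nice"]

# Replace each matched run of whitespace with nothing.
results = [PATTERN.sub("", s) for s in samples]

for before, after in zip(samples, results):
    print(repr(before), "->", repr(after))
```

Because the lookahead allows lowercase letters before the apostrophe, the pattern also matches the space in front of a contraction that is already correct: the last sample becomes "I thinkit's nice". It therefore appears best suited to input where every contraction is known to be split.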