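Python - Remove uppercase words from a long string

Time: 2016-06-21 11:49:25

Tags: python string

I have a long string (28 MB) of ordinary sentences. I want to remove every word that consists entirely of uppercase letters (such as TNT, USA, OMG).

So from the sentence: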

Jump over TNT in There.

I want to get:

Jump over  in There.

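Is there a way to do this without splitting the text into a list and iterating over it? Is it possible to use a regular expression somehow?

3 answers:

Answer 0: (score: 4)

You can match runs of uppercase letters with [A-Z]+ and anchor them on both sides with the word boundary \b, so that only whole uppercase words are removed: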

import re

line = 'Jump over TNT in There NOW'

m = re.sub(r'\b[A-Z]+\b', '', line)
#'Jump over  in There '
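
Note that \b[A-Z]+\b also removes one-letter words such as "I" and "A", and it leaves a double space where each word used to be. A minimal sketch of one way to handle both, assuming that single capitals should be kept and that runs of spaces may be collapsed (the names UPPER_WORD and remove_upper_words are made up for this illustration):

```python
import re

# Assumption: an "uppercase word" is a run of two or more ASCII capitals,
# so single letters such as "I" or "A" are left alone.
UPPER_WORD = re.compile(r'\b[A-Z]{2,}\b')

def remove_upper_words(text):
    """Remove all-caps words, then collapse the spaces they leave behind."""
    stripped = UPPER_WORD.sub('', text)
    # Collapse any run of two or more spaces into a single space.
    return re.sub(r' {2,}', ' ', stripped)

print(remove_upper_words('Jump over TNT in There.'))
```

Compiling the pattern once is only a convenience here; re.sub caches compiled patterns anyway, so on a single 28 MB string the difference should be negligible.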

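Answer 1: (score: 2)

Use the re module. The negative lookahead (?![a-z]) rejects a run of capitals that is followed by a lowercase letter, so the "T" in "There" is kept: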

import re

line = 'Jump over TNT in There.'
new_line = re.sub(r'[A-Z]+(?![a-z])', '', line)

print(new_line)
# Output:
# Jump over  in There.
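
One thing to be aware of: this pattern has no word boundaries, so it can also cut capitals out of the middle of mixed-case tokens. The following sketch compares it with the \b[A-Z]+\b pattern on a made-up sample string (the sample text and variable names are assumptions for illustration only):

```python
import re

# Hypothetical sample containing mixed-case tokens.
sample = 'JSONParser reads fooBAR and USA data'

# Lookahead only: "JSON" and "BAR" are removed from inside longer tokens.
no_boundary = re.sub(r'[A-Z]+(?![a-z])', '', sample)

# Word boundaries: only the standalone word "USA" is removed.
with_boundary = re.sub(r'\b[A-Z]+\b', '', sample)

print(no_boundary)    # Parser reads foo and  data
print(with_boundary)  # JSONParser reads fooBAR and  data
```

If the text contains only ordinary sentences, both patterns should behave the same; the difference matters only for tokens such as identifiers or camel-case names.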

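Answer 2: (score: 1)

I would do something like this (note that it splits the text into words and also replaces all punctuation with spaces):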

import string

def onlyUpper(word):
    for c in word:
        if not c.isupper():
            return False
    return True

s = "Jump over TNT in There."
for char in string.punctuation:
    s = s.replace(char, ' ')

words = s.split()
good_words = []

for w in words:
    if not onlyUpper(w):
        good_words.append(w)

result = ""
for w in good_words:
    result = result + w + " "

print(result)  # 'Jump over in There ' (the period is gone and a trailing space remains)
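
The loop above can be written more compactly. This is only a sketch of the same split-and-filter idea, not a replacement for the regex answers: it relies on str.isupper(), which is true when every cased character in the token is uppercase, so a token such as "USA," is dropped together with its comma, while punctuation on the remaining words is kept. The function name drop_upper_tokens is made up for this example.

```python
def drop_upper_tokens(text):
    """Keep only whitespace-separated tokens that are not entirely uppercase."""
    # str.isupper() ignores uncased characters, so "TNT." counts as uppercase
    # while "There." and "123" do not.
    return ' '.join(token for token in text.split() if not token.isupper())

print(drop_upper_tokens('Jump over TNT in There.'))
```

Like the original loop, this still builds a list of words in memory, which is what the question hoped to avoid for a 28 MB string.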