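I have a long string (28 MB) of ordinary sentences. I want to remove every word that consists entirely of uppercase letters (such as TNT, USA, OMG).
So from the sentence:
Jump over TNT in There.
I want to get:
Jump over in There.
Is there a way to do this without splitting the text into a list and iterating over it? Could this be done with a regular expression?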
Answer 0 (score: 4)
You can match runs of uppercase letters with [A-Z]+, anchored on both sides by the word boundary \b:
import re
line = 'Jump over TNT in There NOW'
m = re.sub(r'\b[A-Z]+\b', '', line)
# m == 'Jump over  in There ' (the spaces around each removed word remain)
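Because only the word itself is replaced, the whitespace next to it stays behind. A minimal sketch of one way to absorb it, assuming it is acceptable to also delete the whitespace that follows each removed word; the helper name remove_upper_words is hypothetical and not part of the answer:

```python
import re

# Hypothetical helper: an all-uppercase word plus any whitespace after it
UPPER_WORD = re.compile(r'\b[A-Z]+\b\s*')

def remove_upper_words(text):
    # Drop each all-uppercase word together with its trailing whitespace,
    # then trim whitespace left at the end of the string
    return UPPER_WORD.sub('', text).rstrip()

print(remove_upper_words('Jump over TNT in There.'))  # Jump over in There.
```

Single-letter words such as "I" or "A" also match this pattern; if those should be kept, [A-Z]{2,} in place of [A-Z]+ may be a better fit. A removed word directly before punctuation still leaves a space in front of that punctuation.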
Answer 1 (score: 2)
Use the re module:
import re
line = 'Jump over TNT in There.'
new_line = re.sub(r'[A-Z]+(?![a-z])', '', line)
print(new_line)
# Output (the space on each side of TNT remains, so there are two spaces):
# Jump over  in There.
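This pattern has no word boundaries, so it can also match an uppercase run inside a longer word. A small comparison against the \b-based pattern on a made-up input (the sample string is an assumption, not taken from the question):

```python
import re

sample = 'USAs and McDONALD'

# Lookahead version: any uppercase run not followed by a lowercase letter
lookahead = re.sub(r'[A-Z]+(?![a-z])', '', sample)
# Word-boundary version: only whole words made of uppercase letters
boundary = re.sub(r'\b[A-Z]+\b', '', sample)

print(lookahead)  # As and Mc
print(boundary)   # USAs and McDONALD
```

If mixed-case words like these can occur in the text, the word-boundary pattern is probably the safer choice.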
Answer 2 (score: 1)
I would do something like this:
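import string

def onlyUpper(word):
    # True only if every character in the word is an uppercase letter
    for c in word:
        if not c.isupper():
            return False
    return True

s = "Jump over TNT in There."
# Replace punctuation with spaces so that "There." is treated as "There"
for char in string.punctuation:
    s = s.replace(char, ' ')
words = s.split()
# Keep only the words that are not entirely uppercase
good_words = []
for w in words:
    if not onlyUpper(w):
        good_words.append(w)
result = ""
for w in good_words:
    result = result + w + " "
print(result)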
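The same split-and-filter idea can be written more compactly. A rough sketch, assuming the same behaviour is wanted (punctuation is discarded and words are rejoined with single spaces); the function name drop_upper_words is hypothetical:

```python
import string

def drop_upper_words(text):
    # Map every punctuation character to a space in a single pass
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    words = text.translate(table).split()
    # Keep words that contain at least one character that is not uppercase
    return ' '.join(w for w in words if not all(c.isupper() for c in w))

print(drop_upper_words("Jump over TNT in There."))  # Jump over in There
```

Unlike the regex answers, this approach drops the punctuation and still builds a list of words, which may matter for a 28 MB input.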