"10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
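What I want to do is read this string in Python and separate the joined words. What I really want is a regular expression that splits the joined words in the string.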
I want to read the string above from a file, and the output should look like this:
"10 JAN 2015 AirMail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide. Also Contained: Nil Action Taken: Goods referred to HGI QLD for further action. Attachments : Nil 34 FEB 2004"
(with the joined words separated)
I need to write a regular expression that separates:
'10Jan2015AirMail', 'HyderabadAddress', 'details:John', 'DriveAdelaide'
I need a regular expression that identifies the joined words above and separates them with spaces within the same string, like this:
'10 Jan 2015 AirMail', 'Hyderabad Address', 'details : John'
```python
import re
text = open(r'C:\sample.txt', 'r').read().replace("\n", "").replace("\t", "").replace("-", "").replace("/", " ")
newtext = re.sub('[a-zA-Z0-9_:]', '', text)  # This regex does not work. Please assist
print(text)
print(newtext)
```
The code above does not work.
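Why the attempt above fails: `[a-zA-Z0-9_:]` is a character class, so it matches one character at a time, and `re.sub(pattern, '', text)` replaces every match with the empty string. The result is the input with every letter, digit, underscore and colon deleted. Splitting joined words instead needs a pattern that matches the *position* between two characters. A small demonstration (the variable names are illustrative only):

```python
import re

sample = "details:John"

# The character class consumes each matching character and the empty
# replacement deletes it, so nothing is left of this sample.
stripped = re.sub('[a-zA-Z0-9_:]', '', sample)
print(repr(stripped))  # ''

# A lookbehind/lookahead pair matches the empty position between a
# lowercase and an uppercase letter, so re.sub inserts a space there.
spaced = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', 'HyderabadAddress')
print(spaced)  # Hyderabad Address
```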
Answer 0 (score: 0)
I know this could be done more simply by classifying the characters into sets (uppercase, lowercase, digits), but I prefer a more verbose solution:
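```python
test_text = "10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
splitted_text = test_text.split()  # split() without an argument drops empty strings
# State flags: the kind of the last letter/digit seen (digit, lowercase, uppercase)
num = False
low = False
upp = False
result = []
for word in splitted_text:
    new_word = ''
    # Words whose letters are all uppercase or all lowercase ("HGI", "standard") are kept as-is
    if not word.isupper() and not word.islower():
        # Initialise the flags from the first character so it gets no leading space
        if word[0].isnumeric():
            num = True
            low = False
            upp = False
        elif word[0].islower():
            num = False
            low = True
            upp = False
        elif word[0].isupper():
            num = False
            low = False
            upp = True
        for letter in word:
            if letter.isnumeric():
                # A digit starts a new token unless the previous letter/digit was a digit
                if num:
                    new_word += letter
                else:
                    new_word += ' ' + letter
                low = False
                upp = False
                num = True
            elif letter.islower():
                # A lowercase letter stays attached after a letter; after a digit it starts a new token
                if low or upp:
                    new_word += letter
                else:
                    new_word += ' ' + letter
                low = True
                upp = False
                num = False
            elif letter.isupper():
                # An uppercase letter after a lowercase letter or a digit starts a new token
                if low or num:
                    new_word += ' ' + letter
                else:
                    new_word += letter
                low = False
                upp = True
                num = False
            else:
                # Any other character (e.g. ':' or '.') gets a space before it; flags are unchanged
                new_word += ' ' + letter
        result.append(new_word)
    else:
        result.append(word)
print(' '.join(result))
# Prints:
# 10 JAN 2015 Air Mail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide . Also Contained : Nil Action Taken : Goods referred to HGI QLD for further action . Attachments : Nil 34 FEB 2004
```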
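The simpler, set-based variant mentioned at the start of the answer can be sketched with a single `re.findall` that pulls out one run of a character class at a time: a run of digits, an all-caps run not followed by a lowercase letter (`JAN`, `HGI`), an optionally capitalized lowercase run (`Air`, `details`), or a single punctuation mark. Joining the tokens with spaces reproduces the output shown above for this sample. This is a sketch, not the answerer's code; `TOKEN` and `split_runs` are hypothetical names, and it assumes ASCII letters and collapses all original whitespace to single spaces.

```python
import re

# One alternative per character class; re.findall returns the runs in order.
TOKEN = re.compile(
    r"\d+"                # a run of digits: 10, 2015
    r"|[A-Z]+(?![a-z])"   # all-caps run not followed by lowercase: JAN, HGI
    r"|[A-Z]?[a-z]+"      # optionally capitalized lowercase run: Air, details
    r"|[^A-Za-z\d\s]"     # any single other non-space character: ':' or '.'
)


def split_runs(text):
    """Return text with every character-class run separated by one space."""
    return " ".join(TOKEN.findall(text))


test_text = ("10JAN2015AirMail standard envelope from HyderabadAddress "
             "details:John Cena Palm DriveAdelaide.Also Contained:NilAction "
             "Taken:Goods referred to HGI QLD for further action."
             "Attachments:Nil34FEB2004")
print(split_runs(test_text))
```

Here the order of the alternatives does the classification that the flags do in the loop: at the `A` of `AirMail`, `[A-Z]+(?![a-z])` fails because `i` follows, so `[A-Z]?[a-z]+` takes `Air`.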
Sometimes all you need is a pointer in the right direction.