Separating joined words in a string using Python

Time: 2016-10-03 23:23:53

Tags: python

"10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"

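What I want to do is read this string in Python and separate the words that have been joined together. What I am really after is a regular expression that splits the joined words in the string.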

I want to read the string above from a file, and the output should look like this:

"10 JAN 2015 AirMail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide. Also Contained: Nil Action Taken: Goods referred to HGI QLD for further action. Attachments : Nil 34 FEB 2004."

(with the joined words separated)

I need to write a regular expression that separates:

'10Jan2015AirMail', 'HyderabadAddress', 'details:John', 'DriveAdelaide'

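I need a regular expression that identifies joined words like the ones above and separates them with spaces within the same string, for example:

'10 Jan 2015 AirMail', 'Hyderabad Address', 'details : John'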

text = open(r'C:\sample.txt', 'r').read().replace("\n","").replace("\t","").replace("-","").replace("/"," ")

newtext = re.sub('[a-zA-Z0-9_:]','',text) #This regex does not work.Please assist

print text
print newtext

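The code above does not work.

1 Answer:

Answer 0: (score: 0)

I know this could be solved more simply by classifying the characters into sets (uppercase, lowercase, digits), but I preferred to write a more verbose solution: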

test_text = "10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
splitted_text = test_text.split(' ')
num = False
low = False
upp = False
result = []

for word in splitted_text:
  new_word = ''
  if not word.isupper() and not word.islower():
    if word[0].isnumeric():
        num = True
        low = False
        upp = False
    elif word[0].islower():
        num = False
        low = True
        upp = False
    elif word[0].isupper():
        num = False
        low = False
        upp = True
    for letter in word:
      if letter.isnumeric():
        if num:
            new_word += letter
        else:
            new_word += ' ' + letter
        low = False
        upp = False
        num = True
      elif letter.islower():
        if low or upp:
            new_word += letter
        else:
            new_word += ' ' + letter
        low = True
        upp = False
        num = False
      elif letter.isupper():
        if low or num:
            new_word += ' ' + letter
        else:
            new_word += letter
        low = False
        upp = True
        num = False
      else:
        new_word += ' ' + letter
    result.append(''.join(new_word))
  else:
    result.append(word)
print(' '.join(result))
#'10 JAN 2015 Air Mail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide . Also Contained : Nil Action Taken : Goods referred to HGI QLD for further action . Attachments : Nil 34 FEB 2004'
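
Since the question asks for a regular expression, here is a minimal sketch of one way to do it with `re.sub`. The names `BOUNDARY`, `PUNCT`, and `split_joined` are made up for this illustration. It assumes that a word boundary is any lowercase-to-uppercase or letter/digit transition, and that `:` and `.` should stand alone as separate tokens.

```python
import re

# Zero-width positions where a space should be inserted (assumed rules):
#   lowercase followed by uppercase, letter followed by digit,
#   digit followed by letter.
BOUNDARY = re.compile(
    r'(?<=[a-z])(?=[A-Z])'
    r'|(?<=[A-Za-z])(?=[0-9])'
    r'|(?<=[0-9])(?=[A-Za-z])'
)

# A colon or full stop, together with any whitespace around it.
PUNCT = re.compile(r'\s*([:.])\s*')


def split_joined(text):
    """Insert spaces at the assumed word boundaries of `text`."""
    text = BOUNDARY.sub(' ', text)    # split case and digit transitions
    text = PUNCT.sub(r' \1 ', text)   # set ':' and '.' apart
    return text.strip()


sample = ("10JAN2015AirMail standard envelope from HyderabadAddress "
          "details:John Cena Palm DriveAdelaide.Also Contained:NilAction "
          "Taken:Goods referred to HGI QLD for further action."
          "Attachments:Nil34FEB2004")
print(split_joined(sample))
```

On the sample string this should give the same spacing as the loop above. Note that it also splits `AirMail` into `Air Mail` and would break up decimal numbers such as `3.5`, so treat it as a starting point rather than a finished solution.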

Sometimes you just need to be pointed in the right direction.
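
The simpler set-based classification mentioned at the top of this answer might look roughly like the following sketch, which groups consecutive characters of the same class with `itertools.groupby`. The helpers `kind` and `split_by_class` are hypothetical names chosen for this example, and the rule that a lowercase run attaches to the capitals right before it is an assumption made to keep words like `John` together.

```python
from itertools import groupby


def kind(ch):
    """Classify a single character into one of five assumed classes."""
    if ch.isdigit():
        return 'digit'
    if ch.isupper():
        return 'upper'
    if ch.islower():
        return 'lower'
    if ch.isspace():
        return 'space'
    return 'other'


def split_by_class(text):
    """Split `text` wherever the character class changes."""
    tokens = []
    glue = False  # True when the previous run was uppercase
    for k, group in groupby(text, key=kind):
        run = ''.join(group)
        if k == 'space':
            glue = False
            continue
        if k == 'lower' and glue:
            tokens[-1] += run      # "A" + "ir" -> "Air"
        elif k == 'other':
            tokens.extend(run)     # each punctuation mark is its own token
        else:
            tokens.append(run)
        glue = (k == 'upper')
    return ' '.join(tokens)


print(split_by_class("10JAN2015AirMail details:John Nil34FEB2004"))
```

Like the verbose loop, this keeps all-uppercase runs such as `HGI` intact and treats every punctuation character as a separate token.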