How to split strings from a text file into characters and numbers in Python

Time: 2014-04-10 05:19:30

Tags: python

My file contains strings like this:

N1109 X62.729 Y23.764 Z231.442 A59.756 B9.231

I want to split the letters from the numbers in this file. The output should look like this:

N 1109  X 62.729 Y 23.764 Z 231.442  A 59.756 B 9.231

The data is in a text file, and I don't know how to do this when reading from a text file.

The code I wrote for this is:

import re
from sys import argv
script, filename = argv
f = open(filename,"r")
lines = f.readlines()
print lines
r = re.compile("([a-zA-Z]+)([0-9]+)")
a = [r.match(string).group() for string in lines]
print a

When I use group(), I get this error:

`AttributeError: 'NoneType' object has no attribute 'group'`

When I remove group(), the output is:

[<_sre.SRE_Match object at 0xb72f1b18>, None, None, None, None, None, None, None, None, None, None]

Please help me, I am new to Python.

2 answers:

Answer 0: (score: 0)

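The problem is that match only tries the pattern at the very beginning of the string and does not scan any further. From the documentation:

"If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance."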

You need to use findall:

>>> i
'N1109   X62.729   Y23.764   Z231.442   A59.756   B9.231'
>>> re.findall(r'(\w{1})(\d+\.?\d+)', i)
[('N', '1109'), ('X', '62.729'), ('Y', '23.764'), ('Z', '231.442'), ('A', '59.756'), ('B', '9.231')]
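To make the difference concrete, here is a minimal sketch contrasting match with findall. The line with leading whitespace is an assumed input chosen only to trigger the None case; the question does not show what the other lines in the file look like.

```python
import re

line = "N1109 X62.729 Y23.764 Z231.442 A59.756 B9.231"
pattern = re.compile(r"([a-zA-Z]+)([0-9]+)")  # pattern from the question

# match() only tries the pattern at position 0 of the string.
first = pattern.match(line)
print(first.group())  # N1109

# Assumed input: the same line with one leading space. The string no longer
# starts with a letter, so match() returns None instead of a match object.
shifted = pattern.match(" " + line)
print(shifted)  # None

# Calling .group() on None raises AttributeError, so guard the result first.
if shifted is not None:
    print(shifted.group())

# findall() scans the whole string, so the leading space does not matter.
pairs = re.findall(r"(\w{1})(\d+\.?\d+)", " " + line)
print(pairs)
```

The guard with `is not None` is what the list comprehension in the question is missing.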

Also, consider using the with statement, which takes care of closing the file for you:

import re
import sys

exp = r'(\w{1})(\d+\.?\d+)'

with open(sys.argv[1]) as f:
    for line in f:
        for letter,number in re.findall(exp, line):
            print('{} {}'.format(letter, number))
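The loop above prints each letter/number pair on its own line. If the goal is the single-line layout shown in the question, one possible variation is to join the pairs for each line. This is only a sketch; the helper name `split_line` is made up for illustration.

```python
import re

exp = r'(\w{1})(\d+\.?\d+)'  # same pattern as in the answer

def split_line(line):
    """Return the pairs found in one input line as 'N 1109 X 62.729 ...'."""
    pairs = re.findall(exp, line)
    return ' '.join('{} {}'.format(letter, number) for letter, number in pairs)

print(split_line('N1109 X62.729 Y23.764 Z231.442 A59.756 B9.231'))
# N 1109 X 62.729 Y 23.764 Z 231.442 A 59.756 B 9.231
```

Inside the `with` block, calling `print(split_line(line))` for each line would produce one output line per input line.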

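In addition, your original expression "([a-zA-Z]+)([0-9]+)" does not account for the optional . part of the number. Your expression means "one or more alphabetic characters, regardless of case, followed by one or more digits", whereas the expression you need is "one or more alphabetic characters, followed by one or more digits, an optional ., and then one or more digits".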
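One caveat about the pattern used in this answer: `\d+\.?\d+` needs at least two digits, and `\w` also matches digits and underscores. A slightly stricter alternative, offered here as a sketch rather than as part of the original answer, is one ASCII letter followed by digits with an optional fractional part. The input `'N1109 X62.729 B9'` is a made-up example to show the single-digit case.

```python
import re

# One ASCII letter, then digits with an optional ".digits" fractional part.
strict = re.compile(r'([A-Za-z])(\d+(?:\.\d+)?)')

sample = 'N1109 X62.729 B9'  # made-up input with a single-digit value

strict_pairs = strict.findall(sample)
print(strict_pairs)
# [('N', '1109'), ('X', '62.729'), ('B', '9')]

# The pattern from the answer requires two or more digits, so 'B9' is skipped.
loose_pairs = re.findall(r'(\w{1})(\d+\.?\d+)', sample)
print(loose_pairs)
# [('N', '1109'), ('X', '62.729')]
```

For the data shown in the question both patterns give the same result, since every value there has at least two digits.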

Answer 1: (score: 0)

You can use the re module to do this.

Try this; it may help you.

>>> import re
>>> match = re.match(r"([a-z]+)([0-9]+)", 'N1109', re.I)
>>> if match:
        print(match.groups())

Output:

('N', '1109')

Update

>>> a=['N1109', 'X62.729', 'Y23.764', 'Z231.442', 'A59.756', 'B9.231']
>>> answer=[]
>>> for i in a:
        match = re.match(r"([a-z]+)([0-9]*\.?[0-9]+)", i, re.I)
        if match:
            answer.append(match.groups())


>>> answer
[('N', '1109'), ('X', '62.729'), ('Y', '23.764'), ('Z', '231.442'), ('A', '59.756'), ('B', '9.231')]
>>> 

Update

>>> with open(r'd:\test1.txt') as f:
         content = f.readlines()       
>>> content=' '.join(content)
>>> content=content.split()
>>> answer=[]
>>> for i in content:
        match = re.match(r"([a-z]+)([0-9]*\.?[0-9]+)", i, re.I)
        if match:
            answer.append(match.groups())


>>> answer
[('N', '1100'), ('X', '63.658'), ('Y', '21.066'), ('Z', '230.989'), ('A', '60.28'), ('B', '9.5'), ('N', '1101'), ('X', '63.424'), ('Y', '21.419'), ('Z', '231.06'), ('A', '60.269'), ('B', '9.459'), ('N', '1102'), ('X', '63.219'), ('Y', '21.805'), ('Z', '231.132'), ('A', '60.231'), ('B', '9.418'), ('N', '1103'), ('X', '63.051'), ('Y', '22.206'), ('Z', '231.202'), ('A', '60.169'), ('B', '9.377'), ('N', '1104'), ('X', '62.915'), ('Y', '22.63'), ('Z', '231.272'), ('A', '60.083'), ('B', '9.335'), ('N', '1105'), ('X', '62.863'), ('Y', '22.851'), ('Z', '231.307'), ('A', '60.027'), ('B', '9.314'), ('N', '1106'), ('X', '62.811'), ('Y', '23.073'), ('Z', '231.341'), ('A', '59.971'), ('B', '9.293'), ('N', '1111'), ('X', '62.702'), ('Y', '24.227'), ('Z', '231.506'), ('A', '59.596'), ('B', '9.191'), ('N', '1112'), ('X', '62.71'), ('Y', '24.462'), ('Z', '231.536'), ('A', '59.503'), ('B', '9.172'), ('N', '1113'), ('X', '62.718'), ('Y', '24.697'), ('Z', '231.567'), ('A', '59.41'), ('B', '9.152'), ('N', '1114'), ('X', '62.727'), ('Y', '24.932'), ('Z', '231.597'), ('A', '59.316'), ('B', '9.133'), ('N', '1115'), ('X', '62.734'), ('Y', '25.167'), ('Z', '231.627'), ('A', '59.222'), ('B', '9.114'), ('N', '1123'), ('X', '62.793'), ('Y', '27.037'), ('Z', '231.864'), ('A', '58.46'), ('B', '8.961')]
>>>
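The interactive session above can also be written as a small script. The following is a sketch under a few assumptions: the function name `parse_lines` is invented for illustration, and the sample list stands in for the file contents, using values taken from the output above.

```python
import re

# Same expression as in the session above, compiled once and case-insensitive.
TOKEN = re.compile(r"([a-z]+)([0-9]*\.?[0-9]+)", re.I)

def parse_lines(lines):
    """Collect (letter, number) tuples from an iterable of text lines."""
    answer = []
    for line in lines:
        # Split on whitespace so match() sees one token such as 'X63.658'.
        for token in line.split():
            match = TOKEN.match(token)
            if match:
                answer.append(match.groups())
    return answer

# Stand-in for the file contents; an open file object can be passed instead.
sample = ["N1100 X63.658 Y21.066\n", "N1101 X63.424 Y21.419\n"]
result = parse_lines(sample)
print(result)
```

Because match is applied to each whitespace-separated token, every token starts with its letter and the anchoring at position 0 is no longer a problem. To read from disk, `parse_lines(f)` can be called inside a `with open(...) as f:` block.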