Question

The text in my file looks like this:

text1 5,000 6,000
text2 2,000 3,000
text3 
           5,000 3,000
text4 1,000 2000
text5
          7,000 1,000
text6 2,000 1,000

Is there a way in Python to clean this up so that, when a text line is missing its numbers, the numbers from the following line are moved up onto it:

text1 5,000 6,000
text2 2,000 3,000
text3 5,000 3,000
text4 1,000 2000
text5 7,000 1,000
text6 2,000 1,000

Thanks!

Answer 1

Assuming every line should have exactly three "words", you can use:

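with open("file") as f:
    # Flatten the whole file into one stream of whitespace-separated tokens
    tokens = (x for line in f for x in line.split())
    # The same iterator is passed three times, so each tuple takes the next three tokens
    for t in zip(tokens, tokens, tokens):
        print(" ".join(t))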
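The `zip(tokens, tokens, tokens)` trick works because all three arguments are the same iterator, so every tuple consumes the next three tokens in order. A minimal sketch of the same idea as a function over a string, so it can be tried without a file (the name `regroup_in_threes` and the sample text are illustrative, not part of the answer):

```python
def regroup_in_threes(text):
    """Rejoin whitespace-separated tokens into lines of exactly three tokens."""
    tokens = iter(text.split())
    # zip pulls from the same iterator three times per tuple, so tokens are
    # consumed in consecutive groups of three; a trailing partial group is dropped
    return [" ".join(group) for group in zip(tokens, tokens, tokens)]


sample = """text1 5,000 6,000
text3
           5,000 3,000
"""
for line in regroup_in_threes(sample):
    print(line)
```

Because `zip` silently drops an incomplete final group, a line that is genuinely missing a number would shift every later group out of alignment instead of raising an error, which is why this only holds under the three-words assumption.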

Edit: Since the precondition above apparently does not hold, here is an implementation that actually looks at the data:

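from itertools import groupby

with open("file") as f:
    tokens = (x for line in f for x in line.split())
    # Group consecutive tokens by whether they start with a digit
    for is_number, group in groupby(tokens, lambda x: x[0].isdigit()):
        if is_number:
            # A run of numbers finishes the current output line
            print(" ".join(group))
        else:
            # Labels go one per line; the last one ends with a space instead of
            # a newline, so the numbers that follow land on the same line
            print("\n".join(group), end=" ")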
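A sketch of the same `groupby` idea that returns a list instead of printing, which makes it easier to test or to write back to a file. The function name `merge_by_label` is hypothetical, and like the answer it assumes a token is a number exactly when its first character is a digit (so a label such as `2024total` would be misclassified):

```python
from itertools import groupby


def merge_by_label(text):
    """Attach each run of numeric tokens to the label that precedes it."""
    lines = []
    # Split the token stream into alternating runs of labels and numbers
    for is_number, run in groupby(text.split(), lambda tok: tok[0].isdigit()):
        run = list(run)
        if not is_number:
            # Each label starts a new output line
            lines.extend(run)
        elif lines:
            # Numbers attach to the most recent label
            lines[-1] += " " + " ".join(run)
        else:
            # Numbers that appear before any label get a line of their own
            lines.append(" ".join(run))
    return lines


sample = "text1 5,000 6,000\ntext3\n           5,000 3,000\ntext4 1,000 2000\n"
print("\n".join(merge_by_label(sample)))
```

Unlike the three-words version, a label with no numbers at all simply stays on its own line rather than shifting everything after it.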

Answer 2

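Assuming a logical line "continues" on any following line that starts with whitespace (and that a logical line may hold any number of entries), you can use: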

>>> collapse_space = lambda s: " ".join(s.split())
>>>
>>> logical_lines = []
>>> with open("text") as f:
...     for line in f:
...         if line[0].isspace():
...             logical_lines[-1] += line  # append the continuation to the last logical line
...         else:
...             logical_lines.append(line)  # start a new logical line
...
>>> l = map(collapse_space, logical_lines)
>>>
>>> print("\n".join(l))
text1 5,000 6,000
text2 2,000 3,000
text3 5,000 3,000
text4 1,000 2000
text5 7,000 1,000
text6 2,000 1,000
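The same continuation-line logic can be packaged as a function over any iterable of lines, so it accepts an open file object as well as a list. This is a sketch, and `join_continuations` is a hypothetical name; it adds a guard the session above lacks, so an indented first line starts a new logical line instead of raising `IndexError` on `logical_lines[-1]`, and it drops logical lines that end up empty:

```python
def join_continuations(lines):
    """Merge lines that start with whitespace into the preceding logical line."""
    logical_lines = []
    for line in lines:
        if line[:1].isspace() and logical_lines:
            # Continuation: append it to the current logical line
            logical_lines[-1] += line
        else:
            # Anything else (or a continuation with nothing before it) starts a new one
            logical_lines.append(line)
    # Collapse every run of whitespace, including embedded newlines, to one space
    collapsed = (" ".join(line.split()) for line in logical_lines)
    # Drop logical lines that were blank
    return [line for line in collapsed if line]


sample = [
    "text3\n",
    "           5,000 3,000\n",
    "text4 1,000 2000\n",
]
print("\n".join(join_continuations(sample)))
```

With a real file this would be called as `with open("text") as f: cleaned = join_continuations(f)`, reusing the file name from the session above.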

How can I clean up a text file in Python?
