How do I replace blank entries in a text table with 0 in Python?

Asked: 2011-10-08 22:06:57

Tags: python text

My table looks like this:

text = """
ID = 1234

Hello World              135,343    117,668    81,228
Another line of text    (30,632)              (48,063)
More text                  0         11,205       0    
Even more text                       1,447       681

ID = 18372

Another table                        35,323              38,302      909,381
Another line with text                 13                  15
More text here                                              7           0    
Even more text here                   7,011               1,447        681
"""

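Is there a way to replace the "blank" entries in each table with 0? I am trying to put delimiters between the entries, but the following code cannot handle the blank spots in the table: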

for line in text.splitlines():
    if 'ID' not in line:
        line1 = line.split()
        line = '|'.join((' '.join(line1[:-3]), '|'.join(line1[-3:])))
        print(line)
    else:
        print(line)

The output is:

ID = 1234
|
Hello World|135,343|117,668|81,228
Another line of|text|(30,632)|(48,063)
More text|0|11,205|0
Even more|text|1,447|681
|
ID = 18372
|
Another table|35,323|38,302|909,381
Another line with|text|13|15
More text|here|7|0
Even more text here|7,011|1,447|681

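As you can see, the first problem appears in the second row of the first table: the word "text" is treated as the first column. Is there a way to fix this in Python so that the blank entries are replaced with 0?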
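The failure can be reproduced on a single row. A minimal sketch, assuming only the second row of the first table: `str.split()` with no arguments collapses every run of whitespace, so a blank cell leaves no token behind and the last three tokens no longer line up with the three numeric columns.

```python
# Second row of the first table; its middle numeric cell is blank.
line = "Another line of text    (30,632)              (48,063)"

tokens = line.split()           # whitespace runs collapse; the blank cell vanishes
label = ' '.join(tokens[:-3])   # everything except the last three tokens
cells = tokens[-3:]             # assumed to be the three numeric cells

print(len(tokens))
print(label)
print(cells)
```

Only two numeric tokens survive the split, so "text" is pulled in as the first cell.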

1 Answer:

Answer 0 (score: 1)

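Here is a function that finds the columns in a group of lines. The second argument, pat, defines what counts as a column boundary and can be any regular expression.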

import itertools as it
import re

def find_columns(lines, pat = r' '):
    '''
    Usage:
    widths = find_columns(lines)
    for line in lines:
        if not line: continue
        vals = [ line[widths[i]:widths[i+1]].strip() for i in range(len(widths)-1) ]
    '''
    widths = []
    maxlen = max(len(line) for line in lines)
    for line in lines:
        line = ''.join([line, ' '*(maxlen-len(line))])
        candidates = []
        for match in re.finditer(pat, line):
            candidates.extend(range(match.start(), match.end()+1))
        widths.append(set(candidates))
    widths = sorted(set.intersection(*widths))
    diffs = [widths[i+1]-widths[i] for i in range(len(widths)-1)]
    diffs = [None]+diffs
    widths = [w for d, w in zip(diffs, widths) if d != 1]
    if widths[0] != 0: widths = [0]+widths
    return widths

def report(text):
    for key, group in it.groupby(text.splitlines(), lambda line:line.startswith('ID')):
        lines = list(group)
        if key:
            print('\n'.join(lines))
        else:
            # r'\s(?![a-zA-Z])' treats any whitespace character that is
            # not followed by an alphabetic character as a column boundary.
            widths = find_columns(lines, pat = r'\s(?![a-zA-Z])')
            for line in lines:
                if not line: continue
                vals = [ line[widths[i]:widths[i+1]] for i in range(len(widths)-1) ]
                vals = [v if v.strip() else v[1:]+'0' for v in vals]
                print('|'.join(vals))

text = """\
ID = 1234

Hello World              135,343    117,668    81,228
Another line of text    (30,632)              (48,063)
More text                  0         11,205       0    
Even more text                       1,447       681

ID = 18372

Another table                        35,323              38,302      909,381
Another line with text                 13                  15
More text here                                              7           0    
Even more text here                   7,011               1,447        681
"""

report(text)

yields

ID = 1234
Hello World         |     135,343|    117,668|    81,228
Another line of text|    (30,632)|          0|   (48,063)
More text           |       0    |     11,205|       0   
Even more text      |           0|     1,447 |      681
ID = 18372
Another table         |               35,323|              38,302|      909,381
Another line with text|                 13  |                15|0
More text here        |                    0|                 7  |         0   
Even more text here   |                7,011|               1,447|        681
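A possible variation, not part of the answer above: a minimal sketch that returns the cells as lists instead of printing them. It assumes that columns are separated by at least two character positions that are blank in every row (the hypothetical `min_gap` parameter), so a single space inside a label does not start a new column. The name `split_aligned_rows` and the sample rows are illustrative only. Rows are padded to a common width first, so short rows still produce one cell per column.

```python
def split_aligned_rows(lines, min_gap=2):
    """Split aligned text rows into cells, using '0' for blank cells.

    Assumption: at least `min_gap` positions that are blank in every
    row separate two columns; shorter gaps stay inside a column.
    """
    rows = [line for line in lines if line.strip()]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]

    # occupied[i] is True if any row has a non-blank character at position i.
    occupied = [any(row[i] != ' ' for row in padded) for i in range(width)]

    # Collect (start, end) spans, closing a span after min_gap blank positions.
    spans, start, end, gap = [], None, 0, 0
    for i, used in enumerate(occupied):
        if used:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, end))
                start = None
    if start is not None:
        spans.append((start, end))

    return [[row[a:b].strip() or '0' for a, b in spans] for row in padded]


# Illustrative rows, built with format() so the alignment is guaranteed.
sample = [
    "{:<22}{:>10}{:>10}{:>10}".format(*cells).rstrip()
    for cells in [
        ("Hello World", "135,343", "117,668", "81,228"),
        ("Another line of text", "(30,632)", "", "(48,063)"),
        ("More text", "0", "11,205", "0"),
        ("Even more text", "", "1,447", "681"),
    ]
]

result = split_aligned_rows(sample)
for cells in result:
    print('|'.join(cells))
```

Because blank cells and literal zeros both come back as `'0'`, the two cannot be told apart afterwards.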