我的表格看起来像这样:
text = """
ID = 1234
Hello World 135,343 117,668 81,228
Another line of text (30,632) (48,063)
More text 0 11,205 0
Even more text 1,447 681
ID = 18372
Another table 35,323 38,302 909,381
Another line with text 13 15
More text here 7 0
Even more text here 7,011 1,447 681
"""
有没有办法用0替换每个表中的“空白”条目?我试图在条目之间设置分隔符,但使用以下代码无法处理表中的空白点:
for line in text.splitlines():
if 'ID' not in line:
line1 = line.split()
line = '|'.join((' '.join(line1[:-3]), '|'.join(line1[-3:])))
print line
else:
print line
输出结果为:
ID = 1234
|
Hello World|135,343|117,668|81,228
Another line of|text|(30,632)|(48,063)
More text|0|11,205|0
Even more|text|1,447|681
|
ID = 18372
|
Another table|35,323|38,302|909,381
Another line with|text|13|15
More text|here|7|0
Even more text here|7,011|1,447|681
如您所见,第一个问题出现在第一个表的第二行。 “文本”一词被认为是第一列。有什么办法在Python中修复这个以用0?
替换空白条目答案 0 :(得分:1)
这是一个用于查找一堆行中的列的函数。第二个参数pat
定义列是什么,可以是任何正则表达式。
import itertools as it
import re
def find_columns(lines, pat = r' '):
'''
Usage:
widths = find_columns(lines)
for line in lines:
if not line: continue
vals = [ line[widths[i]:widths[i+1]].strip() for i in range(len(widths)-1) ]
'''
widths = []
maxlen = max(len(line) for line in lines)
for line in lines:
line = ''.join([line, ' '*(maxlen-len(line))])
candidates = []
for match in re.finditer(pat, line):
candidates.extend(range(match.start(), match.end()+1))
widths.append(set(candidates))
widths = sorted(set.intersection(*widths))
diffs = [widths[i+1]-widths[i] for i in range(len(widths)-1)]
diffs = [None]+diffs
widths = [w for d, w in zip(diffs, widths) if d != 1]
if widths[0] != 0: widths = [0]+widths
return widths
def report(text):
for key, group in it.groupby(text.splitlines(), lambda line:line.startswith('ID')):
lines = list(group)
if key:
print('\n'.join(lines))
else:
# r' (?![a-zA-Z])' defines a column to be any whitespace
# not followed by alphabetic characters.
widths = find_columns(lines, pat = r'\s(?![a-zA-Z])')
for line in lines:
if not line: continue
vals = [ line[widths[i]:widths[i+1]] for i in range(len(widths)-1) ]
vals = [v if v.strip() else v[1:]+'0' for v in vals]
print('|'.join(vals))
text = """\
ID = 1234
Hello World 135,343 117,668 81,228
Another line of text (30,632) (48,063)
More text 0 11,205 0
Even more text 1,447 681
ID = 18372
Another table 35,323 38,302 909,381
Another line with text 13 15
More text here 7 0
Even more text here 7,011 1,447 681
"""
report(text)
产量
ID = 1234
Hello World | 135,343| 117,668| 81,228
Another line of text| (30,632)| 0| (48,063)
More text | 0 | 11,205| 0
Even more text | 0| 1,447 | 681
ID = 18372
Another table | 35,323| 38,302| 909,381
Another line with text| 13 | 15|0
More text here | 0| 7 | 0
Even more text here | 7,011| 1,447| 681