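I have a list of lists that represents a grid of data (think rows in a spreadsheet). Each row can have an arbitrary number of columns, and the data in each cell is a string of arbitrary length.
I want to normalize this so that every row has the same number of columns and every column has the same width, padding with spaces where necessary. For example, given the following input: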
(
("row a", "a1","a2","a3"),
("another row", "b1"),
("c", "x", "y", "a long string")
)
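I would like the data to look like this:
(
("row a      ", "a1", "a2", "a3           "),
("another row", "b1", "  ", "             "),
("c          ", "x ", "y ", "a long string")
)
What is the Pythonic solution for Python 2.6 or later? To be clear: I am not trying to print the list itself; I am looking for a solution that returns a new list of lists (or tuple of tuples) with the values padded out.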
Answer 0 (score: 7)
Start with the input data:
>>> d = (
("row a", "a1","a2","a3"),
("another row", "b1"),
("c", "x", "y", "a long string")
)
Make one pass to determine the maximum size of each column:
>>> col_size = {}
>>> for row in d:
        for i, col in enumerate(row):
            col_size[i] = max(col_size.get(i, 0), len(col))
>>> ncols = len(col_size)
Then make a second pass to pad each column to the required width:
>>> result = []
>>> for row in d:
        row = list(row) + [''] * (ncols - len(row))
        for i, col in enumerate(row):
            row[i] = col.ljust(col_size[i])
        result.append(row)
This gives the desired result:
>>> from pprint import pprint
>>> pprint(result)
[['row a      ', 'a1', 'a2', 'a3           '],
 ['another row', 'b1', '  ', '             '],
 ['c          ', 'x ', 'y ', 'a long string']]
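For convenience, these steps can be combined into a single function:
def align(array):
    # first pass: record the widest cell in each column
    col_size = {}
    for row in array:
        for i, col in enumerate(row):
            col_size[i] = max(col_size.get(i, 0), len(col))
    ncols = len(col_size)
    # second pass: add missing cells, then pad every cell to its column width
    result = []
    for row in array:
        row = list(row) + [''] * (ncols - len(row))
        for i, col in enumerate(row):
            row[i] = col.ljust(col_size[i])
        result.append(row)
    return result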
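Since the question also allows a tuple of tuples as the result, here is a minimal sketch of the same two-pass idea that returns one. It assumes Python 3, and the name `align_tuples` is hypothetical (it is not part of the answer above).

```python
def align_tuples(rows):
    """Two-pass padding that returns a tuple of tuples (hypothetical helper)."""
    # Pass 1: track the widest cell seen at each column position.
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(cell))
    # Pass 2: treat missing cells as '' and left-justify every cell.
    return tuple(
        tuple(
            (row[i] if i < len(row) else "").ljust(width)
            for i, width in enumerate(widths)
        )
        for row in rows
    )


grid = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)
padded = align_tuples(grid)
```

Note that `rows` is iterated twice, so it has to be a real sequence (tuple or list), not a one-shot generator.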
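Answer 1 (score: 6)
Here is what I came up with:
# Python 2; in Python 3, use itertools.zip_longest and the built-in zip instead
import itertools
def pad_rows(strs):
    # walk the grid column by column, filling short rows with ""
    for col in itertools.izip_longest(*strs, fillvalue=""):
        longest = max(map(len, col))
        yield map(lambda x: x.ljust(longest), col)
def pad_strings(strs):
    # transpose the padded columns back into rows
    return itertools.izip(*pad_rows(strs))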
Call it like this (x is the input grid):
print tuple(pad_strings(x))
This gives the following result:
(('row a      ', 'a1', 'a2', 'a3           '),
 ('another row', 'b1', '  ', '             '),
 ('c          ', 'x ', 'y ', 'a long string'))
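For readers on Python 3, the same transpose, pad, transpose-back idea could be sketched as follows. This is an assumed port, not the answerer's code: `izip_longest` is replaced by `itertools.zip_longest`, `izip` by the built-in `zip`, and the helper name `pad_columns` is made up for this sketch.

```python
from itertools import zip_longest


def pad_columns(rows):
    # zip_longest transposes the grid and fills short rows with "".
    for col in zip_longest(*rows, fillvalue=""):
        longest = max(map(len, col))
        yield [cell.ljust(longest) for cell in col]


def pad_strings(rows):
    # zip(*...) transposes the padded columns back into rows.
    return tuple(zip(*pad_columns(rows)))


x = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)
padded = pad_strings(x)
```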
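Answer 2 (score: 2)
First, define a padding function:
def padder(lst, pad_by):
    lengths = [len(x) for x in lst]
    max_len = max(lengths)
    return (x + pad_by * (max_len - length) for x, length in zip(lst, lengths))
Then pad each row to the same length with '':
# a = your tuple of tuples of strings (use [''] as pad_by if the rows are lists)
a_padded = padder(a, ('',))
Then transpose this list of lists so that we can work column by column:
a_tr = zip(*a_padded)
For each row of the transposed data (that is, each original column), find the maximum string length and pad every string to that length:
a_tr_strpadded = (padder(x, ' ') for x in a_tr)
Finally, transpose it again and evaluate the result:
a_strpadded = zip(*a_tr_strpadded)
result = [list(x) for x in a_strpadded]
If you want a tuple of tuples instead of a list of lists, use tuple(tuple(x) for ...).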
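The steps above can be collected into one function. The following is only a sketch under a few assumptions: Python 3, rows converted to tuples up front so that `('',)` can be concatenated to them, and a wrapper name `pad_grid` that does not appear in the answer. `padder` returns a list here rather than a generator, purely to keep the example easy to inspect.

```python
def padder(lst, pad_by):
    # Extend every item of lst with pad_by until all items have the same length.
    lengths = [len(x) for x in lst]
    max_len = max(lengths)
    return [x + pad_by * (max_len - n) for x, n in zip(lst, lengths)]


def pad_grid(rows):
    rows = [tuple(r) for r in rows]               # tuples, so ('',) can be appended
    padded_rows = padder(rows, ("",))             # same number of cells per row
    columns = zip(*padded_rows)                   # transpose: rows -> columns
    padded_cols = [padder(col, " ") for col in columns]  # same width per column
    return [list(r) for r in zip(*padded_cols)]   # transpose back to rows


a = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)
a_strpadded = pad_grid(a)
```

The same `padder` works for both steps because `*` and `+` behave alike on tuples and strings: it pads rows with empty cells and cells with spaces.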
Answer 3 (score: 1)
import itertools
def fix_grid(grid):
    # records the number of cols, and their respective widths
    cols = []
    for row in grid:
        # extend cols with widths of 0 if necessary
        cols.extend(itertools.repeat(0, max(0, len(row) - len(cols))))
        for index, value in enumerate(row):
            # increase any widths in cols if this row has larger entries
            cols[index] = max(cols[index], len(value))
    # generate new rows with values widened, and fill in values that are missing
    # (zip_longest is the Python 3 name; Python 2.6+ calls it izip_longest)
    for row in grid:
        yield tuple(value.ljust(width)
                    for value, width in itertools.zip_longest(row, cols, fillvalue=''))
# create a tuple of fixed rows from the old grid
grid = tuple(fix_grid(grid))
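One detail worth checking when pairing a short row with the column widths: `itertools.zip_longest` takes its fill value as the keyword argument `fillvalue`. A third positional argument is treated as one more iterable. A small Python 3 sketch (the sample `row` and `widths` values are taken from the question's data):

```python
from itertools import zip_longest

row = ("another row", "b1")
widths = [11, 2, 2, 13]

# fillvalue passed by keyword: the short row is extended with ''.
pairs = list(zip_longest(row, widths, fillvalue=""))
padded = tuple(value.ljust(width) for value, width in pairs)

# A third positional argument is just another (empty) iterable, so every
# item becomes a 3-tuple and the missing slots are filled with None.
triples = list(zip_longest(row, widths, ""))
```

With the positional form, unpacking into `value, width` would fail because each item has three elements.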
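Answer 4 (score: 1)
I suggest you use list instead of tuple. Tuples are immutable and harder to work with.
First, find the length of the longest row.
maxlen = max([len(row) for row in yourlist])
Then pad each row with the necessary number of empty strings:
for row in yourlist:
    row += ['' for i in range(maxlen - len(row))]
Then you can swap rows and columns, i.e. columns become rows and vice versa. For that you can write
newlist = [[row[i] for row in yourlist] for i in range(maxlen)]
Now you can take one row (a column of the old list) and pad the strings as needed.
for row in newlist:
    width = max([len(s) for s in row])
    for i in range(len(row)):
        row[i] += ' ' * (width - len(row[i]))
Now convert the table back to its original layout:
table = [[row[i] for row in newlist] for i in range(len(yourlist))]
Putting it all in one function: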
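def f(table):
    maxlen = max([len(row) for row in table])
    # pad short rows in place with empty strings
    for row in table:
        row += ['' for i in range(maxlen - len(row))]
    # transpose: each row of newtable is a column of table
    newtable = [[row[i] for row in table] for i in range(maxlen)]
    for row in newtable:
        width = max([len(s) for s in row])
        for i in range(len(row)):
            row[i] += ' ' * (width - len(row[i]))
    # transpose back to the original layout
    return [[row[i] for row in newtable] for i in range(len(table))]
This solution works for lists. Because the rows are extended in place with +=, it raises a TypeError on a tuple of tuples.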
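If mutating the input is a concern, a variant that copies each row first is easy to sketch. This assumes Python 3, and `pad_table` is a hypothetical name. It uses `zip(*rows)` for the transposition instead of index-based comprehensions, which is a swapped-in technique rather than the answer's own.

```python
def pad_table(table):
    ncols = max(len(row) for row in table)
    # Copy every row while padding it, so the caller's data is left untouched.
    rows = [list(row) + [""] * (ncols - len(row)) for row in table]
    # Transpose so that each inner list is one column.
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        width = max(len(s) for s in col)
        for i, s in enumerate(col):
            col[i] = s.ljust(width)
    # Transpose back to the original row layout.
    return [list(row) for row in zip(*columns)]


original = [
    ["row a", "a1", "a2", "a3"],
    ["another row", "b1"],
    ["c", "x", "y", "a long string"],
]
table = pad_table(original)
```

Because rows are copied with `list(row)`, this version also accepts a tuple of tuples.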
Answer 5 (score: 0)
I can only do this in two passes, but it should not be hard:
def pad_2d_matrix(data):
    # first pass: widest string in each column
    widths = {}
    for line in data:
        for index, string in enumerate(line):
            widths[index] = max(widths.get(index, 0), len(string))
    result = []
    ncols = len(widths)
    # second pass: pad existing cells, then add blank cells for missing columns
    for line in data:
        result.append([])
        for index, string in enumerate(line):
            result[-1].append(string + " " * (widths[index] - len(string)))
        for index_2 in range(len(line), ncols):
            result[-1].append(" " * widths[index_2])
    return result
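Answer 6 (score: 0)
I agree with the others that there should be two passes. Pass 1 computes the maximum width of each column, and pass 2 pads each cell to its column width.
The code below relies on the Python built-in functions map() and reduce(). The downside is that the expressions can be more cryptic; I have tried to offset that with plenty of indentation. The upside is that the code benefits from any loop optimizations the implementation makes inside these functions.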
g = (
    ("row a", "a1","a2","a3"),
    ("another row", "b1"),
    (),     # null row added as a test case
    ("c", "x", "y", "a long string")
)
# Python 2 only: map() with several sequences pads the shorter ones with None
widths = reduce(
    lambda sofar, row:
        map(
            lambda longest, cell:
                max(longest, 0 if cell is None else len(cell)
            ),
            sofar,
            row
        ),
    g,
    []
) #reduce()
print 'widths:', widths
print 'normalised:', tuple([
    tuple(map(
        lambda cell, width: ('' if cell is None else cell).ljust(width),
        row,
        widths
    )) #tuple(map(
    for row in g
]) #tuple([
This gives the output (line breaks added for readability):
widths: [11, 2, 2, 13]
normalised: (
    ('row a      ', 'a1', 'a2', 'a3           '),
    ('another row', 'b1', '  ', '             '),
    ('           ', '  ', '  ', '             '),
    ('c          ', 'x ', 'y ', 'a long string')
)
I have tested this code. The ... if cell is None else cell expressions are verbose, but they are necessary to make the expressions actually work.
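The code above depends on Python 2 behaviour: `reduce` is a built-in there, and `map()` pads shorter sequences with `None`. As a rough Python 3 equivalent, assuming `functools.reduce` and `itertools.zip_longest` as stand-ins, something like the following should behave the same way. The helper name `widen` is made up for this sketch.

```python
from functools import reduce
from itertools import zip_longest

g = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    (),  # empty row kept as a test case
    ("c", "x", "y", "a long string"),
)


def widen(sofar, row):
    # zip_longest fills the shorter side with None, hence the "or" fallbacks.
    return [max(longest or 0, len(cell or ""))
            for longest, cell in zip_longest(sofar, row)]


# Pass 1: fold the rows into a list of column widths.
widths = reduce(widen, g, [])

# Pass 2: widths is never shorter than a row, so only cells can be None.
normalised = tuple(
    tuple((cell or "").ljust(width) for cell, width in zip_longest(row, widths))
    for row in g
)
```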
Answer 7 (score: -1)
Just for fun, a one-liner:
from itertools import izip_longest as zl
t=(
("row a", "a1","a2","a3"),
("another row", "b1"),
("c", "x", "y", "a long string")
);
b=tuple(tuple(("{: <"+str(map(max, ( map(lambda x: len(x) if x else 0,i) for i in zl(*t) ))[i])+"}").format(j) for i,j in enumerate(list(k)+[""]*(max(map(len,t))-len(k)))) for k in t)
print(b)
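The one-liner is hard to follow, so here is an unrolled sketch of what it appears to do: compute each column's width, then build a `str.format` spec with a space fill and left alignment for every cell. This version assumes Python 3 (`zip_longest` instead of `izip_longest`), and the intermediate name `widths` is introduced only for readability.

```python
from itertools import zip_longest

t = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)

# Width of each column; missing cells count as empty strings.
widths = [max(len(cell) for cell in col)
          for col in zip_longest(*t, fillvalue="")]

# "{: <{w}}" means: fill with spaces, left-align, minimum width w.
b = tuple(
    tuple("{: <{w}}".format(cell, w=w)
          for cell, w in zip_longest(row, widths, fillvalue=""))
    for row in t
)
```

Computing `widths` once also avoids recomputing the column maxima for every cell, which the one-liner does.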