Converting text-processing code written in awk to Python?

Asked: 2019-06-24 21:08:42

Tags: python python-3.x text awk text-processing

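The following snippet converts one text file into another, splitting each line into fields of the specified widths and inserting a "|" delimiter between them.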

gawk 'BEGIN{FIELDWIDTHS="1 26 1 26 26 26 26 18 2 5 4 7 10 16 4 4 10 2 6 1 1 1 1 10 10 4 11 3 1 1 2 10 10 10 1 1 10 20 10 1 1 1 1 15 16 10 50 13 1 60"}{print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $6 "|" $7 "|" $8 "|" $9 "|" $10 "|" $11 "|" $12 "|" $13 "|" $14 "|" $15 "|" $16 "|" $17 "|" $18 "|" $19 "|" $20 "|" $21 "|" $22 "|" $23 "|" $24 "|" $25 "|" $26 "|" $27 "|" $28 "|" $29 "|" $30 "|" $31 "|" $32 "|" $33 "|" $34 "|" $35 "|" $36 "|" $37 "|" $38 "|" $39 "|" $40 "|" $41 "|" $42 "|" $43 "|" $44 "|" $45 "|" $46 "|" $47 "|" $48 "|" $49 "|" $50}' /Users/sxd2udz/citi-feed/flat_file_conversion_scripts/inbound/thdct_daily_delta_rpt02_062219.txt1 > delta_flat.txt

I have a few ideas about how to convert this to Python, but I need some direction. Thanks in advance!

1 answer:

Answer 0 (score: 0)

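The following function splits a string into the same fixed-width fields as the gawk snippet; joining the pieces it yields with "|" gives the same line that gawk prints: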

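def chunkstring(string, lengths):
    # Lazily yield one slice per width; pos is the sum of all preceding widths.
    return (string[pos:pos + length]
            for idx, length in enumerate(lengths)
            for pos in [sum(map(int, lengths[:idx]))])

# Field widths taken from FIELDWIDTHS in the gawk command.
column_lengths = [1, 26, 1, 26, 26, 26, 26, 18, 2, 5, 4, 7, 10, 16, 4, 4, 10, 2, 6, 1, 1, 1, 1, 10, 10, 4, 11, 3, 1, 1, 2, 10, 10, 10, 1, 1, 10, 20, 10, 1, 1, 1, 1, 15, 16, 10, 50, 13, 1, 60]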
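The answer stops at the splitting step, so here is a minimal sketch of how the whole conversion might look, including reading the input and writing the delimited output. The names `split_fixed_width`, `convert_file`, and `COLUMN_LENGTHS` are illustrative and not part of the original answer, and the file paths in the usage note are placeholders. It uses `itertools.accumulate` to compute the field offsets once, instead of re-summing the widths for every field, and it assumes the input is text in the platform's default encoding.

```python
from itertools import accumulate

# Field widths copied from the FIELDWIDTHS string in the question.
COLUMN_LENGTHS = [1, 26, 1, 26, 26, 26, 26, 18, 2, 5, 4, 7, 10, 16, 4, 4,
                  10, 2, 6, 1, 1, 1, 1, 10, 10, 4, 11, 3, 1, 1, 2, 10, 10,
                  10, 1, 1, 10, 20, 10, 1, 1, 1, 1, 15, 16, 10, 50, 13, 1, 60]


def split_fixed_width(line, lengths):
    """Split `line` into consecutive fields of the given widths.

    Fields that start past the end of a short line come back as empty
    strings, so the result always has len(lengths) items.
    """
    ends = list(accumulate(lengths))   # running totals = end offsets
    starts = [0] + ends[:-1]           # each field starts where the last ended
    return [line[start:end] for start, end in zip(starts, ends)]


def convert_file(src_path, dst_path, lengths=COLUMN_LENGTHS, sep="|"):
    """Rewrite each line of src_path as sep-joined fixed-width fields."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Drop the newline first so it is not counted as field content.
            fields = split_fixed_width(line.rstrip("\n"), lengths)
            dst.write(sep.join(fields) + "\n")
```

Under these assumptions, a call such as `convert_file("input.txt", "delta_flat.txt")` would play the role of the gawk one-liner. Characters beyond the total of all widths are dropped, just as the gawk command prints only `$1` through `$50`.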