Saving a fixed-width text file from a pandas DataFrame

Time: 2015-01-16 16:25:57

Tags: python numpy pandas text-files fixed-width

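I have some data stored in a fixed-width text file format. I can read it into pandas, but I also need to be able to save it back in the same format. I can almost do this with numpy.savetxt(), but I cannot get the format string to left-pad the values with zeros so that the column widths stay correct. I will describe the problem in general terms, since I would be happy to learn of another solution in pandas. Here is what the data looks like: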

19570010008.3980008.3150004.8380003.8390003.8470002.7150007.313
19570020008.7610008.8500009.0170009.1870009.3030009.4630004.479
19570030008.7090008.2880008.2660008.6920005.9340006.3410002.832
19570040008.5750008.9160009.2570009.7800009.9960010.4320009.518
19570050009.2030008.9530008.7690009.3770009.9450009.5650009.554
19570060009.5840008.9930009.4220010.0380009.8050010.4230009.965
19570070009.2030009.1210009.3770009.4600010.0290010.2850009.726
19570080002.6520002.5970002.6850003.9650002.7860002.8100003.657
19570090009.3830009.2140007.6890007.0390005.8230005.1310002.922
19570100008.0510008.6540009.1620008.4300008.9810009.0460005.027
19570110008.6200007.9140005.8870006.4840008.0130006.1190009.438
19570120009.5460009.3730009.3560009.7090009.4510009.1450008.531
19570130008.3750008.6330006.2340006.4720006.5210004.9730003.002
19570140005.2490004.5890002.8050002.8340002.9050002.9300003.024
19570150008.5760009.6430009.6230010.2590010.3760010.9220010.722
19570160009.8880009.6180009.7790009.8600010.6320006.6980011.374
19570170010.1370009.7760007.0580009.8330010.0330010.8690010.364
19570180010.3010009.9380010.1940010.8420010.6760010.9410011.221
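
To make the layout explicit: each record appears to consist of a 7-character date (a four-digit year followed by a three-digit day of year) and then seven 8-character values, zero-padded on the left with three decimals. The following is a minimal sketch that slices one of the lines above by hand under that assumption; the helper name `split_record` is made up for illustration.

```python
def split_record(line, date_width=7, field_width=8):
    """Split one fixed-width record into its date string and float values."""
    line = line.rstrip("\n")
    date = line[:date_width]
    body = line[date_width:]
    # Walk the rest of the line in field_width-sized steps
    values = [float(body[i:i + field_width]) for i in range(0, len(body), field_width)]
    return date, values

date, values = split_record("19570010008.3980008.3150004.8380003.8390003.8470002.7150007.313")
print(date)    # 1957001
print(values)  # [8.398, 8.315, 4.838, 3.839, 3.847, 2.715, 7.313]
```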

Here is how I read it into a DataFrame:

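from datetime import datetime, timedelta
import pandas as pd

# Parse a YYYYDDD string (year + day of year) into a Timestamp
parse = lambda x: pd.Timestamp(datetime(int(x[0:4]), 1, 1) + timedelta(int(x[4:7]) - 1))

# Get the overall line width (including the trailing newline)
with open("file.txt") as f:
    L = len(f.readline())

# Column specifications: a 7-character date, then 8-character value fields
specs = [(0, 7)] + [(7 + 8*i, 15 + 8*i) for i in range((L - 8) // 8)]

# Load the data (header=None: the sample rows contain no header lines)
df = pd.read_fwf("file.txt", colspecs=specs, index_col=0, header=None)

# Convert the YYYYDDD index values to timestamps
df.index = df.index.astype(str).map(parse)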

I get a frame that looks like this:

In [62]: df

Out[62]:
            1       2       3       4       5       6       7
0
1957-01-01  8.398   8.315   4.838   3.839   3.847   2.715   7.313
1957-01-02  8.761   8.850   9.017   9.187   9.303   9.463   4.479
1957-01-03  8.709   8.288   8.266   8.692   5.934   6.341   2.832
1957-01-04  8.575   8.916   9.257   9.780   9.996   10.432  9.518
1957-01-05  9.203   8.953   8.769   9.377   9.945   9.565   9.554
1957-01-06  9.584   8.993   9.422   10.038  9.805   10.423  9.965
1957-01-07  9.203   9.121   9.377   9.460   10.029  10.285  9.726
1957-01-08  2.652   2.597   2.685   3.965   2.786   2.810   3.657
1957-01-09  9.383   9.214   7.689   7.039   5.823   5.131   2.922
1957-01-10  8.051   8.654   9.162   8.430   8.981   9.046   5.027
1957-01-11  8.620   7.914   5.887   6.484   8.013   6.119   9.438
1957-01-12  9.546   9.373   9.356   9.709   9.451   9.145   8.531
1957-01-13  8.375   8.633   6.234   6.472   6.521   4.973   3.002
1957-01-14  5.249   4.589   2.805   2.834   2.905   2.930   3.024
1957-01-15  8.576   9.643   9.623   10.259  10.376  10.922  10.722
1957-01-16  9.888   9.618   9.779   9.860   10.632  6.698   11.374
1957-01-17  10.137  9.776   7.058   9.833   10.033  10.869  10.364
1957-01-18  10.301  9.938   10.194  10.842  10.676  10.941  11.221
1957-01-19  6.731   10.010  6.034   9.781   10.556  10.336  10.798
1957-01-20  8.070   10.178  10.435  10.710  11.310  10.799  11.170
1957-01-21  10.720  10.256  10.513  10.788  11.195  11.465  11.750
1957-01-22  10.990  10.336  10.688  10.676  11.276  11.251  11.022
1957-01-23  10.890  10.418  10.577  11.729  11.261  11.532  11.712

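This is fine, but I need to be able to save it in the same form in which I received it, i.e. every field has to occupy the same character positions in each row, which requires the same leading-zero padding. Is there a simple way to do this from pandas or numpy?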
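
For reference, the target format can be described with ordinary Python string formatting: `%Y%j` turns a timestamp back into the year plus a zero-padded day of year, and the format spec `08.3f` pads a float with leading zeros to a total width of 8. The sketch below is one possible way to write the frame line by line under those assumptions. `format_row` and `to_fixed_width` are hypothetical helper names, not pandas functions, and the one-row frame is sample data copied from the table above.

```python
import pandas as pd

def format_row(timestamp, values, field_width=8, decimals=3):
    """Format one row as YYYYDDD followed by zero-padded fixed-width floats."""
    date_part = timestamp.strftime("%Y%j")  # e.g. 1957-01-04 -> "1957004"
    value_part = "".join(
        "{:0{w}.{d}f}".format(v, w=field_width, d=decimals) for v in values
    )
    return date_part + value_part

def to_fixed_width(df, path):
    """Write a DataFrame with a DatetimeIndex, one fixed-width line per row."""
    with open(path, "w") as f:
        for timestamp, row in zip(df.index, df.to_numpy()):
            f.write(format_row(timestamp, row) + "\n")

# Small frame shaped like the one above
df = pd.DataFrame(
    [[8.575, 8.916, 9.257, 9.780, 9.996, 10.432, 9.518]],
    index=pd.to_datetime(["1957-01-04"]),
    columns=range(1, 8),
)
print(format_row(df.index[0], df.iloc[0]))
# 19570040008.5750008.9160009.2570009.7800009.9960010.4320009.518
```

Calling `to_fixed_width(df, "testing.txt")` would then write one such line per row. A plain loop is slower than a vectorised writer, but it keeps the width of every field explicit.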

Here is my attempt using numpy.savetxt():

import numpy as np

# Convert the index back to the original YYYYDDD integers
df.index = [int(str(d.year) + str(d.dayofyear).zfill(3)) for d in df.index]
df = df.reset_index()

# List of format strings, one per column
formats = ['%i'] + ['%04.3f' for i in range((L-8)//8)]
# Save using an empty string as the delimiter
np.savetxt("testing.txt", df.values, fmt=formats, delimiter='')

The output of this attempt looks like this:

19570018.3988.3154.8383.8393.8472.7157.313
19570028.7618.8509.0179.1879.3039.4634.479
19570038.7098.2888.2668.6925.9346.3412.832
19570048.5758.9169.2579.7809.99610.4329.518
19570059.2038.9538.7699.3779.9459.5659.554
19570069.5848.9939.42210.0389.80510.4239.965
19570079.2039.1219.3779.46010.02910.2859.726
19570082.6522.5972.6853.9652.7862.8103.657
19570099.3839.2147.6897.0395.8235.1312.922
19570108.0518.6549.1628.4308.9819.0465.027
19570118.6207.9145.8876.4848.0136.1199.438
19570129.5469.3739.3569.7099.4519.1458.531
19570138.3758.6336.2346.4726.5214.9733.002
19570145.2494.5892.8052.8342.9052.9303.024
19570158.5769.6439.62310.25910.37610.92210.722
19570169.8889.6189.7799.86010.6326.69811.374

As I mentioned, the left zero-padding does not appear, even though I thought I had specified it in the format string.
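
A likely explanation, offered as a sketch rather than a confirmed diagnosis: in a printf-style specifier such as `%04.3f`, the number after the zero flag is the minimum total width of the field, counting the decimal point and the decimals. A value like 8.398 already takes 5 characters, so a width of 4 never triggers any padding. Asking for a total width of 8 (`%08.3f`) should reproduce the original fields, and `%07d` keeps the date column at 7 characters. The array below is made-up sample data mirroring two rows of the file, and an in-memory buffer stands in for the output file.

```python
import io
import numpy as np

# Width 4 is already exceeded by "8.398", so no zeros are added
print("%04.3f" % 8.398)   # 8.398
# Width 8 pads on the left up to eight characters in total
print("%08.3f" % 8.398)   # 0008.398
print("%08.3f" % 10.432)  # 0010.432

# Two rows in the same layout as df.values after reset_index()
data = np.array([
    [1957001, 8.398, 8.315, 4.838],
    [1957004, 8.575, 8.916, 10.432],
])

# One format per column: integer date, then zero-padded floats
formats = ["%07d"] + ["%08.3f"] * (data.shape[1] - 1)

buffer = io.StringIO()
np.savetxt(buffer, data, fmt=formats, delimiter="")
print(buffer.getvalue())
```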

0 answers:

No answers