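I have some data stored in a fixed-width text file format that I can read into pandas. However, I also need to be able to save it in the same format. I can almost do this with numpy.savetxt(), but I can't get the format string to left-pad the values with zeros so that the column widths are preserved. I'll describe the problem in general terms, since I don't mind if the solution lives in pandas instead. Here is what the data looks like: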
19570010008.3980008.3150004.8380003.8390003.8470002.7150007.313
19570020008.7610008.8500009.0170009.1870009.3030009.4630004.479
19570030008.7090008.2880008.2660008.6920005.9340006.3410002.832
19570040008.5750008.9160009.2570009.7800009.9960010.4320009.518
19570050009.2030008.9530008.7690009.3770009.9450009.5650009.554
19570060009.5840008.9930009.4220010.0380009.8050010.4230009.965
19570070009.2030009.1210009.3770009.4600010.0290010.2850009.726
19570080002.6520002.5970002.6850003.9650002.7860002.8100003.657
19570090009.3830009.2140007.6890007.0390005.8230005.1310002.922
19570100008.0510008.6540009.1620008.4300008.9810009.0460005.027
19570110008.6200007.9140005.8870006.4840008.0130006.1190009.438
19570120009.5460009.3730009.3560009.7090009.4510009.1450008.531
19570130008.3750008.6330006.2340006.4720006.5210004.9730003.002
19570140005.2490004.5890002.8050002.8340002.9050002.9300003.024
19570150008.5760009.6430009.6230010.2590010.3760010.9220010.722
19570160009.8880009.6180009.7790009.8600010.6320006.6980011.374
19570170010.1370009.7760007.0580009.8330010.0330010.8690010.364
19570180010.3010009.9380010.1940010.8420010.6760010.9410011.221
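To make the layout explicit, here is a small sketch that slices the first sample line by hand. It assumes, going by the sample above, a 7-character date field (4-digit year plus 3-digit day of year) followed by 8-character value fields. The helper name split_fixed_width is made up for illustration and is not part of pandas or numpy.

```python
def split_fixed_width(line, date_width=7, field_width=8):
    """Split one record into its date string and a list of float values.

    The widths are assumptions read off the sample data.
    """
    line = line.rstrip("\n")
    date = line[:date_width]
    body = line[date_width:]
    # Walk the rest of the line in fixed-size steps
    values = [float(body[i:i + field_width])
              for i in range(0, len(body), field_width)]
    return date, values


sample = "19570010008.3980008.3150004.8380003.8390003.8470002.7150007.313"
date, values = split_fixed_width(sample)
print(date)
print(values)
```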
Here is how I read it into a DataFrame:
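from datetime import datetime, timedelta
import pandas as pd
# Define a function to parse the dates (4-digit year followed by 3-digit day of year)
parse = lambda x: pd.Timestamp(datetime(int(x[0:4]), 1, 1) + timedelta(int(x[4:7]) - 1))
# Get the overall line width (this count includes the trailing newline)
with open("file.txt") as f:
    L = len(f.readline())
# Define column specifications: a 7-character date, then 8-character value fields
specs = [(0, 7)] + [(7 + 8*i, 15 + 8*i) for i in range((L - 8) // 8)]
# Load in the data (the file has no header row)
df = pd.read_fwf("file.txt", colspecs=specs, index_col=0, header=None, parse_dates=True, date_parser=parse)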
That gives me a frame that looks like this:
In [62]:
df
Out[62]:
1 2 3 4 5 6 7
0
1957-01-01 8.398 8.315 4.838 3.839 3.847 2.715 7.313
1957-01-02 8.761 8.850 9.017 9.187 9.303 9.463 4.479
1957-01-03 8.709 8.288 8.266 8.692 5.934 6.341 2.832
1957-01-04 8.575 8.916 9.257 9.780 9.996 10.432 9.518
1957-01-05 9.203 8.953 8.769 9.377 9.945 9.565 9.554
1957-01-06 9.584 8.993 9.422 10.038 9.805 10.423 9.965
1957-01-07 9.203 9.121 9.377 9.460 10.029 10.285 9.726
1957-01-08 2.652 2.597 2.685 3.965 2.786 2.810 3.657
1957-01-09 9.383 9.214 7.689 7.039 5.823 5.131 2.922
1957-01-10 8.051 8.654 9.162 8.430 8.981 9.046 5.027
1957-01-11 8.620 7.914 5.887 6.484 8.013 6.119 9.438
1957-01-12 9.546 9.373 9.356 9.709 9.451 9.145 8.531
1957-01-13 8.375 8.633 6.234 6.472 6.521 4.973 3.002
1957-01-14 5.249 4.589 2.805 2.834 2.905 2.930 3.024
1957-01-15 8.576 9.643 9.623 10.259 10.376 10.922 10.722
1957-01-16 9.888 9.618 9.779 9.860 10.632 6.698 11.374
1957-01-17 10.137 9.776 7.058 9.833 10.033 10.869 10.364
1957-01-18 10.301 9.938 10.194 10.842 10.676 10.941 11.221
1957-01-19 6.731 10.010 6.034 9.781 10.556 10.336 10.798
1957-01-20 8.070 10.178 10.435 10.710 11.310 10.799 11.170
1957-01-21 10.720 10.256 10.513 10.788 11.195 11.465 11.750
1957-01-22 10.990 10.336 10.688 10.676 11.276 11.251 11.022
1957-01-23 10.890 10.418 10.577 11.729 11.261 11.532 11.712
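This is great, but I need to be able to save it in the same form I received it, i.e. every field in every row has to end up at the same position, which requires left-padding the values with zeros. Is there a simple way to do this from pandas or numpy?
Here is my attempt using numpy.savetxt():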
import numpy as np
# Convert the index back to the YYYYDDD form in which it was found
df.index = [int(str(d.year) + str(d.dayofyear).zfill(3)) for d in df.index]
df = df.reset_index()
# List of format strings, one per column
formats = ['%i'] + ['%04.3f' for i in range((L - 8) // 8)]
# Save using an empty string as the delimiter
np.savetxt("testing.txt", df.values, fmt=formats, delimiter='')
The output of this attempt is:
19570018.3988.3154.8383.8393.8472.7157.313
19570028.7618.8509.0179.1879.3039.4634.479
19570038.7098.2888.2668.6925.9346.3412.832
19570048.5758.9169.2579.7809.99610.4329.518
19570059.2038.9538.7699.3779.9459.5659.554
19570069.5848.9939.42210.0389.80510.4239.965
19570079.2039.1219.3779.46010.02910.2859.726
19570082.6522.5972.6853.9652.7862.8103.657
19570099.3839.2147.6897.0395.8235.1312.922
19570108.0518.6549.1628.4308.9819.0465.027
19570118.6207.9145.8876.4848.0136.1199.438
19570129.5469.3739.3569.7099.4519.1458.531
19570138.3758.6336.2346.4726.5214.9733.002
19570145.2494.5892.8052.8342.9052.9303.024
19570158.5769.6439.62310.25910.37610.92210.722
19570169.8889.6189.7799.86010.6326.69811.374
As I mentioned, the left zero-padding does not appear, even though I thought I had specified it in the format string.
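To isolate the format string from savetxt, here is a quick check with plain % formatting. In printf-style formatting the number before the dot is the minimum total field width, counting the decimal point and the decimals, so %04.3f cannot pad a value like 8.398, which is already 5 characters long. The 8-character field width and the %07d date field below are assumptions read off the sample data, not something the file format documents.

```python
value = 8.398

# Width 4 is smaller than the 5 characters the value needs, so nothing is padded
narrow = '%04.3f' % value
print(narrow)

# Width 8 matches the 8-character fields in the sample data
wide = '%08.3f' % value
print(wide)

# Rebuild one full record: 7-digit date followed by 8-character values
row = [1957001, 8.398, 8.315, 4.838, 3.839, 3.847, 2.715, 7.313]
fmt = '%07d' + '%08.3f' * (len(row) - 1)
line = fmt % tuple(row)
print(line)
```

If savetxt applies its fmt entries the same way, a list such as ['%07d'] + ['%08.3f'] * 7 would be the thing to try. I have not verified that against the full file.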