I am trying to read a discharge data file; a sample of it appears further down.
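My problem is that when I use whitespace as the delimiter, the missing values shift the rest of the row. February has no 29th day, so on row 29 the March value ends up in the February column. The same happens everywhere else a value is empty or missing.
Is there a good way to solve this?
I have been searching online for a solution, but all I could find deals with rows of uneven length, not columns of uneven length.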
The file looks like this:
Station number: 420
Location: Kotagaon Shringe
Latitude: 27 45 00
River: Kali Gandaki
Longitude: 84 20 50
Year: 2001
Mean daily discharge in m3/s
============================
Day   Jan.   Feb.   Mar.   Apr.    May   Jun.   Jul.   Aug.   Sep.   Oct.   Nov.   Dec.   Year
01     118   99.3   85.9   75.5    119    182    656   2790   1690    402    232    158
02     123   97.4   82.9   74.3    134    251    514   2420   2180    397    230    158
03     118   95.5   80.7   73.1    168    377    466   2190   2190    386    226    157
-------------------------------- Skipping some rows of no real interest
25    95.5   85.5   70.7   83.3    163    583    898   3230    485    257    177    123
26    94.1   88.6   69.9   84.6    167    579    996   2330    474    252    175    121
27    92.2   88.6   71.9   88.1    166    736   1180   2270    461    248    173    120
28    91.8   87.3   69.9   91.3    172    419   1020   2270    431    246    168    118
29    95.5          71.9   93.2    165    446   1670   2140    410    244    163    118
30    98.4          76.0    109    176    575   2040   2100    403    239    159    117
31    98.4          75.1           174          3330   1600           234           117
My attempt so far resulted in this code:
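import pandas as pd
# Splits on runs of whitespace, so blank cells vanish and later values shift left
disc = pd.read_csv(filename, header=6, sep=r'\s+', nrows=31)
disc['Year'] = 2001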
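The shift can be reproduced without pandas. The sketch below is only an illustration: it assumes a fixed-width layout (a 3-character day field followed by 7-character value fields), which is a guess at how the original file is aligned, and the helper names are made up for this example. It compares splitting on whitespace with slicing by position.

```python
# Hypothetical excerpt: the day field is 3 characters wide, each value field 7.
# Day 29 has no February value, so that field is blank.
row_28 = "28 " + "".join(f"{v:>7}" for v in ["91.8", "87.3", "69.9"])
row_29 = "29 " + "".join(f"{v:>7}" for v in ["95.5", "", "71.9"])


def split_on_whitespace(line):
    """Split on runs of whitespace; blank fields disappear entirely."""
    return line.split()


def slice_fixed_width(line, widths=(3, 7, 7, 7)):
    """Cut the line at fixed positions; blank fields survive as ''."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


print(split_on_whitespace(row_29))  # ['29', '95.5', '71.9']
print(slice_fixed_width(row_29))    # ['29', '95.5', '', '71.9']
```

With whitespace splitting, the March value lands in the second position, which is where February belongs. Slicing by position keeps the empty February field in place.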
Answer 0 (score: 1)
You can use the pd.read_fwf() function, which reads fixed-width files, together with the skiprows keyword:
disc = pd.read_fwf('test.csv', skiprows=11)
This yields:
   Day   Jan.  Feb.  Mar.   Apr.  ...    Sep.  Oct.   Nov.  Dec.  Year
0    1  118.0  99.3  85.9   75.5  ...  1690.0   402  232.0   158   NaN
1    2  123.0  97.4  82.9   74.3  ...  2180.0   397  230.0   158   NaN
2    3  118.0  95.5  80.7   73.1  ...  2190.0   386  226.0   157   NaN
3   25   95.5  85.5  70.7   83.3  ...   485.0   257  177.0   123   NaN
4   26   94.1  88.6  69.9   84.6  ...   474.0   252  175.0   121   NaN
5   27   92.2  88.6  71.9   88.1  ...   461.0   248  173.0   120   NaN
6   28   91.8  87.3  69.9   91.3  ...   431.0   246  168.0   118   NaN
7   29   95.5   NaN  71.9   93.2  ...   410.0   244  163.0   118   NaN
8   30   98.4   NaN  76.0  109.0  ...   403.0   239  159.0   117   NaN
9   31   98.4   NaN  75.1    NaN  ...     NaN   234    NaN   117   NaN
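For a version that runs without the original file, here is a minimal sketch on an in-memory excerpt. The layout (a 3-character day field followed by 7-character value fields), the make_row helper and the shortened header are assumptions made for illustration, not the real file's format. The column widths are passed explicitly through the widths parameter instead of relying on inference.

```python
import io

import pandas as pd


def make_row(day, values):
    """Build one fixed-width line: 3-char day field, 7-char value fields."""
    return f"{day:<3}" + "".join(f"{v:>7}" for v in values)


# Hypothetical excerpt with two title lines above the column header.
sample = "\n".join([
    "Mean daily discharge in m3/s",
    "============================",
    make_row("Day", ["Jan.", "Feb.", "Mar.", "Apr."]),
    make_row("28", ["91.8", "87.3", "69.9", "91.3"]),
    make_row("29", ["95.5", "", "71.9", "93.2"]),
    make_row("31", ["98.4", "", "75.1", ""]),
])

disc = pd.read_fwf(
    io.StringIO(sample),
    widths=[3, 7, 7, 7, 7],  # explicit widths: an assumption about the layout
    skiprows=2,              # skip the two title lines above the header
)
print(disc)
```

Blank fields come back as NaN in the column they belong to, so the March values stay under Mar. If the real file is consistently aligned, leaving out widths lets read_fwf infer the column boundaries, as in the call above. Passing widths or colspecs explicitly is the safer choice when inference picks the wrong boundaries.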