Reading a file with uneven column lengths in pandas

Time: 2018-10-08 14:31:49

Tags: python pandas

I am trying to read a discharge data file; a sample of it is shown further down, after the description of the problem.

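My problem is that, when whitespace is used as the delimiter, the parser does exactly that: on day 29 the March value lands in the February column, because February has no 29th day. The same happens at every other position with an empty/missing value.

Is there a good way to solve this?

I have been searching online for a solution, but all I can find deals with uneven row lengths, not uneven column lengths.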

The data file looks like this:

Station number: 420
Location:       Kotagaon Shringe                                                  
Latitude: 27 45 00
River:          Kali Gandaki                                                     
Longitude: 84 20 50

Year:           2001

                                Mean daily discharge in m3/s
                                ============================

Day     Jan.   Feb.   Mar.   Apr.   May    Jun.   Jul.   Aug.   Sep.   Oct.   Nov.   Dec.   Year
 01      118   99.3   85.9   75.5    119    182    656   2790   1690    402    232    158
 02      123   97.4   82.9   74.3    134    251    514   2420   2180    397    230    158
 03      118   95.5   80.7   73.1    168    377    466   2190   2190    386    226    157
-------------------------------- Skipping some rows of no real interest
 25     95.5   85.5   70.7   83.3    163    583    898   3230    485    257    177    123
 26     94.1   88.6   69.9   84.6    167    579    996   2330    474    252    175    121
 27     92.2   88.6   71.9   88.1    166    736   1180   2270    461    248    173    120
 28     91.8   87.3   69.9   91.3    172    419   1020   2270    431    246    168    118
 29     95.5          71.9   93.2    165    446   1670   2140    410    244    163    118
 30     98.4          76.0    109    176    575   2040   2100    403    239    159    117
 31     98.4          75.1           174          3330   1600           234           117

My attempts so far have resulted in this code:

import pandas as pd
# Split on runs of whitespace; blank cells vanish, so later values shift left
disc = pd.read_csv(filename, header=6, sep=r'\s+', nrows=31)
disc['Year'] = 2001
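
To make the shift concrete, here is a minimal sketch using only plain string splitting (no pandas) on the header and on the rows for days 28 and 29 from the file above. The variable names are made up for illustration; the point is only that splitting on whitespace drops the blank February cell.

```python
# Header and two data rows copied from the sample file above.
header = "Day     Jan.   Feb.   Mar.   Apr.   May    Jun.   Jul.   Aug.   Sep.   Oct.   Nov.   Dec.   Year"
row_28 = " 28     91.8   87.3   69.9   91.3    172    419   1020   2270    431    246    168    118"
row_29 = " 29     95.5          71.9   93.2    165    446   1670   2140    410    244    163    118"

names = header.split()
fields_28 = row_28.split()
fields_29 = row_29.split()

# Day 28 has a value for every month; day 29 comes out one field short,
# because the empty February cell is swallowed by the whitespace split.
print(len(fields_28), len(fields_29))  # 13 12

# Pairing names with fields by position puts March's value under "Feb.".
labelled_29 = dict(zip(names, fields_29))
print(labelled_29["Feb."])  # 71.9
```

Every later month on that row is shifted one column to the left in the same way, which is the behaviour described in the question.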

1 Answer:

Answer 0: (score: 1)

You can use pd.read_fwf() to read the fixed-width file, together with the skiprows keyword to skip the header lines above the table:

disc = pd.read_fwf('test.csv', skiprows=11)

This yields:

   Day   Jan.  Feb.  Mar.   Apr.  ...     Sep.  Oct.   Nov.  Dec.  Year
0    1  118.0  99.3  85.9   75.5  ...   1690.0   402  232.0   158   NaN
1    2  123.0  97.4  82.9   74.3  ...   2180.0   397  230.0   158   NaN
2    3  118.0  95.5  80.7   73.1  ...   2190.0   386  226.0   157   NaN
3   25   95.5  85.5  70.7   83.3  ...    485.0   257  177.0   123   NaN
4   26   94.1  88.6  69.9   84.6  ...    474.0   252  175.0   121   NaN
5   27   92.2  88.6  71.9   88.1  ...    461.0   248  173.0   120   NaN
6   28   91.8  87.3  69.9   91.3  ...    431.0   246  168.0   118   NaN
7   29   95.5   NaN  71.9   93.2  ...    410.0   244  163.0   118   NaN
8   30   98.4   NaN  76.0  109.0  ...    403.0   239  159.0   117   NaN
9   31   98.4   NaN  75.1    NaN  ...      NaN   234    NaN   117   NaN
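
For anyone who wants to try this without the original file, the following is a small sketch that runs read_fwf on an in-memory excerpt. The excerpt is a made-up three-month cut of days 28 to 30 (values taken from the file above, column spacing simplified), and the names sample and disc_small are only illustrative.

```python
import io

import pandas as pd

# Simplified excerpt: Jan.-Mar. for days 28-30, right-aligned in fixed-width
# columns. February is blank on days 29 and 30, as in the real file.
sample = (
    "Day   Jan.   Feb.   Mar.\n"
    " 28   91.8   87.3   69.9\n"
    " 29   95.5          71.9\n"
    " 30   98.4          76.0\n"
)

# read_fwf infers column boundaries from character positions, so the blank
# February cells become NaN and the March values stay in the Mar. column.
disc_small = pd.read_fwf(io.StringIO(sample))

print(disc_small)
```

If the inferred boundaries turn out wrong on the real file, read_fwf also accepts explicit colspecs or widths arguments, so the column positions can be spelled out by hand.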