Performance issue when processing NetCDF files

Posted: 2019-06-15 18:36:21

Tags: netcdf netcdf4 nco

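I am using gridMET (http://www.climatologylab.org/gridmet.html) and MACA (http://thredds.northwestknowledge.net:8080/thredds/reacch_climate_CMIP5_macav2_catalog2.html) NetCDF files for a project and am running into a performance issue. Applying a simple function to the gridMET NetCDF files (period: 1979-2015) takes about 0.01 sec per grid cell. Applying the same function to the MACA NetCDF files (period: 2016-2050), however, takes about 0.3 sec per grid cell. Over a large area, this difference adds up to very different total processing times for the two datasets.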
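To put the per-cell timings in perspective, here is a rough back-of-the-envelope sketch. It assumes the function is applied serially to every cell of the full 585 x 1386 grid given in the file headers; the helper name `total_hours` is made up for illustration, and a real study area would usually cover fewer cells.

```python
# Rough total run time if the function is applied to every grid cell.
# Assumption: the whole 585 x 1386 grid is processed serially, one cell at a time.
N_LAT, N_LON = 585, 1386
N_CELLS = N_LAT * N_LON


def total_hours(seconds_per_cell: float, n_cells: int = N_CELLS) -> float:
    """Convert a per-cell timing into total hours for n_cells cells."""
    return seconds_per_cell * n_cells / 3600.0


if __name__ == "__main__":
    # Per-cell timings as reported in the question.
    for name, per_cell in [("gridMET", 0.01), ("MACA", 0.3)]:
        print(f"{name}: {total_hours(per_cell):.1f} h for {N_CELLS} cells")
```

Under these assumptions the full grid takes a little over 2 hours at 0.01 sec per cell and close to 68 hours at 0.3 sec per cell, which is why the per-cell difference matters.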

The header of a gridMET file is:

netcdf pr_1980 {
dimensions:
    lon = 1386 ;
    lat = 585 ;
    day = 366 ;
    crs = 1 ;
variables:
    double lon(lon) ;
            lon:units = "degrees_east" ;
            lon:description = "longitude" ;
            lon:axis = "X" ;
            lon:standard_name = "longitude" ;
            lon:long_name = "longitude" ;
    double lat(lat) ;
            lat:units = "degrees_north" ;
            lat:description = "latitude" ;
            lat:axis = "Y" ;
            lat:standard_name = "latitude" ;
            lat:long_name = "latitude" ;
    float day(day) ;
            day:units = "days since 1900-01-01 00:00:00" ;
            day:calendar = "gregorian" ;
            day:description = "days since 1900-01-01" ;
            day:standard_name = "time" ;
            day:long_name = "time" ;
    float precipitation_amount(day, lat, lon) ;
            precipitation_amount:units = "mm" ;
            precipitation_amount:description = "Daily Accumulated Precipitation" ;
            precipitation_amount:_FillValue = -32767.f ;
            precipitation_amount:coordinates = "lon lat" ;
            precipitation_amount:cell_methods = "time: sum(interval: 24 hours)" ;
            precipitation_amount:missing_value = -32767. ;
            precipitation_amount:grid_mapping = "crs" ;
    int crs(crs) ;
            crs:grid_mapping_name = "latitude_longitude" ;
            crs:longitude_of_prime_meridian = 0. ;
            crs:semi_major_axis = 6378137. ;
            crs:inverse_flattening = 298.257223563 ;
            crs:spatial_ref = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]" ;
            crs:long_name = "WGS 84" ;

// global attributes:
            :author = "John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu" ;
            :date = "02 December 2017" ;
            :note1 = "The projection information for this file is: GCS WGS 1984." ;
            :note2 = "Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413" ;
            :last_permanent_slice = "306" ;
            :last_provisional_slice = "360" ;
            :note3 = "Data in slices after last_permanent_slice (1-based) are considered provisional and subject to change with subsequent updates" ;
            :note4 = "Data in slices after last_provisional_slice (1-based) are considered early and subject to change with subsequent updates" ;
            :note5 = "Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)" ;
            :geospatial_bounds_crs = "EPSG:4326" ;
            :Conventions = "CF-1.6" ;
            :geospatial_bounds = "POLYGON((-124.7666666333333 49.400000000000000, -124.7666666333333 25.066666666666666, -67.058333300000015 25.066666666666666, -67.058333300000015 49.400000000000000, -124.7666666333333 49.400000000000000))" ;
            :geospatial_lat_min = "25.066666666666666" ;
            :geospatial_lat_max = "49.40000000000000" ;
            :geospatial_lon_min = "-124.7666666333333" ;
            :geospatial_lon_max = "-67.058333300000015" ;
            :geospatial_lon_resolution = "0.041666666666666" ;
            :geospatial_lat_resolution = "0.041666666666666" ;
            :geospatial_lat_units = "decimal_degrees north" ;
            :geospatial_lon_units = "decimal_degrees east" ;
            :coordinate_system = "EPSG:4326" ;
            :_Format = "classic" ;
}

The header of a MACA file is:

netcdf pr_CanESM2_macav2_2016 {
dimensions:
    crs = 1 ;
    lat = 585 ;
    lon = 1386 ;
    time = 366 ;
variables:
    int crs(crs) ;
            crs:grid_mapping_name = "latitude_longitude" ;
            crs:longitude_of_prime_meridian = 0. ;
            crs:semi_major_axis = 6378137. ;
            crs:inverse_flattening = 298.257223563 ;
    double lat(lat) ;
            lat:long_name = "latitude" ;
            lat:standard_name = "latitude" ;
            lat:units = "degrees_north" ;
            lat:axis = "Y" ;
            lat:description = "Latitude of the center of the grid cell" ;
    double lon(lon) ;
            lon:long_name = "longitude" ;
            lon:standard_name = "longitude" ;
            lon:units = "degrees_east" ;
            lon:axis = "X" ;
            lon:description = "Longitude of the center of the grid cell" ;
    float precipitation(time, lat, lon) ;
            precipitation:_FillValue = -9999.f ;
            precipitation:long_name = "Precipitation" ;
            precipitation:units = "mm" ;
            precipitation:grid_mapping = "crs" ;
            precipitation:standard_name = "precipitation" ;
            precipitation:cell_methods = "time: sum(interval: 24 hours)" ;
            precipitation:comments = "Total daily precipitation at surface; includes both liquid and solid phases from all types of clouds (both large-scale and convective)" ;
            precipitation:coordinates = "time lon lat" ;
    float time(time) ;
            time:units = "days since 1900-01-01 00:00:00" ;
            time:calendar = "gregorian" ;
            time:description = "days since 1900-01-01" ;

// global attributes:
            :description = "Multivariate Adaptive Constructed Analogs (MACA) method, version 2.3,Dec 2013." ;
            :id = "MACAv2-METDATA" ;
            :naming_authority = "edu.uidaho.reacch" ;
            :Metadata_Conventions = "Unidata Dataset Discovery v1.0" ;
            :Metadata_Link = "" ;
            :cdm_data_type = "GRID" ;
            :title = "Downscaled daily meteorological data of Precipitation from Canadian Centre for Climate Modelling and Analysis (CanESM2) using the run r1i1p1 of the rcp85 scenario." ;
            :summary = "This archive contains daily downscaled meteorological and hydrological projections for the Conterminous United States at 1/24-deg resolution utilizing the Multivariate Adaptive Constructed Analogs (MACA, Abatzoglou, 2012) statistical downscaling method with the METDATA (Abatzoglou,2013) training dataset. The downscaled meteorological variables are maximum/minimum temperature(tasmax/tasmin), maximum/minimum relative humidity (rhsmax/rhsmin)precipitation amount(pr), downward shortwave solar radiation(rsds), eastward wind(uas), northward wind(vas), and specific humidity(huss). The downscaling is based on the 365-day model outputs from different global climate models (GCMs) from Phase 5 of the Coupled Model Inter-comparison Project (CMIP5) utlizing the historical (1950-2005) and future RCP4.5/8.5(2006-2099) scenarios. Leap days have been added to the dataset from the average values between Feb 28 and Mar 1 in order to aid modellers." ;
            :keywords = "daily precipitation, daily maximum temperature, daily minimum temperature, daily downward shortwave solar radiation, daily specific humidity, daily wind velocity, CMIP5, Gridded Meteorological Data" ;
            :keywords_vocabulary = "" ;
            :standard_name_vocabulary = "CF-1.0" ;
            :history = "Sat Jun 15 16:07:12 2019: C:\\nco\\ncks.exe -3 -d time,0,365,1 macav2metdata_pr_CanESM2_r1i1p1_rcp85_2016_2020_CONUS_daily.nc pr_CanESM2_macav2_2016.nc\n",
                    "No revisions." ;
            :comment = "Total daily precipitation at surface; includes both liquid and solid phases from all types of clouds (both large-scale and convective)" ;
            :geospatial_bounds = "POLYGON((-124.7722 25.0631,-124.7722 49.3960, -67.0648 49.3960,-67.0648, 25.0631, -124.7722,25.0631))" ;
            :geospatial_lat_min = "25.0631" ;
            :geospatial_lat_max = "49.3960" ;
            :geospatial_lon_min = "-124.7722" ;
            :geospatial_lon_max = "-67.0648" ;
            :geospatial_lat_units = "decimal degrees north" ;
            :geospatial_lon_units = "decimal degrees east" ;
            :geospatial_lat_resolution = "0.0417" ;
            :geospatial_lon_resolution = "0.0417" ;
            :geospatial_vertical_min = 0. ;
            :geospatial_vertical_max = 0. ;
            :geospatial_vertical_resolution = 0. ;
            :geospatial_vertical_positive = "up" ;
            :time_coverage_start = "2016-01-01T00:0" ;
            :time_coverage_end = "2020-12-31T00:00" ;
            :time_coverage_duration = "P5Y" ;
            :time_coverage_resolution = "P1D" ;
            :date_created = "2014-05-15" ;
            :date_modified = "2014-05-15" ;
            :date_issued = "2014-05-15" ;
            :creator_name = "John Abatzoglou" ;
            :creator_url = "http://maca.northwestknowledge.net" ;
            :creator_email = "jabatzoglou@uidaho.edu" ;
            :institution = "University of Idaho" ;
            :processing_level = "GRID" ;
            :project = "" ;
            :contributor_name = "Katherine C. Hegewisch" ;
            :contributor_role = "Postdoctoral Fellow" ;
            :publisher_name = "" ;
            :publisher_email = "" ;
            :publisher_url = "" ;
            :license = "Creative Commons CC0 1.0 Universal Dedication(http://creativecommons.org/publicdomain/zero/1.0/legalcode)" ;
            :coordinate_system = "WGS84,EPSG:4326" ;
            :NCO = "netCDF Operators version 4.8.1-alpha03 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
            :_Format = "classic" ;
}

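The gridMET files are in "classic" format, whereas the MACA files are in NetCDF4 format. I converted the MACA files to "classic" format with the following command: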

ncks -3 in.nc out.nc

The converted files still take 0.3 sec per grid cell for 2016-2050. This is the code I use to read and process the NetCDF files:

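import xarray as xr
ds = xr.open_mfdataset('D:/proj1/*.nc', combine='nested', concat_dim='time')
# `var` stood in for the data variable; in the MACA files it is `precipitation`
da = ds['precipitation'].sel(lon=273.15, lat=49.4, method='nearest')
da_con = da[da > 35.5]  # keep only the days above the threshold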

Please suggest any modifications to the NetCDF files that would reduce the processing overhead.
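When comparing "classic" and NetCDF4 files as described above, it can help to confirm which on-disk format a file really has. The sketch below only inspects the leading bytes of a file: classic-family files start with `CDF` followed by a version byte, and NetCDF4 files are HDF5 files, which normally start with the HDF5 signature. The function name `netcdf_format` is made up for illustration, and the check assumes the signature sits at offset 0. The `ncdump -k` command reports the same information.

```python
# HDF5 signature that normally opens a NetCDF4 file.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

# Version byte that follows b"CDF" in the classic family of formats.
CDF_KINDS = {
    1: "classic",
    2: "64-bit offset",
    5: "64-bit data (CDF-5)",
}


def netcdf_format(path: str) -> str:
    """Guess the on-disk format from the first bytes of the file."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic.startswith(HDF5_MAGIC):
        return "netCDF-4 (HDF5)"
    if magic[:3] == b"CDF" and len(magic) >= 4:
        return CDF_KINDS.get(magic[3], "unknown CDF variant")
    return "not recognised as netCDF"
```

A call such as `netcdf_format('out.nc')` on the output of `ncks -3` should report `classic`. Note that this says nothing about dimension order or chunking, which can matter more for access speed than the format itself.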

1 Answer:

Answer 0 (score: 2)

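Interestingly, reordering the dimensions reduced the processing time to 0.05 sec per grid cell. I used the following command to rearrange the dimensions: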

ncpdq -a lon,lat,time in.nc out.nc

There may be other solutions, but this works for now.
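A plausible explanation for the speed-up is data locality. The sketch below is an illustration, not a measurement. It assumes the variable is stored as one contiguous, uncompressed row-major float32 array, as in a classic file without a record dimension, and computes how far apart the values of a single grid cell's time series lie for a given dimension order. The function name `time_series_layout` is made up for illustration; chunked or compressed NetCDF4 files behave differently.

```python
from math import prod

ITEMSIZE = 4  # bytes per float32 value


def time_series_layout(dims, sizes, series_dim="time"):
    """Return (stride_bytes, span_bytes) for one grid cell's time series.

    dims  : axis names in storage order, slowest-varying first
    sizes : mapping from axis name to length
    """
    axis = dims.index(series_dim)
    # Values skipped each time the series index advances by one step.
    stride = prod(sizes[d] for d in dims[axis + 1:]) * ITEMSIZE
    # Distance from the first byte to the last byte of the series.
    span = (sizes[series_dim] - 1) * stride + ITEMSIZE
    return stride, span


if __name__ == "__main__":
    sizes = {"time": 366, "lat": 585, "lon": 1386}  # from the file headers
    for dims in [("time", "lat", "lon"), ("lon", "lat", "time")]:
        stride, span = time_series_layout(dims, sizes)
        print(dims, f"stride = {stride} B, span = {span} B")
```

With the original `(time, lat, lon)` order, consecutive days of one cell are about 3.2 MB apart, so reading one year's series touches a span of roughly 1.2 GB. After `ncpdq -a lon,lat,time` the same 366 values sit in 1464 consecutive bytes. The trade-off is that reading a full map for a single day becomes the slow access pattern.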