Question

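I need to process a single variable in a netCDF file that actually contains many attributes and variables. I think it is not possible to update a netCDF file (see the question How to delete a variable in a Scientific.IO.NetCDF.NetCDFFile?).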

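My approach is as follows:

1. get the variable to process from the original file
2. process the variable
3. copy all data from the original netCDF file, except the processed variable, to the final file
4. copy the processed variable to the final file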

My problem is coding step 3. I started with the following:

from Scientific.IO.NetCDF import NetCDFFile

def processing(infile, variable, outfile):
    # open the source file before reading from it
    fileH = NetCDFFile(infile, mode="r")
    data = fileH.variables[variable][:]

    # do processing on data...

    # and now save the result
    outH = NetCDFFile(outfile, mode="w")
    # build a list of variables without the processed variable
    listOfVariables = [x for x in fileH.variables.keys() if x != variable]
    for ivar in listOfVariables:
        # here I need to write each variable and each attribute
        pass

How can I save all the data and attributes in a handful of lines of code, without having to rebuild the whole data structure?

Answer 1

If you just want to copy the file while picking out variables, nccopy is a great tool, as submitted by @rewfuss.

Here is a Pythonic (and more flexible) solution with python-netcdf4. It lets you open the file for processing and other calculations before writing to the new file.

import netCDF4

# file1: path of the source file, file2: path of the new file
with netCDF4.Dataset(file1) as src, netCDF4.Dataset(file2, "w") as dst:

    for name, dimension in src.dimensions.items():
        dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)

    for name, variable in src.variables.items():

        # take out the variable you don't want
        if name == 'some_variable':
            continue

        x = dst.createVariable(name, variable.datatype, variable.dimensions)
        x[:] = variable[:]

This does not copy variable attributes, such as fill values. You can easily add that by following the documentation.

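Be careful: netCDF4 files written or created this way cannot be undone. Once you modify a variable, the change is written to the file at the end of the with statement, or when you call .close() on the Dataset.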

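Of course, if you want to process the variables before writing them, you have to be careful about which dimensions you create. In a new file, never write to a variable without creating it first. Also, never create a variable without having defined its dimensions, as noted in python-netcdf4's documentation.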

Answer 2

This is what I just used, and it works. It is @arne's answer updated for Python 3, and it also copies variable attributes:

import netCDF4 as nc

toexclude = ['ExcludeVar1', 'ExcludeVar2']

with nc.Dataset("in.nc") as src, nc.Dataset("out.nc", "w") as dst:
    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)
    # copy dimensions
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            attrs = src[name].__dict__
            # _FillValue must be given at creation time; setting it after the
            # data is written fails in netCDF-4 files
            fill = attrs.pop("_FillValue", None)
            x = dst.createVariable(name, variable.datatype, variable.dimensions,
                                   fill_value=fill)
            # copy variable attributes all at once via dictionary, before the data
            x.setncatts(attrs)
            x[:] = src[name][:]

Answer 3

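The nccopy utility in C netCDF versions 4.3.0 and later includes an option to list the variables to be copied (along with their attributes). Unfortunately, it does not include an option for which variables to exclude, which is what you need.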

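However, if the (comma-delimited) list of variables to include would not make the nccopy command line exceed system limits, this will work. There are two variants of this option: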

nccopy -v var1,var2,...,varn input.nc output.nc
nccopy -V var1,var2,...,varn input.nc output.nc

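The first (-v) includes all variable definitions, but data only for the named variables. The second (-V) includes neither the definitions nor the data of unnamed variables.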
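Since nccopy has no exclude option, one workaround is to generate the -V list from all variable names minus the ones to drop. A minimal Python sketch, assuming nccopy is on the PATH and that you already have the variable names; nccopy_excluding is a hypothetical helper name:

```python
import subprocess


def nccopy_excluding(all_vars, exclude, infile, outfile, run=True):
    """Build (and optionally run) an nccopy -V command that keeps every
    variable in all_vars except those listed in exclude."""
    skip = set(exclude)
    keep = [v for v in all_vars if v not in skip]
    # -V copies definitions and data only for the named variables
    cmd = ["nccopy", "-V", ",".join(keep), infile, outfile]
    if run:
        # check=True raises CalledProcessError if nccopy exits non-zero
        subprocess.run(cmd, check=True)
    return cmd
```

The names can come from python-netcdf4, for example list(ds.variables) on an open Dataset. Passing an argument list avoids shell quoting, but the operating system's command-line length limit mentioned above still applies.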

Answer 4

This answer builds on the one from Xavier Ho (https://stackoverflow.com/a/32002401/7666), but I needed to fix it:

import netCDF4 as nc

toexclude = ["TO_REMOVE"]
with nc.Dataset("orig.nc") as src, nc.Dataset("filtered.nc", "w") as dst:
    # copy global attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions (isunlimited must be called; None marks an unlimited dimension)
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]

python netcdf: make a copy of all variables and attributes but one
