Question

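Why do I get this error only on subsequent executions of a Python function?

I am running a Python script that converts one kind of netCDF4 file into another by calling a function in a module I wrote.

The script processes several files in sequence. When I reach the second file in the list, I get "IndexError: size of data array does not conform to slice" at "data['time'][:]" in this code in my function: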

varobj = cdf.createVariable('time','f8',('time'))
varobj.setncatts(dictifyatts(data['time'],''))
varobj[:] = data['time'][:]

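It does not matter which files they are. The script always processes the first file happily and then chokes on the second one; that is, the first call to the function succeeds and the second call fails.

Using a debugger, I could not see any difference in varobj[:] and data['time'][:] between the first call and the second call. For example:

On the second call to the function, inspecting the variables shows: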

ipdb> data['time']
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    description: time of measurement
    calendar: gregorian
    units: seconds since 1970-01-01T00:00:00 UTC
path = /Data/Burst
unlimited dimensions: 
current shape = (357060,)
filling off


ipdb> varobj
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    description: time of measurement
    calendar: gregorian
    units: seconds since 1970-01-01T00:00:00 UTC
unlimited dimensions: 
current shape = (357056,)
filling on, default _FillValue of 9.969209968386869e+36 used

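On the first call to the function, inspecting the variables shows exactly the same results, with shapes of the same size.

The same error is reported here: Error when creating variable to create a netCDF file

Based on that, I tried the following code: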

cf_time = data['time'][:]
cdf.createVariable('time','f8',('time'))
cdf['time'].setncatts(dictifyatts(data['time'],''))
cdf['time'][:] = cf_time[:]

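That did not work either. Same error under the same circumstances.

I am out of ideas and could use suggestions on what to check next.

Thanks to Bart for his eagle-eyed spotting of the shape change. That was a big clue. I had been checking the file names.

When I investigated the shape change, I found that inside my function one of the input variables still holds information from the previous call to the function.
First, why would only one of the input variables keep stale information? Second, this should not happen at all; the variable should have gone out of scope.

I will try to reproduce this behavior in minimal code. In the meantime, answers about how scope works in Python would be appreciated. I thought I understood how Python handles scope.

Here is minimal code that demonstrates the problem. Somehow, calling the function can change a variable (good_ens) that is outside its scope.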

def doFile(infileName, outfileName, goodens, timetype, flen):

    print('infilename = %s' % infileName)
    print('outfilename = %s' % outfileName)
    print('goodens at input are from %d to %d' % (goodens[0],goodens[1]))
    print('timetype is %s' % timetype)

    maxens = flen # fake file length
    print('%s time variable has %d ensembles' % (infileName,maxens))

    # TODO - goodens[1] has the file size from the previous file run when multiple files are processed!
    if goodens[1] < 0:
        goodens[1] = maxens

    print('goodens adjusted for input file length are from %d to %d' % (goodens[0],goodens[1]))

    nens = goodens[1]-goodens[0]
    print('creating new netCDF file %s with %d records (should match input file)' % (outfileName, nens))



datapath = ""

datafiles = ['file0.nc',\
             'file1.nc',\
             'file2.nc',\
             'file3.nc']
# fake file lengths for this demonstration
datalengths = [357056, 357086, 357060, 199866]
outfileroot = 'outfile'
attFile = datapath + 'attfile.txt'
# this gets changed!  It should never be changed!
# ask for all ensembles in the file
good_ens = [0,-1]

 # --------------  beyond here the user should not need to change things
for filenum in range(len(datafiles)):

    print('\n--------------\n')
    print('Input Parameters before function call')
    print(good_ens)
    inputFile = datapath + datafiles[filenum]
    print(inputFile)
    l = datalengths[filenum]
    print(l)
    outputFile = datapath + ('%s%03d.cdf' % (outfileroot,filenum))
    print(outputFile)

    print('Converting from %s to %s' % (inputFile,outputFile))
    # the variable good_ens gets changed by this calling function, and should not be
    doFile(inputFile, outputFile, good_ens, 'CF', l)
    # this works, but will not work for me in using this function
    #doNortekRawFile(inputFile, outputFile, [0,-1], 'CF', l)

Answer 1

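I came here because I hit the same error when trying to write a large xarray object to a netCDF file. It turned out that I had to rechunk the dataset into uniform chunks with no remainder. Dask does this with "compute_chunk_sizes()", and in xarray you can specify the chunks with "arr.chunk()". See the xarray documentation for .chunk().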

Answer 2

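So the problem here came from an old C programmer (me) misunderstanding how Python passes objects to functions. I reduced the code, isolated the problem, and posted a question here: python variable contents changed by function when no change is intended. It has been answered: Python always passes a reference to the object (a pointer, in C terms), unlike C, where you state explicitly whether you are passing a pointer or the contents.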
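A minimal sketch of the effect, assuming the same logic as doFile in the question. The names do_file_mutating and do_file_copying are made up for illustration and are not part of the original module. It also shows why only goodens went stale: it is the only argument the function modifies in place (goodens[1] = ...). The strings and the integer length are never assigned to inside the function, and they are immutable anyway.

```python
def do_file_mutating(goodens, flen):
    """Same logic as doFile in the question: writes into the caller's list."""
    if goodens[1] < 0:
        goodens[1] = flen  # item assignment changes the caller's object
    return goodens[1] - goodens[0]


def do_file_copying(goodens, flen):
    """Hypothetical fix: work on a private copy of the list."""
    goodens = list(goodens)  # rebinding the local name to a new list
    if goodens[1] < 0:
        goodens[1] = flen  # only the copy is changed
    return goodens[1] - goodens[0]


lengths = [357056, 357086]  # fake file lengths taken from the question

# Mutating version: the second call sees the length left by the first call.
good_ens = [0, -1]
mutating = [do_file_mutating(good_ens, n) for n in lengths]
stale = list(good_ens)  # snapshot of what the caller's list looks like now

# Copying version: the caller's list is untouched between calls.
good_ens = [0, -1]
fixed = [do_file_copying(good_ens, n) for n in lengths]

print(mutating, stale)
print(fixed, good_ens)
```

With the mutating version, every file after the first is created with the first file's record count, which would explain a size mismatch when the time data is written. Passing a fresh list on each call (as in the commented-out call in the question) or passing a tuple such as (0, -1) are other ways to avoid the shared state; a tuple would make an in-place assignment fail immediately instead of silently changing the caller's data.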

IndexError: size of data array does not conform to slice
