Unexpected shape parameter behavior in Python SciPy genextreme fit

Time: 2018-09-21 20:23:51

Tags: python scipy statistics curve-fitting

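I have been trying to fit a GEV distribution to some annual maximum river discharge data using SciPy's stats.genextreme, but I have found some odd behavior in the fit. Depending on the magnitude of the data (e.g. values on the order of 1e-5 versus 1e-1), the returned shape parameter can differ dramatically. For example: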


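The script below estimates GEV parameters for bugArr / bugArr_small and fineArr / fineArr_small. The shape parameter for bugArr changes with the scaling, while the one for fineArr does not: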

import numpy as np
from scipy.stats import genextreme as gev

#Set up arrays of values to fit curve to 
sample=np.random.rand(1,30) #Random set of decimal values 
smallVals = sample*1e-5     #Scale to smaller values 

#If the random sample above does not reproduce the problem, this particular set of random numbers does:
bugArr = np.array([[0.25322987, 0.81952358, 0.94497455, 0.36295543, 0.72272746, 0.49482558,0.65674877, 0.40876558, 0.64952248, 0.23171052, 0.24645658, 0.35359126,0.27578928, 0.24820775, 0.69789187, 0.98876361, 0.22104156,0.40019593,0.0756707,  0.12342556, 0.3601186,  0.54137089,0.43477705, 0.44622486,0.75483338, 0.69766687, 0.1508741,  0.75428996, 0.93706003, 0.1191987]])
bugArr_small = bugArr*1e-5

#This array of random numbers gives the same shape parameter regardless of scaling
fineArr = np.array([[0.7449611,  0.82376693, 0.32601009, 0.18544293, 0.56779629, 0.30495415,
        0.04670362, 0.88106521, 0.34013959, 0.84598841, 0.24454428, 0.57981437,
        0.57129427, 0.8857514,  0.96254429, 0.64174078, 0.33048637, 0.17124045,
        0.11512589, 0.31884749, 0.48975204, 0.87988863, 0.86898236, 0.83513966,
        0.05858769, 0.25889509, 0.13591874, 0.89106616, 0.66471263, 0.69786708]])
fineArr_small = fineArr*1e-5

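#GEV fit for each array and its rescaled copy - rescaling shouldn't dramatically change the fitted shape
gev_fit      = gev.fit(sample)
gevSmall_fit = gev.fit(smallVals)

gevBug      = gev.fit(bugArr)
gevSmallBug = gev.fit(bugArr_small)

gevFine      = gev.fit(fineArr)
gevSmallFine = gev.fit(fineArr_small)

#Each fit returns a tuple (shape, loc, scale)
print('sample: ', gev_fit, gevSmall_fit)
print('bugArr: ', gevBug, gevSmallBug)
print('fineArr:', gevFine, gevSmallFine)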

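Why does the shape parameter change so dramatically when the only difference in the data is a change of scale? I would expect the behavior to match the fineArr results (no change in the shape parameter, with the location and scale parameters scaled accordingly). I repeated the test in Matlab, and there the results match my expectation (i.e. the shape parameter does not change).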
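To make that expectation concrete: GEV is a location-scale family, so multiplying the data, loc and scale by the same factor a should shift the negative log-likelihood by the constant n*log(a) and leave the optimal shape parameter where it was. Here is a minimal sketch that checks this numerically without running any fit. It assumes SciPy's public nnlf method; the sample and the parameter values in theta are arbitrary illustrations, not fitted results.

```python
import numpy as np
from scipy.stats import genextreme as gev

rng = np.random.default_rng(0)
x = rng.random(30)          # illustrative sample in [0, 1)
a = 1e-5                    # scale factor applied to the data

# Arbitrary (shape, loc, scale) chosen so that all of x lies inside the support
theta = (0.2, 0.4, 0.3)
# Same shape, with loc and scale multiplied by the same factor as the data
theta_scaled = (theta[0], a * theta[1], a * theta[2])

nnlf_x = gev.nnlf(theta, x)
nnlf_ax = gev.nnlf(theta_scaled, a * x)

# The two negative log-likelihoods should differ only by n * log(a)
offset = x.size * np.log(a)
print(nnlf_ax - nnlf_x, offset)
```

Since the two likelihood surfaces differ only by a constant, an exact maximum-likelihood fit would return the same shape for both data sets. A difference in the fitted shape therefore points at the numerical optimization rather than at the model.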

1 Answer:

Answer 0 (score: 0)

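I think I know why this happens. You can pass initial shape parameter estimates when fitting; see the documentation for scipy.stats.rv_continuous.fit, which states: "Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to _fitstart(data)). No default value."

Here is some extremely ugly but functional code using my pyeq3 statistical distribution fitter, which internally tries different starting estimates, fits each of them, and returns the parameters of the fit with the best nnlf (negative log-likelihood). This example code does not show the behavior you observed and gives the same shape parameter regardless of scaling. You will need to install pyeq3 with "pip3 install pyeq3" to run it. The pyeq3 code was designed for text input from the web interface on zunzun.com, so brace yourself. Here is the example code: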

import numpy as np

#Set up arrays of values to fit curve to 
sample=np.random.rand(1,30) #Random set of decimal values 
smallVals = sample*1e-5     #Scale to smaller values 

#If the random sample above does not reproduce the problem, this particular set of random numbers does:
bugArr = np.array([0.25322987, 0.81952358, 0.94497455, 0.36295543, 0.72272746, 0.49482558,0.65674877, 0.40876558, 0.64952248, 0.23171052, 0.24645658, 0.35359126,0.27578928, 0.24820775, 0.69789187, 0.98876361, 0.22104156,0.40019593,0.0756707,  0.12342556, 0.3601186,  0.54137089,0.43477705, 0.44622486,0.75483338, 0.69766687, 0.1508741,  0.75428996, 0.93706003, 0.1191987])
bugArr_small = bugArr*1e-5

#This array of random numbers gives the same shape parameter regardless of scaling
fineArr = np.array([0.7449611,  0.82376693, 0.32601009, 0.18544293, 0.56779629, 0.30495415,
        0.04670362, 0.88106521, 0.34013959, 0.84598841, 0.24454428, 0.57981437,
        0.57129427, 0.8857514,  0.96254429, 0.64174078, 0.33048637, 0.17124045,
        0.11512589, 0.31884749, 0.48975204, 0.87988863, 0.86898236, 0.83513966,
        0.05858769, 0.25889509, 0.13591874, 0.89106616, 0.66471263, 0.69786708])
fineArr_small = fineArr*1e-5

#pyeq3 reads columnar ASCII text, so write each array as one value per line
bugArr_str = ''.join(str(v) + '\n' for v in bugArr)
bugArr_small_str = ''.join(str(v) + '\n' for v in bugArr_small)
fineArr_str = ''.join(str(v) + '\n' for v in fineArr)
fineArr_small_str = ''.join(str(v) + '\n' for v in fineArr_small)
import pyeq3

simpleObject_bugArr = pyeq3.IModel.IModel()
simpleObject_bugArr._dimensionality = 1
pyeq3.dataConvertorService().ConvertAndSortColumnarASCII(bugArr_str, simpleObject_bugArr, False)
solver = pyeq3.solverService()
result_bugArr = solver.SolveStatisticalDistribution('genextreme', simpleObject_bugArr.dataCache.allDataCacheDictionary['IndependentData'][0], 'nnlf')
simpleObject_bugArr_small = pyeq3.IModel.IModel()
simpleObject_bugArr_small._dimensionality = 1
pyeq3.dataConvertorService().ConvertAndSortColumnarASCII(bugArr_small_str, simpleObject_bugArr_small, False)
solver = pyeq3.solverService()
result_bugArr_small = solver.SolveStatisticalDistribution('genextreme', simpleObject_bugArr_small.dataCache.allDataCacheDictionary['IndependentData'][0], 'nnlf')

simpleObject_fineArr = pyeq3.IModel.IModel()
simpleObject_fineArr._dimensionality = 1
pyeq3.dataConvertorService().ConvertAndSortColumnarASCII(fineArr_str, simpleObject_fineArr, False)
solver = pyeq3.solverService()
result_fineArr = solver.SolveStatisticalDistribution('genextreme', simpleObject_fineArr.dataCache.allDataCacheDictionary['IndependentData'][0], 'nnlf')

simpleObject_fineArr_small = pyeq3.IModel.IModel()
simpleObject_fineArr_small._dimensionality = 1
pyeq3.dataConvertorService().ConvertAndSortColumnarASCII(fineArr_small_str, simpleObject_fineArr_small, False)
solver = pyeq3.solverService()
result_fineArr_small = solver.SolveStatisticalDistribution('genextreme', simpleObject_fineArr_small.dataCache.allDataCacheDictionary['IndependentData'][0], 'nnlf')

print('ba',result_bugArr[1]['fittedParameters'])
print('ba_s',result_bugArr_small[1]['fittedParameters'])
print()
print('fa',result_fineArr[1]['fittedParameters'])
print('fa_s',result_fineArr_small[1]['fittedParameters'])
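
If you would rather stay within SciPy, the same idea (try several starting values and keep the fit with the lowest nnlf) can be sketched directly with genextreme.fit. This is only a rough sketch and not how pyeq3 is implemented: the helper name fit_gev_multistart, the list of starting shapes, and the use of the sample mean and standard deviation as starting loc and scale are all my own assumptions.

```python
import numpy as np
from scipy.stats import genextreme as gev

def fit_gev_multistart(data, shape_guesses=(-0.5, -0.1, 0.1, 0.5)):
    """Fit a GEV from several starting shapes; return (params, nnlf) of the best fit.

    Hypothetical helper for illustration; the starting values are assumptions.
    """
    data = np.ravel(data)
    # Start loc and scale at the magnitude of the data rather than at fixed defaults
    loc0, scale0 = np.mean(data), np.std(data)
    best_params, best_nnlf = None, np.inf
    for c0 in shape_guesses:
        try:
            # Positional argument = starting shape; loc/scale keywords = starting values
            params = gev.fit(data, c0, loc=loc0, scale=scale0)
        except RuntimeError:
            # Skip starting points from which the optimizer fails
            continue
        nnlf = gev.nnlf(params, data)
        if np.isfinite(nnlf) and nnlf < best_nnlf:
            best_params, best_nnlf = params, nnlf
    return best_params, best_nnlf

rng = np.random.default_rng(0)
data = rng.random(30)       # illustrative sample, not the arrays from the question

best_params, best_nnlf = fit_gev_multistart(data)
small_params, small_nnlf = fit_gev_multistart(data * 1e-5)
print('original:', best_params)
print('scaled:  ', small_params)
```

Because the starting loc and scale are derived from the data, both the original and the rescaled sample start the optimizer from equivalent points, which is what you want if the shape parameter is to come out the same. I have not verified that this removes the discrepancy for every data set, so compare the printed shapes yourself.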