Subsetting a DataFrame for use with fmin gives unexpected errors

Time: 2018-07-09 19:55:59

Tags: python pandas

I am currently using fmin() to try to fit an equation to my data. The data in the file is just two columns of floats. Here is my code:

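import numpy as np
import pandas as pd
from scipy.optimize import fmin  # assumed: fmin here is scipy.optimize.fmin

filename = 'HAB_30_Master_Overall_no2100'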
data = pd.read_csv(filename+'.csv', header=0, usecols=['Wavelength', '2.5'])

def fitFunc(x): 
    global B, A, data, sumresids
    wave = data['Wavelength']
    modelforfit = x[0]*wave**-x[1]
    data['model'] = modelforfit
    data['Residuals'] = abs(data['2.5'] - data['model'])
    sumresids = data['Residuals'].sum()
    return sumresids

def fitData():
    global xopt
    B = 2
    A = 1
    x0 = np.array([B, A])
    xopt, fopt, iter, funcalls, warnflag = fmin(fitFunc,x0,maxiter = 10000, full_output=True, disp=False) 
    print(xopt[0], xopt[1])

fitFunc(data['Wavelength'])
fitData()

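This code works when I use all the values in the file. What I would like to do, though, is subset the DataFrame so that I can see how the fit changes when only some of the data points are included. If the only thing I change is adding nrows = 10 to the read_csv call, I get the following error, even though the file has more than 90 rows:

ValueError: Integers to negative integer powers are not allowed.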
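
One possible explanation, offered as a guess because the file itself is not shown: if the first 10 Wavelength values happen to be whole numbers, read_csv with nrows = 10 infers an integer dtype for that column, whereas the full file (with at least one non-integer value) is read as float. The direct call fitFunc(data['Wavelength']) then raises an integer column to a negative integer power, which NumPy refuses to do. The sketch below reproduces that behaviour with a made-up three-row CSV; the values are hypothetical.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the CSV; the real file's contents are not shown.
csv_text = "Wavelength,2.5\n400,0.9\n450,0.8\n500.5,0.7\n"

# Reading everything: 500.5 forces the Wavelength column to float64.
full = pd.read_csv(StringIO(csv_text))

# Reading only the first two rows: both values are whole numbers,
# so the column is inferred as an integer dtype.
head = pd.read_csv(StringIO(csv_text), nrows=2)

try:
    # Integer array raised to a negative integer power.
    head["Wavelength"].to_numpy() ** -1
    raised = False
except ValueError:
    raised = True

# Forcing a float dtype at read time makes the power well defined.
head_float = pd.read_csv(
    StringIO(csv_text), nrows=2, dtype={"Wavelength": float}
)
model = 2.0 * head_float["Wavelength"] ** -1
```

If this is what is happening, passing dtype to read_csv or calling .astype(float) on the column before fitting should be enough to avoid the error.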

If I instead try something like using .iloc to make a new DataFrame that subsets the rows, like this:

filename = 'HAB_30_Master_Overall_no2100'
data = pd.read_csv(filename+'.csv', header=0, usecols=['Wavelength', '2.5'])
newdata = data.iloc[:10]

def fitFunc(x): 
    global B, A, data, sumresids
    wave = newdata['Wavelength']
    modelforfit = x[0]*wave**-x[1]
    newdata['model'] = modelforfit
    newdata['Residuals'] = abs(newdata['2.5'] - newdata['model'])
    sumresids = newdata['Residuals'].sum()
    return sumresids

def fitData():
    global xopt
    B = 2
    A = 1
    x0 = np.array([B, A])
    xopt, fopt, iter, funcalls, warnflag = fmin(fitFunc,x0,maxiter = 10000, full_output=True, disp=False) 
    print(xopt[0], xopt[1])

fitFunc(newdata['Wavelength'])
fitData()

I get warnings like this:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
newdata['model'] = modelforfit
/var/folders/j8/1fzjf9cj3slcmyy1t89sth5w0000gp/T/tmpZRuvLX.py:20: 
SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  newdata['Residuals'] = abs(newdata['2.5'] - newdata['model']) 

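and it crashes. Ideally I would also like to be able to use non-contiguous rows, for example the first 5 and the last 5, but I cannot even get it to work for a single block. If anyone could tell me why neither of the above approaches works and offer a solution, that would be very helpful.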
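
As a rough sketch of the row selection described here, and not a confirmed fix for the crash: SettingWithCopyWarning is raised when columns are assigned on a DataFrame that pandas considers a slice of another one, and taking an explicit .copy() of the subset is the usual way to avoid it. Non-contiguous positions can be passed to .iloc as one array, for example built with np.r_. The data below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the real CSV.
data = pd.DataFrame({
    "Wavelength": np.linspace(400.0, 700.0, 20),
    "2.5": np.linspace(1.0, 0.5, 20),
})

# An explicit copy is an independent DataFrame, so adding columns to it
# does not touch `data` and does not trigger SettingWithCopyWarning.
first10 = data.iloc[:10].copy()
first10["model"] = 2.0 * first10["Wavelength"] ** -1.0

# Non-contiguous rows: the first 5 and the last 5 positions.
# np.r_ concatenates the two slices into one array of integer positions.
n = len(data)
ends = data.iloc[np.r_[0:5, n - 5:n]].copy()
```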

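Edit: This is a snippet of what my data looks like. While trying to figure this out I have been importing only one data column in the read_csv call, but the eventual goal is to loop over segments of this larger file, by subsets of rows (the problem) and/or column by column (already figured out).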
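
A minimal sketch of what such a loop over row subsets could look like, assuming fmin is scipy.optimize.fmin. Instead of globals, the wavelengths and observations are passed to the objective through fmin's args parameter, so nothing is written back into the DataFrame. The helper names (sum_abs_residuals, fit_subset) and the synthetic data are hypothetical, not taken from the original code.

```python
import numpy as np
import pandas as pd
from scipy.optimize import fmin


def sum_abs_residuals(params, wave, observed):
    """Sum of absolute residuals for the model b * wave ** -a."""
    b, a = params
    return np.abs(observed - b * wave ** -a).sum()


def fit_subset(subset, column):
    """Fit one data column of one row subset; returns (b, a, objective)."""
    # Cast to float so integer-typed columns cannot break the power.
    wave = subset["Wavelength"].to_numpy(dtype=float)
    observed = subset[column].to_numpy(dtype=float)
    x0 = np.array([2.0, 1.0])
    xopt, fopt, n_iter, funcalls, warnflag = fmin(
        sum_abs_residuals, x0, args=(wave, observed),
        maxiter=10000, full_output=True, disp=False,
    )
    return xopt[0], xopt[1], fopt


# Synthetic data generated from b = 3, a = 1.5 (illustrative only).
wave = np.linspace(1.0, 5.0, 20)
data = pd.DataFrame({"Wavelength": wave, "2.5": 3.0 * wave ** -1.5})

n = len(data)
subsets = {
    "first 10": data.iloc[:10],
    "first 5 + last 5": data.iloc[np.r_[0:5, n - 5:n]],
}
results = {name: fit_subset(sub, "2.5") for name, sub in subsets.items()}
```

Looping over several data columns would just add an outer loop over the column names passed to fit_subset.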

0 answers:

No answers yet.