Trying to interpolate 2D data with scipy: only getting an array of NaNs

Time: 2018-10-29 19:38:40

Tags: python pandas scipy interpolation synthetic

I am trying to write a very simple piece of code that interpolates from an existing set of data to create a synthetic distribution of values.

My code so far is as follows:

import pandas as pd
import numpy as np
import scipy
from scipy.interpolate import griddata
import matplotlib

CRN_data=pd.read_table('disequilibrium data.dat',sep=',')
kzz=CRN_data['Kzz']
temperature=CRN_data['Temperature']
degree=CRN_data['Mean Degree']
points=np.ndarray(shape=(len(kzz),2),dtype='float')
for i in range(len(kzz)):
    points[i][0]=kzz[i]
    points[i][1]=temperature[i]
gridx,gridy= np.mgrid[0:1:100j,0:1:200j]
grid=griddata(points,degree,(gridx,gridy),method='cubic')
print(grid)

The dataset I am interpolating from looks like this:

Kzz,Temperature,Mean Degree,   
1.00E+06,400,7.41E+18
1.00E+06,500,4.48E+23
...
1.00E+08,400,4.67E+18
1.00E+08,500,6.88E+23
1.00E+08,750,1.88E+34
...
1.00E+10,750,2.73E+33
1.00E+10,900,2.82E+37
1.00E+10,1000,1.19E+39
...

However, when I run the code, the main output I get is:

[[ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 ..., 
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]]

This is obviously not very helpful. Is this a bug in SciPy, or (more likely) am I doing something wrong?

2 answers:

Answer 0 (score: 2)

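You are getting nan values because the requested points in gridx and gridy lie outside the convex hull of the input points in points. You can pass a fill_value to be returned for points outside the hull, but you should probably reconsider the limits assigned to gridx and gridy so that the interpolation produces meaningful results, for example: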

import pandas as pd
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

CRN_data = pd.DataFrame([
[1.00E+06,400,7.41E+18],
[1.00E+06,500,4.48E+23],
[1.00E+08,400,4.67E+18],
[1.00E+08,500,6.88E+23],
[1.00E+08,750,1.88E+34],
[1.00E+10,750,2.73E+33],
[1.00E+10,900,2.82E+37],
[1.00E+10,1000,1.19E+39]],
columns=['Kzz','Temperature','Mean Degree'])

kzz = CRN_data['Kzz']
temperature = CRN_data['Temperature']
degree = CRN_data['Mean Degree']

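# (n, 2) array of (Kzz, Temperature) pairs; a plain ndarray rather than the discouraged np.matrix
points = np.column_stack((kzz, temperature))

# Restrict the target grid to the extent of the input data
gridx, gridy = np.mgrid[kzz.min():kzz.max():100j, temperature.min():temperature.max():200j]

grid = griddata(points, degree, (gridx, gridy), method='cubic')
print(grid)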

This yields:

[[7.41000000e+18 1.35147259e+22 2.70220418e+22 ...            nan
             nan            nan]
 [           nan 1.07878728e+33 1.26216288e+33 ...            nan
             nan            nan]
 [           nan            nan 1.38255505e+35 ...            nan
             nan            nan]
 ...
 [           nan            nan            nan ... 1.16569048e+39
             nan            nan]
 [           nan            nan            nan ... 1.16394396e+39
  1.17798560e+39            nan]
 [           nan            nan            nan ... 1.16129655e+39
  1.17564827e+39 1.19000000e+39]]

And plotted: [plot of the interpolated grid; image not available]
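The answer mentions fill_value and the convex hull but does not show either in code. The following is a minimal sketch, reusing the eight sample rows from the answer and the question's original 0-1 grid. It first uses scipy.spatial.Delaunay.find_simplex (which returns -1 for query points outside the triangulation) to count how many requested points fall outside the hull, then passes fill_value=0.0 so those points come back as 0.0 instead of NaN. The value 0.0 is an arbitrary choice for illustration only; it hides the problem rather than fixing the grid limits.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import Delaunay

# The eight (Kzz, Temperature) -> Mean Degree sample rows from the answer above
points = np.array([
    [1.00E+06, 400], [1.00E+06, 500],
    [1.00E+08, 400], [1.00E+08, 500], [1.00E+08, 750],
    [1.00E+10, 750], [1.00E+10, 900], [1.00E+10, 1000]])
values = np.array([7.41E+18, 4.48E+23, 4.67E+18, 6.88E+23,
                   1.88E+34, 2.73E+33, 2.82E+37, 1.19E+39])

# The question's original target grid, spanning 0..1 in both directions
gridx, gridy = np.mgrid[0:1:100j, 0:1:200j]

# find_simplex returns -1 for every query point outside the triangulation,
# i.e. outside the convex hull of the input points
tri = Delaunay(points)
query = np.column_stack((gridx.ravel(), gridy.ravel()))
outside = tri.find_simplex(query) < 0
print('grid points outside the hull:', outside.sum(), 'of', outside.size)

# fill_value replaces the NaN that griddata returns for those points;
# 0.0 is an arbitrary value chosen only for this illustration
grid = griddata(points, values, (gridx, gridy), method='cubic', fill_value=0.0)
print('any NaN left:', np.isnan(grid).any())
```

Because every point of the 0-1 grid is outside the hull, the whole result is just the fill value, which is why narrowing gridx and gridy to the data extent is the more useful fix.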

Answer 1 (score: 0)

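Compare the extent of your source data with your target grid. From what I can see, your grid spans 0 to 1 (in both x and y), while your source data spans x = 1e6 to 1e10 and y = 400 to 1000. In that case the target lies outside the source data, so 'linear' or 'cubic' will give you NaN. Try 'nearest' and the NaNs will disappear.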
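A short sketch of the comparison this answer suggests, again using the eight sample rows shown in the other answer and the question's 0-1 grid. It prints both extents and then runs griddata with method='linear' and method='nearest'. Note that 'nearest' measures Euclidean distance in the raw, unscaled coordinates, so on this particular grid every target point is closest to the (1e6, 400) sample: the NaNs do disappear, but the result is a constant array of 7.41e18 rather than a meaningful interpolation.

```python
import numpy as np
from scipy.interpolate import griddata

# Eight (Kzz, Temperature) -> Mean Degree sample rows
points = np.array([
    [1.00E+06, 400], [1.00E+06, 500],
    [1.00E+08, 400], [1.00E+08, 500], [1.00E+08, 750],
    [1.00E+10, 750], [1.00E+10, 900], [1.00E+10, 1000]])
values = np.array([7.41E+18, 4.48E+23, 4.67E+18, 6.88E+23,
                   1.88E+34, 2.73E+33, 2.82E+37, 1.19E+39])

# The question's target grid
gridx, gridy = np.mgrid[0:1:100j, 0:1:200j]

# Compare the source extent with the target extent
print('source x:', points[:, 0].min(), '..', points[:, 0].max())
print('source y:', points[:, 1].min(), '..', points[:, 1].max())
print('target x:', gridx.min(), '..', gridx.max())
print('target y:', gridy.min(), '..', gridy.max())

# 'linear' returns NaN outside the convex hull; 'nearest' always returns
# the value of the closest sample, however far away it is
linear = griddata(points, values, (gridx, gridy), method='linear')
nearest = griddata(points, values, (gridx, gridy), method='nearest')
print('linear all NaN:', np.isnan(linear).all())
print('nearest any NaN:', np.isnan(nearest).any())
```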