Question

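I am trying to apply a basic spline function to every row of a given DataFrame (dfTest, which holds the values at the points in vector x) to obtain a larger one (dfBigger) that holds the values at all points in vector xnew (which includes x).

So I defined the following variables: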

import pandas as pd
import numpy as np

x = [0,1,3,5]
xnew = range(0,6)

np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3,4))

and a basic spline function:

def spline(y, x , xnew):
    from scipy import interpolate
    model = interpolate.splrep(x,y, s=0.)
    ynew = interpolate.splev(xnew,model)
    result = ynew.round(3)
    return result

which seems to work:

spline(dfTest.iloc[0],x,xnew)
Out[176]: array([ 0.696,  0.286,  0.161,  0.227,  0.388,  0.551])

But when I try to apply it to all rows using:

dfBigger = dfTest.apply(lambda row : spline(row, x, xnew), axis = 1)

I get:

ValueError: Shape of passed values is (3, 6), indices imply (3, 4)

Since the size of dfBigger is not defined anywhere, I cannot see what is wrong. Any help and/or comments on this code would be appreciated.

Answer 1

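df.apply(func) tries to build a new Series or DataFrame from the values returned by func. Whether the result is a Series or a DataFrame, and what shape it has, depends on the type of value func returns. To get a better handle on the behavior of df.apply, try the following calls: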

dfTest.apply(lambda row: 1, axis=1)                       # Series
dfTest.apply(lambda row: [1], axis=1)                     # Series
dfTest.apply(lambda row: [1,2], axis=1)                   # Series
dfTest.apply(lambda row: [1,2,3], axis=1)                 # Series
dfTest.apply(lambda row: [1,2,3,4], axis=1)               # Series
dfTest.apply(lambda row: [1,2,3,4,5], axis=1)             # Series

dfTest.apply(lambda row: np.array([1]), axis=1)           # DataFrame
dfTest.apply(lambda row: np.array([1,2]), axis=1)         # ValueError
dfTest.apply(lambda row: np.array([1,2,3]), axis=1)       # ValueError
dfTest.apply(lambda row: np.array([1,2,3,4]), axis=1)     # DataFrame!
dfTest.apply(lambda row: np.array([1,2,3,4,5]), axis=1)   # ValueError

dfTest.apply(lambda row: pd.Series([1]), axis=1)          # DataFrame
dfTest.apply(lambda row: pd.Series([1,2]), axis=1)        # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3]), axis=1)      # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3,4]), axis=1)    # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3,4,5]), axis=1)  # DataFrame

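So what rules can we draw from these experiments?

If func returns a scalar or a list, df.apply(func) returns a Series.
If func returns a Series, df.apply(func) returns a DataFrame.
If func returns a 1D NumPy array and the array has only one element, df.apply(func) returns a DataFrame. (Not a very useful case...)
If func returns a 1D NumPy array and the array has the same number of elements as df has columns, df.apply(func) returns a DataFrame. (Useful, but limited.)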
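
On pandas 0.23 or later, the choice between a Series and a DataFrame can also be requested explicitly through the result_type argument of DataFrame.apply. The sketch below assumes such a version is installed and uses an illustrative frame named df (not from the question); it is an alternative to relying on the rules above, and was not part of the original experiments.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same shape as dfTest (3 rows, 4 columns).
df = pd.DataFrame(np.arange(12.0).reshape(3, 4))

# result_type="expand" asks apply to spread each list-like result over columns,
# even when its length (5) differs from the number of columns in df (4).
expanded = df.apply(
    lambda row: np.array([1, 2, 3, 4, 5]), axis=1, result_type="expand"
)
print(type(expanded).__name__, expanded.shape)
```

Because the behavior is requested explicitly, it does not depend on whether func returns a list, an array, or a Series.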

Since func returns 6 values and you want a DataFrame as the result, the solution is to have func return a Series instead of a NumPy array:

def spline(y, x, xnew):
    ...
    return pd.Series(result)

Putting it all together:

import numpy as np
import pandas as pd
from scipy import interpolate

def spline(y, x, xnew):
    model = interpolate.splrep(x,y, s=0.)
    ynew = interpolate.splev(xnew,model)
    result = ynew.round(3)
    return pd.Series(result)

x = [0,1,3,5]
xnew = range(0,6)
np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3,4))
# spline(dfTest.iloc[0],x,xnew)
dfBigger = dfTest.apply(lambda row : spline(row, x, xnew), axis=1)
print(dfBigger)

yields

        0      1      2      3      4      5
 0  0.696  0.286  0.161  0.227  0.388  0.551
 1  0.719  0.423  0.630  0.981  1.119  0.685
 2  0.481  0.392  0.333  0.343  0.462  0.729
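
As a small variation, the returned Series can be given xnew as its index, so that the columns of dfBigger are labeled with the interpolation points rather than with positions. This sketch assumes xnew is a NumPy array (np.arange(0, 6)) instead of a range, and adds a sanity check that is not in the original answer: with s=0 the spline should pass through the input points, so the columns at x are expected to reproduce dfTest up to rounding.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

def spline(y, x, xnew):
    # Fit an interpolating spline (s=0) through (x, y), evaluate it at xnew.
    model = interpolate.splrep(x, y, s=0.0)
    ynew = interpolate.splev(xnew, model)
    # Label the result with xnew so the columns of the output carry the x values.
    return pd.Series(ynew.round(3), index=xnew)

x = [0, 1, 3, 5]
xnew = np.arange(0, 6)  # assumption: an array instead of range(0, 6)
np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3, 4))
dfBigger = dfTest.apply(lambda row: spline(row, x, xnew), axis=1)

# Sanity check: the columns at the original x positions match dfTest
# to within the 3-decimal rounding applied in spline().
at_x = dfBigger[x].to_numpy()
print(np.allclose(at_x, dfTest.to_numpy(), atol=1e-3))
```

Here the column labels happen to coincide with the default positions 0 to 5, but the same approach carries over to non-integer or unevenly spaced xnew.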
