Question

I have the following function, which computes a value from two arguments (two points, each given as an (x, y) pair):

import numpy as np
import math

def some_func(pt1,pt2):
    return math.sqrt( (pt2[0]-pt1[0])*(pt2[0]-pt1[0]) + (pt2[1]-pt1[1])*(pt2[1]-pt1[1]) )

Usage:

a = 1, 2
b = 4, 5
some_func(a,b)
#outputs = 4.24264
#or some_func((1,2), (4,5)) would give the same output too
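
For reference, some_func is the straight-line (Euclidean) distance between two 2-D points. A minimal sketch, assuming Python 3.8+ for math.dist, that checks the hand-written formula against the standard library:

```python
import math

def some_func(pt1, pt2):
    # Same formula as in the question: distance between two 2-D points.
    return math.sqrt((pt2[0] - pt1[0]) ** 2 + (pt2[1] - pt1[1]) ** 2)

a = 1, 2
b = 4, 5

d_manual = some_func(a, b)
d_hypot = math.hypot(b[0] - a[0], b[1] - a[1])  # hypotenuse of the x/y differences
d_dist = math.dist(a, b)                        # available from Python 3.8

print(d_manual, d_hypot, d_dist)
```

All three calls should agree up to floating-point rounding.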

I have the following df:

  seq     x    y    points
    1     2    3    (2,3)
    1    10    5    (10,5)
    1     6    7    (6,7)
    2     8    9    (8,9)
    2    10   11    (10,11)

The "points" column was created with the code below:

df["points"] = list(zip(df.loc[:, "x"], df.loc[:, "y"]))

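I want to apply some_func across the whole df, comparing each row's point with the previous row's point, and I also want to be able to do this per group of "seq".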

I tried:

df["value"] = some_func(df["points"].values, df["points"].shift(1).values)
#without using groupby

and

df["value"] = df.groupby("seq").points.apply(some_func) #with groupby

But both attempts raise a TypeError, complaining either about 1 missing argument or about an unsupported data type.
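
A minimal sketch reproducing the two errors, assuming the df from the question (the errors dict is only an illustrative helper). The first call passes whole arrays, so pt2[0] - pt1[0] ends up subtracting a tuple from a NaN float; the second call hands some_func a single argument (one group's Series) where it expects two points:

```python
import math
import pandas as pd

def some_func(pt1, pt2):
    return math.sqrt((pt2[0] - pt1[0]) ** 2 + (pt2[1] - pt1[1]) ** 2)

df = pd.DataFrame({"seq": [1, 1, 1, 2, 2],
                   "x": [2, 10, 6, 8, 10],
                   "y": [3, 5, 7, 9, 11]})
df["points"] = list(zip(df["x"], df["y"]))

errors = {}  # illustrative helper: collects the error messages

try:
    # pt1 and pt2 are arrays of tuples here, not single points
    some_func(df["points"].values, df["points"].shift(1).values)
except TypeError as exc:
    errors["whole arrays"] = str(exc)

try:
    # apply calls some_func(group) with one argument only
    df.groupby("seq").points.apply(some_func)
except TypeError as exc:
    errors["groupby.apply"] = str(exc)

for attempt, message in errors.items():
    print(attempt, "->", message)
```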

Expected df:

  seq     x    y    points     value
    1     2    3    (2,3)       NaN
    1    10    5    (10,5)     8.24
    1     6    7    (6,7)      4.47
    2     8    9    (8,9)       NaN
    2    10   11    (10,11)    2.82

Answer 1

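You can first use groupby with DataFrameGroupBy.shift to get the previous point within each group. The NaN values this produces need to be replaced with tuples, and one possible way to do that is fillna. Finally, use apply row by row: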

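import pandas as pd

# One (nan, nan) tuple per row; recent pandas versions reject a data list shorter than the index
s = pd.Series([(np.nan, np.nan)] * len(df), index=df.index)
df['shifted'] = df.groupby('seq').points.shift().fillna(s)
df['values'] = df.apply(lambda x: some_func(x['points'], x['shifted']), axis=1)
print (df)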
   seq   x   y    points     shifted    values
0    1   2   3    (2, 3)  (nan, nan)       NaN
1    1  10   5   (10, 5)      (2, 3)  8.246211
2    1   6   7    (6, 7)     (10, 5)  4.472136
3    2   8   9    (8, 9)  (nan, nan)       NaN
4    2  10  11  (10, 11)      (8, 9)  2.828427

Another solution is to filter out the NaN values inside apply:

df['shifted'] = df.groupby('seq').points.shift()
f = lambda x: some_func(x['points'], x['shifted']) if pd.notnull(x['shifted']) else np.nan
df['values'] = df.apply(f, axis=1)
print (df)
   seq   x   y    points  shifted    values
0    1   2   3    (2, 3)      NaN       NaN
1    1  10   5   (10, 5)   (2, 3)  8.246211
2    1   6   7    (6, 7)  (10, 5)  4.472136
3    2   8   9    (8, 9)      NaN       NaN
4    2  10  11  (10, 11)   (8, 9)  2.828427
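
If only the distance is needed, the tuples can be skipped altogether. A vectorized sketch, assuming the x and y columns from the question and that some_func stays the Euclidean distance (this swaps in groupby diff plus np.hypot instead of calling some_func):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"seq": [1, 1, 1, 2, 2],
                   "x": [2, 10, 6, 8, 10],
                   "y": [3, 5, 7, 9, 11]})

# Difference to the previous row within each seq group;
# the first row of every group has no predecessor and becomes NaN.
deltas = df.groupby("seq")[["x", "y"]].diff()

# Element-wise sqrt(dx**2 + dy**2); NaN inputs stay NaN.
df["value"] = np.hypot(deltas["x"], deltas["y"])
print(df)
```

This avoids the row-wise apply, which tends to matter on larger frames.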

Answer 2

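f = lambda x, y: some_func(x, y)
# Pair each point with the previous row's point (no groupby); the first row has no predecessor, so it gets NaN
df["value"] = [f(p, q) if isinstance(q, tuple) else np.nan
               for p, q in zip(df["points"], df["points"].shift(1))]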
