I have the following function, which computes a value from two arguments, each an (x, y) point:
import numpy as np
import math
def some_func(pt1,pt2):
    return math.sqrt( (pt2[0]-pt1[0])*(pt2[0]-pt1[0]) + (pt2[1]-pt1[1])*(pt2[1]-pt1[1]) )
Usage:
a = 1, 2
b = 4, 5
some_func(a,b)
#outputs = 4.24264
#or some_func((1,2), (4,5)) would give the same output too
I have the following df:
seq  x   y   points
1    2   3   (2,3)
1    10  5   (10,5)
1    6   7   (6,7)
2    8   9   (8,9)
2    10  11  (10,11)
The "points" column was created with the code below:
df["points"] = list(zip(df.loc[:, "x"], df.loc[:, "y"]))
I want to apply some_func to the whole df, comparing each row with the previous one, and also to be able to do this per group of "seq".
I tried:
df["value"] = some_func(df["points"].values, df["points"].shift(1).values)
#without using groupby
and
df["value"] = df.groupby("seq").points.apply(some_func) #with groupby
But both attempts raise a TypeError, saying either that 1 argument is missing or that the operand type is unsupported.
Expected df:
seq  x   y   points   value
1    2   3   (2,3)    NaN
1    10  5   (10,5)   8.24
1    6   7   (6,7)    4.47
2    8   9   (8,9)    NaN
2    10  11  (10,11)  2.82
Answer 0 (score: 3)
You can first use groupby with DataFrameGroupBy.shift, but the resulting NaN values then need to be replaced with tuples; one possible solution is fillna. Finally, use apply:
import pandas as pd

# one (nan, nan) tuple per row, so fillna can align on the index
s = pd.Series([(np.nan, np.nan)] * len(df), index=df.index)
df['shifted'] = df.groupby('seq').points.shift().fillna(s)
df['values'] = df.apply(lambda x: some_func(x['points'], x['shifted']), axis=1)
print(df)
   seq   x   y    points     shifted    values
0    1   2   3    (2, 3)  (nan, nan)       NaN
1    1  10   5   (10, 5)      (2, 3)  8.246211
2    1   6   7    (6, 7)     (10, 5)  4.472136
3    2   8   9    (8, 9)  (nan, nan)       NaN
4    2  10  11  (10, 11)      (8, 9)  2.828427
Another solution is to filter out the NaN values inside apply:
df['shifted'] = df.groupby('seq').points.shift()
f = lambda x: some_func(x['points'], x['shifted']) if pd.notnull(x['shifted']) else np.nan
df['values'] = df.apply(f, axis=1)
print (df)
   seq   x   y    points  shifted    values
0    1   2   3    (2, 3)      NaN       NaN
1    1  10   5   (10, 5)   (2, 3)  8.246211
2    1   6   7    (6, 7)  (10, 5)  4.472136
3    2   8   9    (8, 9)      NaN       NaN
4    2  10  11  (10, 11)   (8, 9)  2.828427
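If the tuples themselves are not needed, a vectorized sketch that avoids apply is to take the per-group difference of the x and y columns and combine them with np.hypot. Note this is a different technique from the shift/apply approach above, not a variant of it, and it assumes the value wanted is exactly the Euclidean distance that some_func computes. The sample frame is rebuilt from the question so the snippet runs on its own, and the column name value is taken from the expected output there.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "seq": [1, 1, 1, 2, 2],
    "x": [2, 10, 6, 8, 10],
    "y": [3, 5, 7, 9, 11],
})

# Difference to the previous row within each seq group;
# the first row of every group has no predecessor, so it becomes NaN.
deltas = df.groupby("seq")[["x", "y"]].diff()

# np.hypot(dx, dy) is sqrt(dx**2 + dy**2), the same distance some_func returns
df["value"] = np.hypot(deltas["x"], deltas["y"])
print(df)
```

The NaN in the first row of each group comes straight from diff, so no fillna or null check should be needed here.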
Answer 1 (score: 0)
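# call some_func pair by pair; the first row has no previous point, so it gets NaN (no grouping by seq)
f = lambda x, y: some_func(x, y) if isinstance(y, tuple) else np.nan
df["value"] = [f(x, y) for x, y in zip(df["points"], df["points"].shift(1))]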