Apply a function to DataFrame columns

Time: 2017-06-28 06:19:16

Tags: python pandas numpy dataframe

I have the following function, which computes a value from two points, each given as an (x, y) pair:

import numpy as np
import pandas as pd
import math

def some_func(pt1,pt2):
    return math.sqrt( (pt2[0]-pt1[0])*(pt2[0]-pt1[0]) + (pt2[1]-pt1[1])*(pt2[1]-pt1[1]) )

Usage:

a = 1, 2
b = 4, 5
some_func(a,b)
#outputs = 4.24264
#or some_func((1,2), (4,5)) would give the same output too
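
For reference, some_func computes the Euclidean distance between two 2-D points, so the standard library's math.hypot should give the same number. A minimal check, assuming the points are 2-tuples as in the usage above:

```python
import math

def some_func(pt1, pt2):
    # Euclidean distance between two 2-D points (same formula as the question)
    return math.sqrt((pt2[0] - pt1[0]) ** 2 + (pt2[1] - pt1[1]) ** 2)

a = 1, 2
b = 4, 5

# math.hypot takes the two coordinate differences directly
d_func = some_func(a, b)
d_hypot = math.hypot(b[0] - a[0], b[1] - a[1])
print(d_func, d_hypot)
```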

I have the following df:

  seq     x    y    points
    1     2    3    (2,3)
    1    10    5    (10,5)
    1     6    7    (6,7)
    2     8    9    (8,9)
    2    10   11    (10,11)

The "points" column was built with the code below:

df["points"] = list(zip(df.loc[:, "x"], df.loc[:, "y"])) 

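I want to apply some_func across the whole df, comparing each row's point with the previous row's point, and also to be able to do this per group of "seq".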

I tried:

df["value"] = some_func(df["points"].values, df["points"].shift(1).values)
#without using groupby

df["value"] = df.groupby("seq").points.apply(some_func) #with groupby

But both raise a TypeError, complaining either about 1 missing argument or about an unsupported data type.
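
The two errors can be reproduced in isolation. The sketch below rebuilds the sample df from the table above (the construction is an assumption, since the original code for x and y is not shown) and captures each TypeError message:

```python
import math
import pandas as pd

def some_func(pt1, pt2):
    return math.sqrt((pt2[0] - pt1[0]) ** 2 + (pt2[1] - pt1[1]) ** 2)

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"seq": [1, 1, 1, 2, 2],
                   "x": [2, 10, 6, 8, 10],
                   "y": [3, 5, 7, 9, 11]})
df["points"] = list(zip(df["x"], df["y"]))

errors = []

# Attempt 1: whole arrays are passed, so pt1[0] is the tuple (2, 3) and
# pt2[0] is the NaN produced by shift; subtracting them is not defined.
try:
    some_func(df["points"].values, df["points"].shift(1).values)
except TypeError as exc:
    errors.append(str(exc))

# Attempt 2: groupby(...).apply passes a single argument (the group's
# Series), so the second parameter pt2 is never supplied.
try:
    df.groupby("seq").points.apply(some_func)
except TypeError as exc:
    errors.append(str(exc))

for msg in errors:
    print(msg)
```

In both cases some_func receives something other than one pair of points per call, which is why a row-wise or vectorized formulation is needed.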

Expected df:

  seq     x    y    points     value
    1     2    3    (2,3)       NaN
    1    10    5    (10,5)     8.24
    1     6    7    (6,7)      4.47
    2     8    9    (8,9)       NaN
    2    10   11    (10,11)    2.82

2 answers:

Answer 0: (score: 3)

You can first use groupby together with DataFrameGroupBy.shift, but the NaN values it produces need to be replaced with a tuple; one possible solution is fillna. Finally, use apply:

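# One (nan, nan) tuple per row, so the fill values align with df's index
s = pd.Series([(np.nan, np.nan)] * len(df), index=df.index)
# Previous point within each seq group; group starts get the (nan, nan) filler
df['shifted'] = df.groupby('seq').points.shift().fillna(s)
# some_func returns NaN when either point contains NaN
df['values'] = df.apply(lambda x: some_func(x['points'], x['shifted']), axis=1)
print(df)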
   seq   x   y    points     shifted    values
0    1   2   3    (2, 3)  (nan, nan)       NaN
1    1  10   5   (10, 5)      (2, 3)  8.246211
2    1   6   7    (6, 7)     (10, 5)  4.472136
3    2   8   9    (8, 9)  (nan, nan)       NaN
4    2  10  11  (10, 11)      (8, 9)  2.828427

Another solution is to filter out the NaN values inside apply:

df['shifted'] = df.groupby('seq').points.shift()
f = lambda x: some_func(x['points'], x['shifted']) if pd.notnull(x['shifted']) else np.nan
df['values'] = df.apply(f, axis=1)
print (df)
   seq   x   y    points  shifted    values
0    1   2   3    (2, 3)      NaN       NaN
1    1  10   5   (10, 5)   (2, 3)  8.246211
2    1   6   7    (6, 7)  (10, 5)  4.472136
3    2   8   9    (8, 9)      NaN       NaN
4    2  10  11  (10, 11)   (8, 9)  2.828427
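
As a possible alternative that is not part of the answer above, the same distances can be computed without tuples or a row-wise apply, by differencing x and y within each group and taking the length of each step. A minimal sketch, assuming the sample df from the question:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"seq": [1, 1, 1, 2, 2],
                   "x": [2, 10, 6, 8, 10],
                   "y": [3, 5, 7, 9, 11]})

# Per-group difference from the previous row; the first row of each
# group has no predecessor and becomes NaN.
delta = df.groupby("seq")[["x", "y"]].diff()

# Euclidean length of each (dx, dy) step; NaN propagates through hypot.
df["value"] = np.hypot(delta["x"], delta["y"])
print(df)
```

This works only because some_func is a plain Euclidean distance; for an arbitrary function of two points, the shift-and-apply approach above remains the general one.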

Answer 1: (score: 0)

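f = lambda x, y: some_func(x, y)
# Call f once per row: some_func takes a single pair of points, not whole arrays
prev = df["points"].shift(1)
df["value"] = [f(p, q) if isinstance(q, tuple) else np.nan for p, q in zip(df["points"], prev)]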