Using pd.apply with Series arguments raises a TypeError

Time: 2015-04-23 16:22:09

Tags: python pandas

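I am trying to compute the distance between two latitude/longitude pairs using the haversine formula. I am passing a Series for the last two function arguments because I want to compute this for many coordinates that I have stored in two pandas columns. I get the following error: TypeError: ("'Series' object is not callable", u'occurred at index 0')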

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import radians, cos, sin, asin, sqrt

origin_lat = 51.507200
origin_lon = -0.127500

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

df['dist_from_org'] = df.apply(haversine(origin_lon, origin_lat, df['ulong'], df['ulat']), axis=1)

The columns of df look like this:

+----+---------+----------+
|    |  ulat   |  ulong   |
+----+---------+----------+
|  0 | 52.6333 | 1.30000  |
|  1 | 51.4667 | -0.35000 |
|  2 | 51.5084 | -0.12550 |
|  3 | 51.8833 | 0.56670  |
|  4 | 51.7667 | -1.38330 |
|  5 | 55.8667 | -2.10000 |
|  6 | 55.8667 | -2.10000 |
|  7 | 52.4667 | -1.91670 |
|  8 | 51.8833 | 0.90000  |
|  9 | 53.4083 | -2.14940 |
| 10 | 53.0167 | -1.73330 |
| 11 | 51.4667 | -0.35000 |
| 12 | 51.4667 | -0.35000 |
| 13 | 52.7167 | -1.36670 |
| 14 | 51.4667 | -0.35000 |
| 15 | 52.9667 | -1.16667 |
| 16 | 51.4667 | -0.35000 |
| 17 | 51.8833 | 0.56670  |
| 18 | 51.8833 | 0.56670  |
| 19 | 51.4833 | 0.08330  |
| 20 | 52.0833 | 0.58330  |
| 21 | 52.3000 | -0.70000 |
| 22 | 51.4000 | -0.05000 |
| 23 | 51.9333 | -2.10000 |
| 24 | 51.9000 | -0.43330 |
| 25 | 53.4809 | -2.23740 |
| 26 | 51.4853 | -3.18670 |
| 27 | 51.2000 | -1.48333 |
| 28 | 51.7779 | -3.21170 |
| 29 | 51.4667 | -0.35000 |
| 30 | 51.7167 | -0.28330 |
| 31 | 52.2000 | 0.11670  |
| 32 | 52.4167 | -1.55000 |
| 33 | 56.5000 | -2.96670 |
| 34 | 51.2167 | -1.05000 |
| 35 | 51.8964 | -2.07830 |
+----+---------+----------+

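Am I not allowed to use a Series inside the pd.apply function? If so, how can I apply the function row by row and assign the output to a new column?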

1 answer:

Answer 0 (score: 1)

You don't need apply when calling the function. Just use:

df['dist_from_org'] = haversine(origin_lon, origin_lat, df['ulong'], df['ulat'])
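On the error in the question: `DataFrame.apply` expects a function as its first argument, but `haversine(...)` is evaluated first and its result (a Series) is passed instead, which is where "'Series' object is not callable" comes from. If a row-by-row call is really wanted, the function can be wrapped in a lambda. A minimal sketch, assuming a three-row sample of the question's data and a scalar-only variant of the function written with `math` (the name `haversine_scalar` is made up for this example):

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd

origin_lat = 51.507200
origin_lon = -0.127500

# First three rows of the question's table, used as sample data.
df = pd.DataFrame({"ulat": [52.6333, 51.4667, 51.5084],
                   "ulong": [1.3000, -0.3500, -0.1255]})

def haversine_scalar(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given as plain floats."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# apply receives a function; with axis=1 pandas calls it once per row,
# so each call only ever sees scalar values.
df["dist_from_org"] = df.apply(
    lambda row: haversine_scalar(origin_lon, origin_lat, row["ulong"], row["ulat"]),
    axis=1,
)
```

This works, but it calls Python code once per row, so it is usually slower than passing the whole columns to a function that supports Series.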

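When I run your code (with scalar values for origin_lon and origin_lat), I get TypeError: cannot convert the series to <type 'float'>. This is caused by the assignment a = ..., where math.sin and math.cos are handed a Series but only accept a single float.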
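The difference between the `math` functions and their NumPy counterparts can be seen in isolation. A small sketch, using a made-up three-element Series:

```python
import math

import numpy as np
import pandas as pd

s = pd.Series([0.0, 0.5, 1.0])

try:
    math.sin(s)           # math.sin needs one float, not a whole Series
    raised = False
except TypeError:
    raised = True         # this is the branch taken for a multi-element Series

elementwise = np.sin(s)   # NumPy ufuncs work element by element and return a Series
```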

I reworked the formula so that it works with Series:

# the parentheses let the expression continue onto a second line
a = (dlat.divide(2).apply(sin).pow(2)
     + lat1.apply(cos).multiply(lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))))

Please let me know if this works for you.

If origin_lon and origin_lat are constants (rather than Series), use the following formula:

a = dlat.divide(2).apply(sin).pow(2) + cos(lat1) * lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))

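Because the arguments lon2 and lat2 are pandas Series, dlon and dlat are Series objects as well. You then need to call apply on each Series so that the function is applied to every element of it.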
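An alternative to calling `apply` on each Series is to use NumPy's element-wise functions throughout, which accept scalars and Series alike. The following is a sketch under that assumption, not the code from the answer above; the name `haversine_np` and the three-row sample frame are made up for illustration:

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; arguments may be floats or pandas Series."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # np.sin / np.cos operate element-wise, so no Series.apply is needed
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 6371 * 2 * np.arcsin(np.sqrt(a))  # 6371 km: Earth radius used in the question

origin_lat = 51.507200
origin_lon = -0.127500

# First three rows of the question's table, used as sample data.
df = pd.DataFrame({"ulat": [52.6333, 51.4667, 51.5084],
                   "ulong": [1.3000, -0.3500, -0.1255]})

# One call on whole columns; the scalar origin is broadcast against them.
df["dist_from_org"] = haversine_np(origin_lon, origin_lat, df["ulong"], df["ulat"])
```

The same function covers both cases discussed above (constant origin or a Series of origins), since the arithmetic broadcasts either way.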