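Python: TypeError: zip argument #1 must support iteration

Time: 2017-08-16 18:13:46

Tags: python pandas dataframe zip-operator

I get an error when calling zip(*map(...)). A long explanation follows below.

TypeError: zip argument #1 must support iteration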

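This is what I have: a DataFrame containing cities and their locations as longitude and latitude. Now I want to use the haversine formula to calculate the distances between the cities.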

The starting point is this Pandas DataFrame:

      city       lat       lng
0   Berlin  52.52437  13.41053
1  Potsdam  52.39886  13.06566
2  Hamburg  53.57532  10.01534

Then I join the DataFrame with itself to get pairs of cities:

df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y]

This gives me:

    city_x   lat_x     lng_x     tmp  city_y   lat_y     lng_y
1   Berlin   52.52437  13.41053  1    Potsdam  52.39886  13.06566
2   Berlin   52.52437  13.41053  1    Hamburg  53.57532  10.01534
3   Potsdam  52.39886  13.06566  1    Berlin   52.52437  13.41053
5   Potsdam  52.39886  13.06566  1    Hamburg  53.57532  10.01534
6   Hamburg  53.57532  10.01534  1    Berlin   52.52437  13.41053
7   Hamburg  53.57532  10.01534  1    Potsdam  52.39886  13.06566

Now for the important part. The haversine formula is put into a function:

def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes
    based on the Haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])

    # haversine formula
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a))
    distance = c * R
    return distance

This function should then be invoked on the joined DataFrame:

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = zip(*map(haversine_distance, lng1, lat1, lng2, lat2))
    return dist

# now invoke the method in order to get a new column (series) back
get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])

Question/error: this gives me the following error:

TypeError: zip argument #1 must support iteration

Remark: What I don't understand is why I get this error, because another method (see below) works perfectly well. It is basically the same!

def lat_lng_to_cartesian(lat: float, lng: float) -> float:
    from math import radians, cos, sin
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lat_, lng_ = map(radians, [lat, lng])

    x = R * cos(lat_) * cos(lng_)
    y = R * cos(lat_) * sin(lng_)
    z = R * sin(lat_)
    return x, y, z

def get_cartesian_coordinates(lat: pd.Series, lng: pd.Series) -> (pd.Series, pd.Series, pd.Series):
    if lat is None or lng is None:
        return
    x, y, z = zip(*map(lat_lng_to_cartesian, lat, lng))
    return x, y, z

get_cartesian_coordinates(df2['lat_x'], df2['lng_x'])

3 answers:

Answer 0 (score: 3)

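Your haversine_distance function returns a single number, but zip expects each of its arguments to be iterable, so it fails with an exception.

lat_lng_to_cartesian works because it returns a 3-tuple, which can be iterated over.

You could get rid of the exception by returning a 1-tuple: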

return (distance,)

But I don't see the point of doing that here. You don't actually need zip at all:

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = map(haversine_distance, lng1, lat1, lng2, lat2)
    return pd.Series(dist)
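
To make the difference concrete, here is a minimal sketch that needs neither pandas nor the haversine code. The functions scalar_result and tuple_result are hypothetical stand-ins for haversine_distance and lat_lng_to_cartesian; only their return types matter. The exact wording of the TypeError depends on the Python version, so the sketch just captures it.

```python
def scalar_result(a, b):
    return a + b            # a single number, like haversine_distance


def tuple_result(a, b):
    return a + b, a - b     # a tuple, like lat_lng_to_cartesian


xs, ys = [1, 2, 3], [10, 20, 30]

# zip(*iterable) passes every element as a separate argument to zip,
# so each element must itself be iterable. Tuples are, so this transposes.
sums, diffs = zip(*map(tuple_result, xs, ys))
print(sums, diffs)          # (11, 22, 33) (-9, -18, -27)

# With scalar results the call is equivalent to zip(11, 22, 33) and fails.
error_message = None
try:
    zip(*map(scalar_result, xs, ys))
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Without zip, map already yields one value per row.
plain = list(map(scalar_result, xs, ys))
print(plain)                # [11, 22, 33]
```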

Answer 1 (score: 1)

As I mentioned in the comments, to be able to use haversine_distance the way you currently have it defined, you need to zip the columns before mapping over them in get_haversine_distance. Essentially, you need to edit your get_haversine_distance function so that it zips the corresponding rows into tuples before unpacking each tuple as the arguments to haversine_distance. Here is an illustration using the provided data:

import pandas as pd
import numpy as np

df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
df['tmp'] = 1
df
#       city       lat       lng  tmp
# 0   Berlin  52.52437  13.41053    1
# 1  Potsdam  52.39886  13.06566    1
# 2  Hamburg  53.57532  10.01534    1

# Make sure to reset the index after you filter out the unneeded rows
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y].reset_index(drop=True)
#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = pd.Series(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2)))
    return dist

def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes
    based on the Haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])

    # haversine formula
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a))
    distance = c * R
    return distance

df2['distance'] = get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])
#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y    distance
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120

Let me know if this is how you want your output to look.
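
The advice about resetting the index matters because pandas aligns on index labels, not on row position, when a Series is assigned as a column. The sketch below illustrates this with a hypothetical frame named pairs whose index has the same gaps as the filtered df2 in the question; the values are placeholders, not real distances.

```python
import pandas as pd

# Hypothetical stand-in for df2 after filtering: the index has gaps.
pairs = pd.DataFrame(
    {"city_x": ["Berlin", "Berlin", "Potsdam", "Potsdam", "Hamburg", "Hamburg"]},
    index=[1, 2, 3, 5, 6, 7],
)
values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # placeholder "distances"

# A Series built from a plain list gets the default index 0..5.
# Assignment matches labels, so values shift and some rows end up NaN.
pairs["misaligned"] = pd.Series(values)

# Reusing the frame's own index keeps every value on its row.
pairs["aligned"] = pd.Series(values, index=pairs.index)

print(pairs)
```

Passing index=... is an alternative to reset_index(drop=True) when the original row labels should be kept.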

Answer 2 (score: 1)

Andrea pointed out the problem: haversine_distance returns a number rather than an iterable. That said, you can also use apply on df2:

df2.apply(lambda row: haversine_distance(row['lng_x'], row['lat_x'], row['lng_y'], row['lat_y']), axis=1)
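
The apply call above invokes haversine_distance once per row. As a possible alternative, not taken from any of the answers, the same formula can be written with NumPy functions so that it operates on whole columns at once. The sketch below assumes the same column names as df2; the helper haversine_vectorized and the frame pairs are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd


def haversine_vectorized(lng1, lat1, lng2, lat2, radius_km=6371.0):
    """Haversine distance in km, computed element-wise on whole columns."""
    lng1, lat1, lng2, lat2 = (np.radians(s) for s in (lng1, lat1, lng2, lat2))
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Two of the city pairs from the question, with a non-default index.
pairs = pd.DataFrame(
    {
        "lng_x": [13.41053, 13.41053],
        "lat_x": [52.52437, 52.52437],
        "lng_y": [13.06566, 10.01534],
        "lat_y": [52.39886, 53.57532],
    },
    index=[1, 2],
)

# The result is a Series that keeps the index of its inputs.
pairs["distance"] = haversine_vectorized(
    pairs["lng_x"], pairs["lat_x"], pairs["lng_y"], pairs["lat_y"]
)
print(pairs["distance"])
```

Because the NumPy functions return a Series with the same index as their inputs, the result lines up with its rows without resetting the index.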