Computing a Pandas column with a function of two other columns

Time: 2018-09-02 08:11:57

Tags: python pandas

First, apologies if this is a duplicate, but I could not solve my problem using the explanations in similar questions...

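I have a function that takes two parameters (longitude and latitude) and queries the Google API to extract the city and country for those coordinates. The function is as follows: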

from urllib.request import urlopen
import json
def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "administrative_area_level_2" in c['types']:
            town = c['long_name']
    return town, country

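I also have a dataset of items, most (but not all) of which have one field containing the longitude and a separate field containing the latitude. Some rows are missing data.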

reference   name    lon        lat
0           name1   34.0055    1.0041
1           name1   NaN        NaN
2           name1   39.5632    3.6854
....

How can I create a new column in the DataFrame that contains the computed values?

I tried the following statements, without success:

df['city'] = getplace(df['lon'], df['lat'])

and:

df['city'] = df.apply(lambda x : coords(x['lon'], x['lat']) , axis=1)

What is the best way to do this?

Thank you very much.

Edit: I have changed the full code to this:

from urllib.request import urlopen
import json
def getplace(lat, lon):
    if np.isnan(lat)==False:
        url = "http://maps.googleapis.com/maps/api/geocode/json?"
        url += "latlng=%s,%s&sensor=false" % (lat, lon)
        v = urlopen(url).read()
        j = json.loads(v)
        components = j['results'][0]['address_components']
        country = town = None
        for c in components:
            if "country" in c['types']:
                country = c['long_name']
            if "administrative_area_level_2" in c['types']:
                town = c['long_name']
        return town, country

import pandas as pd
import numpy as np
import matplotlib as plt
%matplotlib inline

df = pd.read_csv('items.csv')
df['city']=df.apply(lambda x : getplace(x['lat'], x['lon']) , axis=1)

It still does not work and shows the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-15-bffdb49e289b> in <module>()
----> 1 df['city']=df.apply(lambda x : getplace(x['lat'], x['lon']) , axis=1)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6002                          args=args,
   6003                          kwds=kwds)
-> 6004         return op.get_result()
   6005 
   6006     def applymap(self, func):

~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self)
    140             return self.apply_raw()
    141 
--> 142         return self.apply_standard()
    143 
    144     def apply_empty_result(self):

~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in apply_standard(self)
    246 
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249 
    250         # wrap results

~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in apply_series_generator(self)
    275             try:
    276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
    278                     keys.append(v.name)
    279             except Exception as e:

<ipython-input-15-bffdb49e289b> in <lambda>(x)
----> 1 df['city']=df.apply(lambda x : getplace(x['lat'], x['lon']) , axis=1)

<ipython-input-10-ff447dcff3e8> in getplace(lat, lon)
      7         v = urlopen(url).read()
      8         j = json.loads(v)
----> 9         components = j['results'][0]['address_components']
     10         country = town = None
     11         for c in components:

IndexError: ('list index out of range', 'occurred at index 3')

Here is a simplified version of the file I am trying to use: https://drive.google.com/open?id=1Y3vtwage5kqxKWZIdQEwpy5qIP2KAGNT Thank you very much.

1 Answer:

Answer 0: (score: 0)

If you replace coords with getplace, and make sure to take care of the rows that contain NaN, then your second example will work as expected (and is idiomatic pandas).
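A minimal sketch of that suggestion follows, under some assumptions. The wrapper safe_getplace and the stand-in fake_getplace are hypothetical names that do not appear in the question; the stand-in replaces the real getplace so that the example runs without calling the Google API. The sketch also catches IndexError, because the traceback in the question is raised at j['results'][0], which suggests that the API returned an empty results list for that row.

```python
import numpy as np
import pandas as pd


def safe_getplace(lat, lon, lookup):
    """Call lookup(lat, lon) only when both coordinates are present.

    `lookup` stands in for the question's getplace; it is passed in as an
    argument (an assumption of this sketch) so no network access is needed.
    """
    if pd.isna(lat) or pd.isna(lon):
        return None, None  # missing coordinates: nothing to geocode
    try:
        return lookup(lat, lon)
    except (IndexError, KeyError):
        # e.g. j['results'] was an empty list, or the payload had no 'results'
        return None, None


def fake_getplace(lat, lon):
    # Hypothetical stand-in for getplace: returns fixed values, makes no request.
    return "SomeTown", "SomeCountry"


df = pd.DataFrame({
    "name": ["name1", "name1", "name1"],
    "lon": [34.0055, np.nan, 39.5632],
    "lat": [1.0041, np.nan, 3.6854],
})

# axis=1 hands each row to the lambda, which yields one (town, country) tuple per row
df["city"] = df.apply(
    lambda x: safe_getplace(x["lat"], x["lon"], fake_getplace), axis=1
)
```

With the real function you would pass getplace instead of fake_getplace. Each cell of df['city'] then holds a (town, country) tuple, and rows with missing coordinates or an empty API response hold (None, None).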