How to add a new column at a specific index using Python pandas

Time: 2015-03-26 02:27:38

Tags: python google-maps csv pandas

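I want to use the Google API to get the latitude and longitude for the "location" column of a CSV file. I can get "lat" and "lng" with a Google API module, but I cannot save the results back to the original file and insert them after "location".

My original file looks like this: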

date            time     location  birdName count birdName count birdName  count                     
1990-02-10   0900:1200   balabala    bird1    15    bird2    10    bird3    20                   
1990-02-28   1300:1500   balabala    bird4    40    bird5    10    bird6    25       
1990-03-01   0900-1200   balabala    bird7    45    bird8    15    bird9    30                       
  ...          ...         ...        ...    ...     ...     ...    ...    ...

I want to insert the 'lat' and 'lng' columns after 'location', like this:

date            time     location   lat   lng  birdName count birdName count birdName  count                     
1990-02-10   0900:1200   balabala   xxx   xxx   bird1    15    bird2    10    bird3    20                   
1990-02-28   1300:1500   balabala   xxx   xxx   bird4    40    bird5    10    bird6    25       
1990-03-01   0900-1200   balabala   xxx   xxx   bird7    45    bird8    15    bird9    30                       
  ...          ...         ...      ...   ...    ...     ...     ...     ...    ...    ...

Google API module: https://drive.google.com/open?id=0B6SUWnrBmDwSb3BabFdEcXV3LUU&authuser=0

My code:

# -*- coding: utf-8 -*-
import pandas as pd
from geocodequery import GeocodeQuery

def addrs(location):
  for addrs in location:
    addr= addrs
    gq = GeocodeQuery("zh-tw", "tw")
    gq.get_geocode(addr)
    lng=gq.get_lng()
    lat=gq.get_lat()
    df['lat']=lat
    df['lng']=lng         
    df.to_csv('./birdsIwant.csv')   


df = pd.read_csv('./birdsIwant.csv',low_memory=False)
addrs(df['location'])

What should I do?

1 answer:

Answer 0 (score: 0)

You can change the column order using fancy indexing:

In [179]:
# add the columns (random placeholder values stand in for real coordinates)
import numpy as np
df['lat'] = np.random.randn(len(df))
df['lng'] = np.random.randn(len(df))
df
Out[179]:
         date       time  location birdName  count birdName.1  count.1  \
0  1990-02-10  0900:1200  balabala    bird1     15      bird2       10   
1  1990-02-28  1300:1500  balabala    bird4     40      bird5       10   
2  1990-03-01  0900-1200  balabala    bird7     45      bird8       15   

  birdName.2  count.2       lat       lng  
0      bird3       20 -0.915371 -1.508814  
1      bird6       25 -0.716439  1.008078  
2      bird9       30  0.609510 -1.185927  
In [185]:
# get a list of the columns
col_list = list(df)
# insert column names at new positions
col_list.insert(3,'lat')
col_list.insert(4,'lng')
# slice off the last 2 values
col_list=col_list[:-2]
print(col_list)

['date', 'time', 'location', 'lat', 'lng', 'birdName', 'count', 'birdName.1', 'count.1', 'birdName.2', 'count.2']
In [187]:
# use loc and pass the new column order to reorder the columns
# (.ix was removed in pandas 1.0)
df = df.loc[:,col_list]
df
Out[187]:
         date       time  location       lat       lng birdName  count  \
0  1990-02-10  0900:1200  balabala -0.915371 -1.508814    bird1     15   
1  1990-02-28  1300:1500  balabala -0.716439  1.008078    bird4     40   
2  1990-03-01  0900-1200  balabala  0.609510 -1.185927    bird7     45   

  birdName.1  count.1 birdName.2  count.2  
0      bird2       10      bird3       20  
1      bird5       10      bird6       25  
2      bird8       15      bird9       30  
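
As an alternative to reordering the column list, pandas also provides `DataFrame.insert`, which places a new column at a given integer position. The following is a minimal sketch, not part of the original answer: the small DataFrame and the zero placeholder values are assumptions standing in for the real CSV and the real geocoding results.

```python
import pandas as pd

# Small stand-in for the CSV in the question (values are illustrative).
df = pd.DataFrame({
    "date": ["1990-02-10", "1990-02-28"],
    "time": ["0900:1200", "1300:1500"],
    "location": ["balabala", "balabala"],
    "birdName": ["bird1", "bird4"],
    "count": [15, 40],
})

# Integer position immediately after the 'location' column.
pos = df.columns.get_loc("location") + 1

# insert() modifies df in place; add 'lat' first, then 'lng' right after it.
df.insert(pos, "lat", [0.0, 0.0])      # placeholder values
df.insert(pos + 1, "lng", [0.0, 0.0])  # placeholder values

print(list(df.columns))
```

Because `insert` works in place, no column list has to be rebuilt and no trailing duplicates need to be sliced off.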

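Edit

Your code writes the CSV on every iteration, so even if it did set the correct lat and lng values, each iteration would overwrite the previous result. You should write the CSV outside the function. In any case, the following is clearer and should work: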

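def addrs(location):
    # geocode a single location string and return its coordinates
    gq = GeocodeQuery("zh-tw", "tw")
    gq.get_geocode(location)
    return pd.Series([gq.get_lat(), gq.get_lng()])

# apply() calls addrs once per row; the two returned values become two columns
df[['lat','lng']] = df['location'].apply(addrs)
# write once, after all rows are filled; index=False avoids an extra index column
df.to_csv('./birdsIwant.csv', index=False)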
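
If the coordinates should land directly after 'location' and repeated locations should be looked up only once, something along these lines may help. This is a hedged sketch: `fake_geocode` is a hypothetical stand-in for the `GeocodeQuery` module (which is not reproduced here), its coordinates are made up, and the DataFrame is a reduced version of the file in the question.

```python
import pandas as pd

def fake_geocode(location):
    """Hypothetical stand-in for GeocodeQuery; returns (lat, lng)."""
    table = {"balabala": (25.0, 121.5)}  # made-up coordinates
    return table.get(location, (float("nan"), float("nan")))

# Reduced stand-in for the CSV in the question.
df = pd.DataFrame({
    "date": ["1990-02-10", "1990-02-28", "1990-03-01"],
    "location": ["balabala", "balabala", "elsewhere"],
    "birdName": ["bird1", "bird4", "bird7"],
})

# Geocode each distinct location once, then map the results back to the rows.
coords = {loc: fake_geocode(loc) for loc in df["location"].unique()}

# Place 'lat' and 'lng' immediately after 'location'.
pos = df.columns.get_loc("location") + 1
df.insert(pos, "lat", df["location"].map(lambda loc: coords[loc][0]))
df.insert(pos + 1, "lng", df["location"].map(lambda loc: coords[loc][1]))

# With no path, to_csv returns the CSV text; pass a filename to write a file.
# index=False keeps the row index out of the output.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Looking up unique locations first keeps the number of API requests down when many rows share the same place name, which matters for rate-limited geocoding services.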