Map values to two additional DataFrame columns using an existing column in Python

Time: 2018-03-24 06:37:54

Tags: python python-3.x pandas csv

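I am building a general-purpose tool that can take any csv file. The file contains a city column that needs to be geocoded to latitude and longitude. I have a csv file that looks like this. The first row holds the column names and the second row holds the variable types.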

Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
20-May-13,17,27,1,mumbai
20-May-13,16,40,5,delhi

First, I find the unique values in the City column and put them in a list.

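import pandas as pd

filename = 'data_file.csv'
data_date = pd.read_csv(filename)
# Row 0 holds the variable types; select the column(s) whose type is "loc"
column_name = data_date.loc[:, data_date.loc[0] == "loc"]
# Drop the type row so only the data rows remain
column_work = column_name.iloc[1:]
column_unique = column_work.iloc[:, 0].unique().tolist()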

Second, I wrote the code to geocode my cities.

import sys

# `geocoders` is assumed to be a list of geocoder objects defined elsewhere
def geocode(address):
    i = 0
    try:
        while i < len(geocoders):
            # try to geocode using a service
            location = geocoders[i].geocode(address)

            # if it returns a location
            if location is not None:

                # return those values
                return [location.latitude, location.longitude]
            else:
                # otherwise try the next one
                i += 1
    except Exception:
        print(sys.exc_info()[0])
        return ['null', 'null']

    # if all services have failed to geocode, return null values
    return ['null', 'null']
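
The function above depends on a `geocoders` list that the post does not show. As a minimal sketch of the same try-each-service idea, the example below uses hypothetical stand-ins: `StubGeocoder`, `Location`, and `geocode_with_fallback` are illustrative names, not part of the original code or of any geocoding library. It assumes each service exposes a `geocode(address)` method that returns an object with `latitude` and `longitude`, or `None` when nothing is found.

```python
from collections import namedtuple

# Hypothetical stand-ins for real geocoding services (not from the original post)
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeocoder:
    """Looks addresses up in a fixed table; returns None for unknown addresses."""

    def __init__(self, table):
        self.table = table

    def geocode(self, address):
        coords = self.table.get(address)
        return Location(*coords) if coords is not None else None


def geocode_with_fallback(address, services):
    """Return [lat, lng] from the first service that knows the address."""
    for service in services:
        try:
            location = service.geocode(address)
        except Exception:
            # Unlike the function above, a failing service is skipped
            # instead of ending the whole search
            continue
        if location is not None:
            return [location.latitude, location.longitude]
    # Every service failed or returned nothing
    return ['null', 'null']


services = [
    StubGeocoder({"delhi": (28.7040592, 77.10249019999999)}),
    StubGeocoder({"agra": (27.1766701, 78.00807449999999)}),
]
delhi = geocode_with_fallback("delhi", services)
agra = geocode_with_fallback("agra", services)  # first stub returns None, second answers
unknown = geocode_with_fallback("atlantis", services)
```

Whether one failing service should stop the search or just be skipped is a design choice; the sketch skips it so that the remaining services still get a chance.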

cities = ['delhi', 'agra', 'mumbai']  # renamed from `list` to avoid shadowing the built-in
lat = []
dout = []  # cities that could not be geocoded
for j, row in enumerate(cities):
    print('processing #', j)
    try:
        address = row
        result = geocode(address)
        # add the lat/lon values to the flat result list
        lat.extend(result)
    except Exception:
        # record the city that failed
        dout.append(row)
print(lat)

This gives me a list of latitudes and longitudes [28.7040592, 77.10249019999999, 27.1766701, 78.00807449999999, 19.0759837, 72.8776559]. I want to write it to my CSV file like this:

Time,M1,M2,M3,CityName,Latitude,Longitude
temp,num,num,num,loc,lat,lng
20-May-13,19,20,0,delhi,28.7040592,77.10249019999999
20-May-13,25,42,7,agra,27.1766701,78.00807449999999
20-May-13,23,35,4,mumbai,19.0759837,72.8776559
20-May-13,21,32,3,delhi,28.7040592,77.10249019999999
20-May-13,17,27,1,mumbai,19.0759837,72.8776559
20-May-13,16,40,5,delhi,28.7040592,77.10249019999999

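I tried making separate latitude and longitude lists, latitude = lat[0::2] and longitude = lat[1::2], or converting the result to a dictionary {'delhi': [28.7040592, 77.10249019999999], 'agra': [27.1766701, 78.00807449999999], 'mumbai': [19.0759837, 72.8776559]}, but somehow I can't figure out how to write it to the csv file.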
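
For reference, the two conversions described above can be sketched as follows. The flat list is copied from the output quoted earlier; the names `cities` and `coords` are illustrative. The sketch assumes every city contributed exactly two entries to `lat`, in the same order as the city list.

```python
cities = ['delhi', 'agra', 'mumbai']
lat = [28.7040592, 77.10249019999999,
       27.1766701, 78.00807449999999,
       19.0759837, 72.8776559]

# Even positions hold latitudes, odd positions hold longitudes
latitude = lat[0::2]
longitude = lat[1::2]

# Pair each city with its [latitude, longitude]
coords = {city: [la, lo] for city, la, lo in zip(cities, latitude, longitude)}
```

If a city could fail to geocode without adding a placeholder pair, the positions would drift, so a dictionary built directly inside the geocoding loop may be the safer variant.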

1 Answer:

Answer 0: (score: 0)

I think converting them to a dictionary is a good approach.

dic = {'delhi': [28.7040592, 77.10249019999999],
       'agra': [27.1766701, 78.00807449999999],
       'mumbai': [19.0759837, 72.8776559]}

# Create new columns; values missing from dic (such as "loc" in the
# variable-type row) get None instead of raising a TypeError
data_date["Latitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[0], axis = 1)
data_date["Longitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[1], axis = 1)

# Write the data back to csv file
data_date.to_csv(filename, index = False)

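This way, it looks up the values for each city name in the dictionary and writes them to the specified columns. Finally, it overwrites the old csv file with the new DataFrame.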
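
The question's file also carries a variable-type row (`temp,num,num,num,loc`), and the desired output labels the new columns `lat` and `lng` in that row. A minimal sketch of one way to handle this is shown below. It is an alternative using `Series.map`, not the answer's exact method, and it makes some assumptions: the CSV is read from an in-memory string instead of `data_file.csv`, every cell is read as text via `dtype=str`, and the names `lat_by_city`, `lng_by_city`, and `result` are illustrative.

```python
import io

import pandas as pd

# A shortened copy of the question's file, held in memory for the example
csv_text = """Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
"""

dic = {'delhi': [28.7040592, 77.10249019999999],
       'agra': [27.1766701, 78.00807449999999],
       'mumbai': [19.0759837, 72.8776559]}

# dtype=str keeps every cell as text, so the type row and the data rows
# can share the same columns
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# One lookup table per new column; values become text to match the rest
lat_by_city = {city: str(c[0]) for city, c in dic.items()}
lng_by_city = {city: str(c[1]) for city, c in dic.items()}

# Series.map leaves NaN where the key is missing (here: the "loc" type row)
df["Latitude"] = df["CityName"].map(lat_by_city)
df["Longitude"] = df["CityName"].map(lng_by_city)

# Row 0 is the variable-type row, so label the new columns there
df.loc[0, ["Latitude", "Longitude"]] = ["lat", "lng"]

# Write to an in-memory buffer; a real run would pass a file path instead
out = io.StringIO()
df.to_csv(out, index=False)
result = out.getvalue()
```

Mapping the column through a dictionary avoids a row-wise `apply`, and a city that is missing from the dictionary ends up as an empty cell in the output instead of stopping the run.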