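I am building a generic tool that can take any CSV file. The file contains a city column that needs to be geocoded to latitude and longitude. I have a CSV file that looks like this. The first row holds the column names and the second row holds the variable types.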
Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
20-May-13,17,27,1,mumbai
20-May-13,16,40,5,delhi
First, I find the unique values in the city column and build a list from them.
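import pandas as pd

filename = 'data_file.csv'
data_date = pd.read_csv(filename)
# Row 0 holds the variable types; keep the columns tagged "loc" (location)
column_name = data_date.loc[:, data_date.loc[0] == "loc"]
# Drop the type row so that only the data rows remain
column_work = column_name.iloc[1:]
column_unique = column_work.iloc[:, 0].unique().tolist()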
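A self-contained sketch of this step, assuming the type row marks the city column with `loc` as in the sample above. The in-memory `io.StringIO` buffer and the helper name `unique_locations` are illustrative stand-ins, not part of the actual tool.

```python
import io

import pandas as pd

SAMPLE_CSV = """Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
20-May-13,17,27,1,mumbai
20-May-13,16,40,5,delhi
"""


def unique_locations(df):
    """Return the unique values of the first column whose type row says "loc".

    Hypothetical helper: assumes row 0 of df holds the variable types.
    """
    # Boolean mask over the columns: True where the type row says "loc"
    is_location = (df.iloc[0] == "loc").to_numpy()
    # Name of the first matching column
    loc_column = df.columns[is_location][0]
    # Skip the type row, then de-duplicate in order of first appearance
    return df[loc_column].iloc[1:].unique().tolist()


data = pd.read_csv(io.StringIO(SAMPLE_CSV))
cities = unique_locations(data)
print(cities)  # ['delhi', 'agra', 'mumbai']
```

Looking the column up by its type tag rather than by position keeps the tool independent of where the city column sits in a given file.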
Second, I wrote the code that geocodes my cities.
import sys

# geocoders is assumed to be a list of geocoder objects defined elsewhere,
# each exposing a geocode(address) method

def geocode(address):
    i = 0
    try:
        while i < len(geocoders):
            # try to geocode using a service
            location = geocoders[i].geocode(address)
            # if it returns a location
            if location is not None:
                # return those values
                return [location.latitude, location.longitude]
            else:
                # otherwise try the next one
                i += 1
    except Exception:
        print(sys.exc_info()[0])
        return ['null', 'null']
    # if all services have failed to geocode, return null values
    return ['null', 'null']
cities = ['delhi', 'agra', 'mumbai']
j = 0
lat = []
dout = []  # assumed: collects the rows that fail
for row in cities:
    print('processing #', j)
    j += 1
    try:
        state = row
        address = state
        result = geocode(address)
        # add the lat/lon values to the flat list
        lat.extend(result)
    except Exception:
        # print 'Unsuccessful'
        to_print = 'Unsuccessful'
        # row.extend(to_print)
        dout.append(row)
print(lat)
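A variation on the loop above, sketched under stated assumptions: it keys the results by city instead of flattening them into one list, and it moves on to the next service when one raises, rather than giving up immediately. `StubGeocoder`, `Location`, and `geocode_with_fallback` are hypothetical names; the stubs stand in for real geocoding services so that the example runs offline.

```python
from collections import namedtuple

# Stand-in for the location objects a real geocoding service returns
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeocoder:
    """Hypothetical offline geocoder backed by a fixed lookup table."""

    def __init__(self, table):
        self.table = table

    def geocode(self, address):
        coords = self.table.get(address)
        return Location(*coords) if coords else None


def geocode_with_fallback(address, services):
    """Try each service in order; return (lat, lng), or None if all fail."""
    for service in services:
        try:
            location = service.geocode(address)
        except Exception as exc:
            # A failing service should not prevent trying the remaining ones
            print("service failed:", exc)
            continue
        if location is not None:
            return (location.latitude, location.longitude)
    return None


services = [
    StubGeocoder({"delhi": (28.7040592, 77.10249019999999)}),
    StubGeocoder({"agra": (27.1766701, 78.00807449999999),
                  "mumbai": (19.0759837, 72.8776559)}),
]

coords_by_city = {city: geocode_with_fallback(city, services)
                  for city in ["delhi", "agra", "mumbai", "atlantis"]}
print(coords_by_city)
```

Returning `None` for a city that no service can resolve keeps failures distinguishable from real coordinates, which the string `'null'` does not once the values are written next to numbers.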
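This gives me a flat list of latitudes and longitudes: [28.7040592, 77.10249019999999, 27.1766701, 78.00807449999999, 19.0759837, 72.8776559]. I want to write these values into my CSV file so that it looks like this: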
Time,M1,M2,M3,CityName,Latitude,Longitude
temp,num,num,num,loc,lat,lng
20-May-13,19,20,0,delhi,28.7040592,77.10249019999999
20-May-13,25,42,7,agra,27.1766701,78.00807449999999
20-May-13,23,35,4,mumbai,19.0759837,72.8776559
20-May-13,21,32,3,delhi,28.7040592,77.10249019999999
20-May-13,17,27,1,mumbai,19.0759837,72.8776559
20-May-13,16,40,5,delhi,28.7040592,77.10249019999999
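I tried splitting the result into separate latitude and longitude lists, latitude = lat[0::2] and longitude = lat[1::2], and I also tried converting it into a dictionary, {'delhi': [28.7040592, 77.10249019999999], 'agra': [27.1766701, 78.00807449999999], 'mumbai': [19.0759837, 72.8776559]}, but somehow I cannot figure out how to write it to the CSV file.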
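For reference, a minimal sketch of how the two slices and the dictionary relate to the flat list. It assumes the flat list holds exactly one latitude/longitude pair per city, in the same order as the city list; the name `city_coords` is illustrative.

```python
cities = ['delhi', 'agra', 'mumbai']
lat = [28.7040592, 77.10249019999999,
       27.1766701, 78.00807449999999,
       19.0759837, 72.8776559]

# Even positions hold latitudes, odd positions hold longitudes
latitude = lat[0::2]
longitude = lat[1::2]

# Pair each city with its own [latitude, longitude]
city_coords = {city: [la, lo]
               for city, la, lo in zip(cities, latitude, longitude)}
print(city_coords)
```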
Answer 0 (score: 0)
I think converting them into a dictionary is a good approach.
dic = {'delhi': [28.7040592, 77.10249019999999],
       'agra': [27.1766701, 78.00807449999999],
       'mumbai': [19.0759837, 72.8776559]}
# Create new columns; rows whose city is not in dic (including the type row) get None
data_date["Latitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[0], axis=1)
data_date["Longitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[1], axis=1)
# Write the data back to the csv file
data_date.to_csv(filename, index=False)
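This looks up each row's city name in the dictionary and writes the matching values into the new columns. Finally, it overwrites the old CSV file with the updated data frame.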
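Because the file carries a type row under the header, that row needs its own `lat` and `lng` entries to match the desired output. The following is one possible sketch, not the only way to do it: it splits the type row off, fills the data rows with `Series.map` instead of `apply`, and stitches the two back together. It writes to an in-memory buffer so that it runs without touching a real file; the names `raw`, `types`, `data`, and `out` are illustrative.

```python
import io

import pandas as pd

SAMPLE_CSV = """Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
20-May-13,17,27,1,mumbai
20-May-13,16,40,5,delhi
"""

dic = {'delhi': [28.7040592, 77.10249019999999],
       'agra': [27.1766701, 78.00807449999999],
       'mumbai': [19.0759837, 72.8776559]}

# dtype=str keeps every column as text so the type row and data rows coexist
raw = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
types = raw.iloc[[0]].copy()  # the type row, kept as a one-row frame
data = raw.iloc[1:].copy()    # the actual data rows

# Series.map looks up each city; names missing from the mapping become NaN
data["Latitude"] = data["CityName"].map({c: v[0] for c, v in dic.items()})
data["Longitude"] = data["CityName"].map({c: v[1] for c, v in dic.items()})

# Give the new columns their type labels
types["Latitude"] = "lat"
types["Longitude"] = "lng"

out = pd.concat([types, data])
buffer = io.StringIO()
out.to_csv(buffer, index=False)
print(buffer.getvalue())
```

To update the file on disk, pass the file name to `to_csv` in place of the buffer, as in the snippet above.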