I have a CSV containing the names and lat/lng locations of London Underground stations. It looks like this:
Station Lat Lng
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
Acton Central 51.50875781 -0.263415792
Acton Town 51.50307148 -0.280288296
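I want to transform this CSV into an origin-destination matrix of all possible combinations of these stations. There are 270 stations, so there are 72,900 (270 × 270) possible combinations.
Ultimately I want to write this matrix out as a CSV with the following format: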
O_Station O_lat O_lng D_Station D_lat D_lng
Abbey Road 51.53195199 0.003737786 Abbey Wood 51.49078408 0.120286371
Abbey Road 51.53195199 0.003737786 Acton 51.51688696 -0.267675543
Abbey Road 51.53195199 0.003737786 Acton Central 51.50875781 -0.263415792
Abbey Wood 51.49078408 0.120286371 Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371 Acton 51.51688696 -0.267675543
Abbey Wood 51.49078408 0.120286371 Acton Central 51.50875781 -0.263415792
Acton 51.51688696 -0.267675543 Abbey Road 51.53195199 0.003737786
Acton 51.51688696 -0.267675543 Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543 Acton Central 51.50875781 -0.263415792
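The first step is to use a loop to pair each station with every other station. Then I need to remove the combinations in which the origin and destination are the same station.
I have tried the NumPy function column_stack. However, it gives a strange result.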
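The pairing step does not strictly need a hand-written nested loop. A minimal sketch, assuming the standard library's `itertools.permutations` is acceptable: `permutations(stations, 2)` yields every ordered (origin, destination) pair of distinct positions, so the same-station pairs never appear and the removal step is built in. The `STATIONS` list and the `od_pairs` helper are hypothetical names, filled with the five sample rows above.

```python
from itertools import permutations

# Sample rows from the question: (name, lat, lng)
STATIONS = [
    ("Abbey Road", 51.53195199, 0.003737786),
    ("Abbey Wood", 51.49078408, 0.120286371),
    ("Acton", 51.51688696, -0.267675543),
    ("Acton Central", 51.50875781, -0.263415792),
    ("Acton Town", 51.50307148, -0.280288296),
]


def od_pairs(stations):
    """Return one flat 6-tuple per ordered (origin, destination) pair,
    excluding pairs where origin and destination are the same station."""
    # permutations(..., 2) yields ordered pairs of distinct positions,
    # so both (A, B) and (B, A) appear but (A, A) never does.
    return [origin + dest for origin, dest in permutations(stations, 2)]


rows = od_pairs(STATIONS)
print(len(rows))  # 5 * 4 = 20
print(rows[0])
```

With all 270 stations this would give 270 × 269 = 72,630 rows, i.e. the 72,900 combinations minus the 270 self-pairs. Each tuple is already in the O_Station, O_lat, O_lng, D_Station, D_lat, D_lng order, so `csv.writer(...).writerows(rows)` could write it out directly.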
import csv
import sys
import numpy
from pprint import pprint

# print whole arrays; current NumPy rejects threshold='nan'
numpy.set_printoptions(threshold=sys.maxsize)
with open('./London stations.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    Stations = [row['Station'] for row in reader]  # the header column is 'Station'
print(Stations)
O_D = numpy.column_stack(([Stations], [Stations]))
pprint(O_D)
Output:
Stations =
['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town']
O_D =
array([['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town',
'Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town']],
dtype='|S13')
Ideally I'm looking for a more suitable function, but I'm having trouble finding one in the NumPy manual.
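In the attempt above, `[Stations]` is a 2-D array with one row, so column_stack simply glues the two rows side by side into a single row of ten names. A pairing instead needs each station repeated n times, matched against the whole list tiled n times. A minimal NumPy sketch, assuming the names and coordinates are already in three arrays (`o_idx`, `d_idx` and `keep` are hypothetical names), using `numpy.repeat`, `numpy.tile` and a boolean mask:

```python
import numpy as np

# Sample station data from the question
names = np.array(["Abbey Road", "Abbey Wood", "Acton", "Acton Central", "Acton Town"])
lat = np.array([51.53195199, 51.49078408, 51.51688696, 51.50875781, 51.50307148])
lng = np.array([0.003737786, 0.120286371, -0.267675543, -0.263415792, -0.280288296])

n = len(names)
# Origin index: 0,0,...,0,1,1,...   Destination index: 0,1,...,n-1,0,1,...
o_idx = np.repeat(np.arange(n), n)
d_idx = np.tile(np.arange(n), n)

# Drop the n pairs where origin == destination
keep = o_idx != d_idx
o_idx, d_idx = o_idx[keep], d_idx[keep]

# Numeric columns as one float array: O_lat, O_lng, D_lat, D_lng
coords = np.column_stack((lat[o_idx], lng[o_idx], lat[d_idx], lng[d_idx]))
o_names, d_names = names[o_idx], names[d_idx]

print(coords.shape)  # (20, 4)
print(o_names[:3], d_names[:3])
```

The names are kept in separate arrays on purpose: stacking strings and floats into one NumPy array would convert every value to a string dtype.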
Answer 0 (score: 0)
This is an incomplete answer, but I would skip numpy and go straight to pandas:
csv_file = '''Station Lat Lng
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
Acton Central 51.50875781 -0.263415792
Acton Town 51.50307148 -0.280288296'''
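This is tricky because the data isn't really comma-separated and the station names themselves contain spaces; otherwise we could just call pandas.read_csv():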
import pandas as pd

stations = csv_file.split('\n')[1:]  # data lines, header skipped
names = [' '.join(x.split()[:-2]) for x in stations]  # everything before the last two fields
lats = [x.split()[-2] for x in stations]
lons = [x.split()[-1] for x in stations]
stations_dict = {names[i]: (lats[i], lons[i]) for i, _ in enumerate(stations)}
df = pd.DataFrame(stations_dict).T  # transpose so each station becomes a row
df.columns = ['Lat', 'Lng']
df.index.name = 'Station'
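A shorter way to do the same parsing, assuming the latitude and longitude are always the last two whitespace-separated fields on each line: `str.rsplit(maxsplit=2)` splits from the right at most twice, so multi-word station names stay intact. This is a sketch, not the answer's own method; it also converts the coordinates from strings to floats.

```python
import pandas as pd

csv_file = '''Station Lat Lng
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
Acton Central 51.50875781 -0.263415792
Acton Town 51.50307148 -0.280288296'''

# Skip the header, then split each line from the right at most twice, e.g.
# "Acton Central 51.50875781 -0.263415792" -> ["Acton Central", "51.50875781", "-0.263415792"]
rows = [line.rsplit(maxsplit=2) for line in csv_file.splitlines()[1:]]

df = pd.DataFrame(rows, columns=["Station", "Lat", "Lng"]).set_index("Station")
df = df.astype(float)  # the coordinates were parsed as strings
print(df.head())
```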
So df.head() gives us:
Lat Lng
Station
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
Acton Central 51.50875781 -0.263415792
Acton Town 51.50307148 -0.280288296
Getting the permutations might mean we don't need the stations as the index... I'm not sure yet. Hope this helps a bit!
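On the open question of the permutations: one option, assuming pandas 1.2 or later (where `merge(how="cross")` was added), is to keep Station as an ordinary column, build origin and destination copies with the O_/D_ column names from the question, cross-join them, and filter out the self-pairs. A minimal sketch with the sample rows:

```python
import pandas as pd

# Station table with the question's columns (sample rows only)
stations = pd.DataFrame({
    "Station": ["Abbey Road", "Abbey Wood", "Acton", "Acton Central", "Acton Town"],
    "Lat": [51.53195199, 51.49078408, 51.51688696, 51.50875781, 51.50307148],
    "Lng": [0.003737786, 0.120286371, -0.267675543, -0.263415792, -0.280288296],
})

# Rename the columns so origin and destination can be told apart
origins = stations.rename(columns={"Station": "O_Station", "Lat": "O_lat", "Lng": "O_lng"})
destinations = stations.rename(columns={"Station": "D_Station", "Lat": "D_lat", "Lng": "D_lng"})

# Cross join: every origin row paired with every destination row (n * n rows)
od = origins.merge(destinations, how="cross")

# Drop the pairs where origin and destination are the same station
od = od[od["O_Station"] != od["D_Station"]].reset_index(drop=True)

print(od.shape)  # (20, 6)
# With no path, to_csv returns the CSV text; pass a filename to write a file instead
csv_text = od.to_csv(index=False)
```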
Answer 1 (score: 0)
When working with tabular data like this, I prefer to use pandas. It makes your data structures easy to control.
import pandas as pd

# read in the csv, using the station names as the index
stations = pd.read_csv('London stations.csv', index_col=0)
# create a new, empty dataframe for the origin-destination pairs
O_D = pd.DataFrame(columns=['O_Station', 'O_lat', 'O_lng', 'D_Station', 'D_lat', 'D_lng'])
# iterate through every (origin, destination) pair of stations
new_index = 0
for o_station in stations.index:
    for d_station in stations.index:
        ls = [o_station, stations.Lat.loc[o_station], stations.Lng.loc[o_station],
              d_station, stations.Lat.loc[d_station], stations.Lng.loc[d_station]]
        O_D.loc[new_index] = ls
        new_index += 1
# remove pairs where the origin and destination are the same station
O_D = O_D[O_D.O_Station != O_D.D_Station]
This should help you with the data transformation.
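Assigning through `O_D.loc[new_index] = ls` grows the DataFrame one row at a time, which tends to be slow for roughly 73,000 rows. A sketch that keeps the same nested-loop idea but collects plain lists first, skips same-station pairs as it goes, and builds the DataFrame once at the end. The small `stations` frame here is a stand-in for the `pd.read_csv(..., index_col=0)` result, not the real file.

```python
import pandas as pd

# Stand-in for pd.read_csv('London stations.csv', index_col=0): sample rows only
stations = pd.DataFrame(
    {
        "Lat": [51.53195199, 51.49078408, 51.51688696, 51.50875781, 51.50307148],
        "Lng": [0.003737786, 0.120286371, -0.267675543, -0.263415792, -0.280288296],
    },
    index=pd.Index(["Abbey Road", "Abbey Wood", "Acton", "Acton Central", "Acton Town"],
                   name="Station"),
)

records = list(stations.itertuples())  # namedtuples with .Index, .Lat, .Lng
rows = []
for o in records:
    for d in records:
        if o.Index == d.Index:
            continue  # skip origin == destination instead of filtering afterwards
        rows.append([o.Index, o.Lat, o.Lng, d.Index, d.Lat, d.Lng])

# Build the DataFrame once from the collected rows
O_D = pd.DataFrame(rows, columns=["O_Station", "O_lat", "O_lng", "D_Station", "D_lat", "D_lng"])
print(len(O_D))  # 20
```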