Question: I want to improve my code for getting the IDs of the cars that start and end their trips between two zones of a city.

I have:

A .csv file containing the city zones, which looks like this:
borders =
zone longitude latitude multi
12 3.5248 22.0952 MULTIPOLYGON(((3.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,3.4992922409 22.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
14 3.5139 22.111 MULTIPOLYGON(((12.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,32.4992922409 22.1094203597,3.4995744041 32.1087939694,3.4997139945 22.1081206986)))
...
800 3.5273 22.1019 MULTIPOLYGON(((4.4991688909 15.1096707778,3.4992650150 22.1094740107, ... ,4.4992922409 75.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
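
For reference, a minimal sketch of how such a file could be loaded in one go, assuming the multi column stores the MULTIPOLYGON shapes as WKT text (the file name borders.csv is just a placeholder):

import pandas as pd
import geopandas as gpd
from shapely import wkt

borders = pd.read_csv("borders.csv")                     # placeholder path
borders["geometry"] = borders["multi"].apply(wkt.loads)  # parse WKT strings into shapely geometries
borders = gpd.GeoDataFrame(borders, geometry="geometry", crs="EPSG:4326")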
Against this I want to check my .csv file containing the taxi data:
data =
ID latitude longitude epoch day_of_week
e35f6 11.9125 3.7432 8765456787 Sunday
e35f6 11.9125 3.7432 4567876545 Sunday
...
fhg3g 23.9125 5.7432 2345434554 Sunday
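
Similarly, a sketch of reading the taxi file and turning it into points, assuming epoch holds Unix time in seconds (data.csv is again a placeholder name):

import pandas as pd
import geopandas as gpd

data = pd.read_csv("data.csv")                          # placeholder path
data["time"] = pd.to_datetime(data["epoch"], unit="s")  # Unix seconds -> datetime
data = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data["longitude"], data["latitude"]),
    crs="EPSG:4326",
)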
So I want to check whether a car with a given ID starts its trip in zone 12 and ends it in zone 14 (but I want to check this for every zone).
What I have done so far: from the border file I pick two rows, create a new csv file for each, enter the polygon geodata by hand and convert it to POINT (geopandas) data. Then I take the first and the last point of every ID in data to see whether the same car starts and ends in a given zone. But this is a very time-consuming process and I am looking for improvements. Here is my code:
import os
import glob
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon

df_first = df.drop_duplicates(subset=['id_easy'], keep='first')  # first GPS fix of every car
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')    # last GPS fix of every car
crs = {'init': 'epsg:4326'}
geometry_first = [Point(xy) for xy in zip(df_first.longitude, df_first.latitude)]
df_first = gpd.GeoDataFrame(df_first, crs=crs, geometry=geometry_first)
geometry_last = [Point(xy) for xy in zip(df_last.longitude, df_last.latitude)]
df_last = gpd.GeoDataFrame(df_last, crs=crs, geometry=geometry_last)
# manually prepared border points for the two zones of interest
border_1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone1.csv")
geometry_1 = [Point(xy) for xy in zip(border_1.longitude, border_1.latitude)]
border_1 = gpd.GeoDataFrame(border_1, crs=crs, geometry=geometry_1)
border_2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone2.csv")
geometry_2 = [Point(xy) for xy in zip(border_2.longitude, border_2.latitude)]
border_2 = gpd.GeoDataFrame(border_2, crs=crs, geometry=geometry_2)
# build one polygon per zone and keep the points that fall inside it
turin_final_1 = Polygon([[p.x, p.y] for p in border_1.geometry])
first = df_first[df_first.geometry.within(turin_final_1)]
turin_final_2 = Polygon([[p.x, p.y] for p in border_2.geometry])
last = df_last[df_last.geometry.within(turin_final_2)]
# convert the Unix epoch to datetimes and index by time of day
first.epoch = pd.to_datetime(first.epoch, unit='s')
first.index = pd.to_datetime(first.epoch)
last.index = pd.to_datetime(last.epoch, unit='s')  # unit='s' so the integer epoch is read as seconds
first1 = first.between_time('0:00', '1:00')
last1 = last.between_time('0:00', '1:00')  # repeated by hand for every hour up to 24
first1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df1\Saturday1_first1.csv', index=False)
last1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df2\Saturday1_last1.csv', index=False)  # repeated up to 24
# combine the hourly CSVs of start points into one file
os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df1")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
df1 = pd.concat([pd.read_csv(f) for f in all_filenames])
df1.to_csv("df1.csv", index=False, encoding='utf-8-sig')
# combine the hourly CSVs of end points into one file
os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df2")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
df2 = pd.concat([pd.read_csv(f) for f in all_filenames])
df2.to_csv("df2.csv", index=False, encoding='utf-8-sig')
# keep only the car IDs that appear in both the start-zone and the end-zone file
df1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df1/df1.csv")
df2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df2/df2.csv")
df3 = (pd.concat((df1[df1.id_easy.isin(df2.id_easy)],
                  df2[df2.id_easy.isin(df1.id_easy)]),
                 ignore_index=True)
       .sort_values('id_easy'))
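
A side note on the hourly filtering above: instead of writing one CSV per hour and reading them all back, the hour can be kept as a column and the matching done in memory. A minimal sketch, assuming first and last as built above (the hour column is introduced here only for illustration):

# tag every start/end point with its hour of day instead of one file per hour
first['hour'] = first.index.hour
last['hour'] = last.index.hour

# cars that start in zone 1 and end in zone 2, at any hour
common_ids = set(first.id_easy) & set(last.id_easy)
df3 = pd.concat([first[first.id_easy.isin(common_ids)],
                 last[last.id_easy.isin(common_ids)]],
                ignore_index=True).sort_values('id_easy')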
Answer (score: 0):
If I understood you correctly, you want to assign a zone number to the start and end point of every car. Since your border data already contains polygon information on the shape of the city zones (the multi column), I suggest using that information in a spatial join on the start/end points to see in which zone a car starts/ends its trip. Here is my suggestion in detail:
I assume that you have already read the data into border and data and that it is in the right format (i.e. every cell in the multi column of border contains a Shapely MultiPolygon). I also assume that your zones are disjoint, i.e. they do not overlap.

A GeoDataFrame needs a geometry column, because only a column with this name is recognised as geometry information:

border['geometry'] = border['multi']

Now we also generate geometry information for the points given in the car data in df:

df['geometry'] = df[['longitude', 'latitude']].apply(lambda x: Point(x[0], x[1]), axis=1)

Once that is done, extract the start and end point of every car:

df_first = df.drop_duplicates(subset=['id_easy'], keep='first')
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')

Now we can do a spatial join to get the desired zone for every start and end point (both sides must be GeoDataFrames; in newer geopandas versions the keyword is predicate='within' instead of op='within'):

df_first = gpd.sjoin(df_first, border.loc[:, ['geometry', 'zone']], how='left', op='within')
df_last = gpd.sjoin(df_last, border.loc[:, ['geometry', 'zone']], how='left', op='within')

That's it. You now have the zone of every point in the zone column.
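
To go from per-point zones to per-car origin/destination pairs, the two frames can then be merged on the car ID. A minimal sketch, assuming df_first and df_last as returned by the joins above (the _start/_end suffixes are illustrative):

od = df_first[['id_easy', 'zone']].merge(
    df_last[['id_easy', 'zone']],
    on='id_easy', suffixes=('_start', '_end'))

# cars that started their trip in zone 12 and ended it in zone 14
trips_12_14 = od[(od.zone_start == 12) & (od.zone_end == 14)]

# trip counts for every origin/destination combination
od_counts = od.groupby(['zone_start', 'zone_end']).size()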