Check points in two polygons (code improvement)

Time: 2019-10-21 09:21:05

Tags: python pandas geolocation geopandas

Question: I want to improve my code that finds the IDs of cars that start their trip in one zone of a city and end it in another.

I have:

  1. A `.csv` file containing the city zones, like this:

    borders =
    
    zone longitude latitude multi
    12   3.5248    22.0952  MULTIPOLYGON(((3.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,3.4992922409 22.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
    14   3.5139    22.111   MULTIPOLYGON(((12.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,32.4992922409 22.1094203597,3.4995744041 32.1087939694,3.4997139945 22.1081206986)))
    ...
    800  3.5273    22.1019  MULTIPOLYGON(((4.4991688909 15.1096707778,3.4992650150 22.1094740107, ... ,4.4992922409 75.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
    

And I want to check them against my

  2. `.csv` file, which contains the taxi data:

    data = 
    
    ID      latitude longitude epoch        day_of_week
    
    e35f6   11.9125  3.7432    8765456787    Sunday
    e35f6   11.9125  3.7432    4567876545    Sunday
    ...
    fhg3g   23.9125  5.7432    2345434554    Sunday
    

What I want to check is whether a car ID starts its trip in zone 12 and ends it in zone 14 (but I want to run this check for every zone).
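At its core, each check is a point-in-polygon test on the first and last point of a trip. A minimal sketch with shapely, assuming the `multi` strings are valid WKT in longitude/latitude order; the two square zones and the helper `starts_and_ends_in` are hypothetical stand-ins for zones 12 and 14. Note that shapely points are `(x, y)`, i.e. `(longitude, latitude)`, while the taxi table lists `latitude` first, so the columns are easy to swap by mistake.

```python
from shapely import wkt
from shapely.geometry import Point

# Hypothetical square zones written in the same WKT form as the `multi` column.
zone_12 = wkt.loads("MULTIPOLYGON(((0 0, 1 0, 1 1, 0 1, 0 0)))")
zone_14 = wkt.loads("MULTIPOLYGON(((2 0, 3 0, 3 1, 2 1, 2 0)))")


def starts_and_ends_in(trip, start_zone, end_zone):
    """trip: list of (longitude, latitude) tuples, ordered by time."""
    first = Point(trip[0])
    last = Point(trip[-1])
    # Both ends must fall inside their respective zone polygons.
    return first.within(start_zone) and last.within(end_zone)


trip = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)]
print(starts_and_ends_in(trip, zone_12, zone_14))  # True
```

Running this test once per car and per zone pair is what makes the manual approach slow; a spatial join does all points against all zones in one call.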

What I have done so far:

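  • I manually open the border file, pick 2 rows, create new CSV files, manually paste in the multipolygon coordinates and convert them to POINT geometries (geopandas)
  • I do the same for data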

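Then:

  • I select the first and the last point of each ID in data
  • I keep only the first points that lie inside the first zone and the last points that lie inside the second zone
  • Then I merge the resulting data frames on their common IDs to see whether the same car started and ended in the given zones.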
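Picking the first and last point with `drop_duplicates(keep='first'/'last')` only works if the rows are already sorted by time; in the sample data above, the first `e35f6` row actually has the later epoch. A minimal pandas sketch that chooses the endpoints by `epoch` instead, assuming a unique row index; the helper name `trip_endpoints` is hypothetical, and the ID column is called `ID` here as in the sample (the code below calls it `id_easy`).

```python
import pandas as pd


def trip_endpoints(data: pd.DataFrame, id_col: str = "ID"):
    """Return (first_points, last_points): one row per car, chosen by epoch."""
    grouped = data.groupby(id_col)["epoch"]
    # idxmin/idxmax give the row label of the earliest/latest point per car.
    first_points = data.loc[grouped.idxmin()]
    last_points = data.loc[grouped.idxmax()]
    return first_points, last_points


# Rows taken from the sample taxi data in the question.
data = pd.DataFrame({
    "ID": ["e35f6", "e35f6", "fhg3g"],
    "latitude": [11.9125, 11.9125, 23.9125],
    "longitude": [3.7432, 3.7432, 5.7432],
    "epoch": [8765456787, 4567876545, 2345434554],
})
first_points, last_points = trip_endpoints(data)
```

Unlike `groupby().first()`, which picks the first non-null value per column and can mix rows, selecting whole rows by label keeps each point's coordinates together.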

But this process is very time-consuming, so I'm looking for improvements. Here is my code:

```python
import glob
import os

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# df holds the taxi data; keep the first and last row of each car
df_first = df.drop_duplicates(subset=['id_easy'], keep='first')  # removed duplicates
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')  # removed duplicates

crs = 'EPSG:4326'  # the {'init': 'epsg:4326'} dict form is deprecated
geometry_first = [Point(xy) for xy in zip(df_first.longitude, df_first.latitude)]
df_first = gpd.GeoDataFrame(df_first, crs=crs, geometry=geometry_first)

geometry_last = [Point(xy) for xy in zip(df_last.longitude, df_last.latitude)]
df_last = gpd.GeoDataFrame(df_last, crs=crs, geometry=geometry_last)

# zone1.csv / zone2.csv: the vertices of one zone each, copied by hand from the border file
border_1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone1.csv")

geometry_1 = [Point(xy) for xy in zip(border_1.longitude, border_1.latitude)]
border_1 = gpd.GeoDataFrame(border_1, crs=crs, geometry=geometry_1)

border_2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone2.csv")

geometry_2 = [Point(xy) for xy in zip(border_2.longitude, border_2.latitude)]
border_2 = gpd.GeoDataFrame(border_2, crs=crs, geometry=geometry_2)

turin_final_1 = Polygon([[p.x, p.y] for p in border_1.geometry])
first = df_first[df_first.geometry.within(turin_final_1)].copy()

turin_final_2 = Polygon([[p.x, p.y] for p in border_2.geometry])
last = df_last[df_last.geometry.within(turin_final_2)].copy()

# epoch is in Unix seconds, so both frames must be converted with unit='s'
first['epoch'] = pd.to_datetime(first['epoch'], unit='s')
last['epoch'] = pd.to_datetime(last['epoch'], unit='s')

first.index = first['epoch']
last.index = last['epoch']

first1 = first.between_time('0:00', '1:00')
last1 = last.between_time('0:00', '1:00')  # repeated for each hour window up to 24:00

first1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df1\Saturday1_first1.csv', index=False)
last1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df2\Saturday1_last1.csv', index=False)  # till 24

# combine all hourly files of the first points
os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df1")

extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))

df1 = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
df1.to_csv("df1.csv", index=False, encoding='utf-8-sig')

# combine all hourly files of the last points
os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df2")

extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))

df2 = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
df2.to_csv("df2.csv", index=False, encoding='utf-8-sig')

df1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df1/df1.csv")
df2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df2/df2.csv")

# keep only the cars that appear in both files (started in zone 1 and ended in zone 2)
df3 = (pd.concat((df1[df1.id_easy.isin(df2.id_easy)],
                  df2[df2.id_easy.isin(df1.id_easy)]),
                 ignore_index=True)
       .sort_values('id_easy'))
```
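The hourly `between_time` windows, and the per-window CSV files that are concatenated afterwards, could be replaced by an hour-of-day column computed in memory. A minimal sketch, assuming `epoch` holds Unix seconds (UTC); the helper `add_start_hour` and the toy frame are hypothetical.

```python
import pandas as pd


def add_start_hour(trips: pd.DataFrame) -> pd.DataFrame:
    """Attach the hour of day (0-23) in which each trip's point was recorded."""
    out = trips.copy()
    # epoch is assumed to be Unix seconds, hence unit="s" (result is naive UTC).
    out["start_time"] = pd.to_datetime(out["epoch"], unit="s")
    out["start_hour"] = out["start_time"].dt.hour
    return out


# Hypothetical first points: 00:00 and 01:30 UTC on 1970-01-01.
first_points = pd.DataFrame({"id_easy": ["a", "b"], "epoch": [0, 5400]})
hourly = add_start_hour(first_points)
for hour, group in hourly.groupby("start_hour"):
    print(hour, list(group["id_easy"]))
```

`between_time('0:00', '1:00')` includes both ends by default, so a point at exactly 1:00 lands in two windows; `dt.hour` puts every row in exactly one bucket. For local time, convert with `.dt.tz_localize('UTC').dt.tz_convert(...)` before taking the hour.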

1 Answer

Answer 0 (score: 0)

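If I understand you correctly, you want to assign a zone number to the start point and the end point of each car. Since your border data already contains the polygon shape of each city zone (the `multi` column), I suggest using it for a spatial join on the start/end points to find the zone in which each car starts/ends its trip. Here is my detailed proposal: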

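I assume that you have already read your data into `border` and `df` and that it is in the right format (i.e. each cell in the `multi` column of `border` contains a `shapely.MultiPolygon`). I also assume that your zones are disjoint, i.e. they do not overlap.

A `GeoDataFrame` needs a geometry column, because it only recognizes a column with this name as geometry information:

```python
border['geometry'] = border['multi']
border = gpd.GeoDataFrame(border, geometry='geometry', crs='EPSG:4326')
```

Now we also generate geometry information for the points given in the car data in `df`:

```python
df['geometry'] = gpd.points_from_xy(df['longitude'], df['latitude'])
df = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')
```

Once that is done, extract the start and end points:

```python
df = df.sort_values(['id_easy', 'epoch'])  # so that first/last follow the time order
df_first = df.drop_duplicates(subset=['id_easy'], keep='first')
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')
```

Now we can do a spatial join to get the zone of each start and end point:

```python
df_first = gpd.sjoin(df_first, border.loc[:, ['geometry', 'zone']], how='left', predicate='within')
df_last = gpd.sjoin(df_last, border.loc[:, ['geometry', 'zone']], how='left', predicate='within')
```

That's it. Now you have the zone of each point in the `zone` column.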
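Putting the answer's steps together also answers the original question for every pair of zones at once. A minimal end-to-end sketch, assuming the `multi` column holds WKT strings (as in the CSV shown in the question), a geopandas version that provides `GeoSeries.from_wkt` and the `predicate` keyword of `sjoin`, and disjoint zones. The toy zones and trips and the helper name `trip_zones` are hypothetical.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical toy inputs shaped like the question's border and taxi CSVs.
border = pd.DataFrame({
    "zone": [12, 14],
    "multi": [
        "MULTIPOLYGON(((0 0, 1 0, 1 1, 0 1, 0 0)))",
        "MULTIPOLYGON(((2 0, 3 0, 3 1, 2 1, 2 0)))",
    ],
})
data = pd.DataFrame({
    "id_easy": ["a", "a", "b", "b", "c", "c"],
    "longitude": [2.5, 0.5, 2.5, 0.5, 0.5, 5.0],
    "latitude": [0.5, 0.5, 0.5, 0.5, 0.5, 5.0],
    "epoch": [200, 100, 100, 200, 100, 200],
})


def trip_zones(border: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Return one row per car with the zone of its first and last point."""
    # Parse the WKT strings in `multi` into shapely geometries.
    zones = gpd.GeoDataFrame(
        border[["zone"]],
        geometry=gpd.GeoSeries.from_wkt(border["multi"]),
        crs="EPSG:4326",
    )
    points = gpd.GeoDataFrame(
        data,
        geometry=gpd.points_from_xy(data["longitude"], data["latitude"]),
        crs="EPSG:4326",
    )
    # Sort by time so that first/last really are the start/end of each trip.
    points = points.sort_values(["id_easy", "epoch"])
    first = points.drop_duplicates(subset="id_easy", keep="first")
    last = points.drop_duplicates(subset="id_easy", keep="last")
    # One spatial join per trip end; points outside every zone get NaN.
    start = gpd.sjoin(first, zones, how="left", predicate="within")
    end = gpd.sjoin(last, zones, how="left", predicate="within")
    return pd.merge(
        start[["id_easy", "zone"]],
        end[["id_easy", "zone"]],
        on="id_easy",
        suffixes=("_start", "_end"),
    )


trips = trip_zones(border, data)
# Cars that start in zone 12 and end in zone 14:
print(trips[(trips["zone_start"] == 12) & (trips["zone_end"] == 14)])
# Number of trips for every (start zone, end zone) pair:
print(pd.crosstab(trips["zone_start"], trips["zone_end"]))
```

`pd.crosstab` drops cars whose start or end point lies outside every zone (their zone is `NaN`), so its counts cover only trips between known zones. No intermediate CSV files are needed.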