Convert lat/lon strings to a GeoJSON polygon

Time: 2020-09-01 03:25:01

Tags: python geojson geopandas

I have a pandas DataFrame:

col_a   col_b   col_c   lat lon polyline                                                            
0   2.2 3/27/2017 17:45 -34.92967678    -62.34831333    [{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"      
0   3.3 3/27/2017 17:45 -34.92967678    -62.34831333    [{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"      

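I want to convert it to a GeoPandas GeoDataFrame (with the geometry taken from the polyline column), but the polyline column is not in a standard format. How can I solve this?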

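3 answers:

Answer 0: (score: 4)

IIUC, if the original dataframe is a pandas DataFrame, you can use Series.str.translate to remove all double quotes and Series.str.findall to retrieve every lat/lng pair into a list of tuples, then use those coordinates to create the Polygon (note that the lat/lng values are converted from str to float):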

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

df['coords'] = df.polyline \
    .str.translate(str.maketrans({'"':''})) \
    .str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

geometry = [ Polygon([(float(x), float(y)) for x,y in e]) for e in df['coords'] ]

gdf = gpd.GeoDataFrame(df.drop(['coords','polyline'], axis=1), geometry=geometry)
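To see what the regular expression extracts, here is a minimal standard-library sketch run on a shortened copy of one polyline value. The names `sample`, `PAIR` and `parse_pairs` are made up for illustration. It returns the pairs in (lng, lat) order, on the assumption that the result will be fed to Shapely or written as GeoJSON, both of which treat the first coordinate as x (longitude); the snippet above passes (lat, lng), so swap the order there if the axes matter for your use.

```python
import re

# Shortened copy of one malformed polyline value from the question
sample = '[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}]"'

# Same pattern as in the answer: one capture group for lat, one for lng
PAIR = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

def parse_pairs(text):
    """Return a list of (lng, lat) float tuples found in text."""
    cleaned = text.replace('"', '')  # drop the stray double quotes
    return [(float(lng), float(lat)) for lat, lng in PAIR.findall(cleaned)]

pairs = parse_pairs(sample)
print(pairs)
```

A list like `pairs` could then be passed to `Polygon(...)` or `LineString(...)` in place of the `(float(x), float(y))` comprehension.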

Edit: if the methods under pandas.Series.str are not available, you can do the same with Python's re module, for example (assuming the original dataframe is a GeoDataFrame named gdf):

import re
ptn = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
# iterate over each polyline string e, strip the quotes, then convert every matched pair to floats
geometry = [ Polygon([tuple(map(float, x)) for x in ptn.findall(e.replace('"',''))]) for e in gdf["polyline"] ]
gdf_new = gpd.GeoDataFrame(gdf, geometry=geometry)

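Answer 1: (score: 1)

Because GeoPandas supports the pandas string operations, the code suggested by @jxc also works if the data is already in a GeoDataFrame.

Here is a piece of code to recreate the GeoDataFrame: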

from io import StringIO #Python 3 
import pandas as pd 
import geopandas as gpd 

df_string = """0;2.2;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"
0;3.3;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]" """

df_io = StringIO(df_string)
df = pd.read_csv(df_io, sep=";", names=["col_a","col_b","col_c","lat","lon","polyline"])
gdf = gpd.GeoDataFrame(df)

Result:

gdf
    col_a   col_b   col_c   lat lon polyline
0   0   2.2 3/27/2017 17:45 -34.92967678    -62.34831333    "[{lat"":-34.92967677667683   lng:-62.34831333160395} {""lat"":-34.93002861969753   lng:-62.360866069793644}    {""lat"":-34.93526211379422   lng:-62.36063016609785} {""lat"":-34.93571078689853   lng:-62.35996507775451} {""lat"":-34.935798629937075  lng:-62.34816312789911} {""lat"":-34.9333358703344    lng:-62.34824895858759} {""lat"":-34.9320340961022    lng:-62.348334789276066}]""      "
1   0   3.3 3/27/2017 17:45 -34.92967678    -62.34831333    "[{lat"":-34.92967677667683   lng:-62.34831333160395} {""lat"":-34.93002861969753   lng:-62.360866069793644}    {""lat"":-34.93526211379422   lng:-62.36063016609785} {""lat"":-34.93571078689853   lng:-62.35996507775451} {""lat"":-34.935798629937075  lng:-62.34816312789911} {""lat"":-34.9333358703344    lng:-62.34824895858759} {""lat"":-34.9320340961022    lng:-62.348334789276066}]"""

Then, if the geometry is a line, as the polyline column name suggests, you should use Shapely's LineString rather than Polygon:

from shapely.geometry import LineString
coords = gdf.polyline \
    .str.translate(str.maketrans({'"':''})) \
    .str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

gdf.geometry = [ LineString([(float(x), float(y)) for x,y in e]) for e in coords ]

Since the two geometries are identical, we can plot just the first one:

gdf[0:1].plot()

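The Polygon versus LineString choice can also be illustrated without any GIS library by writing the GeoJSON geometry objects by hand. This is only a sketch: the helper names `as_linestring` and `as_polygon` are hypothetical, the vertices are rounded illustrative values rather than the exact data, and the coordinates are assumed to be in (lng, lat) order. The visible difference is that a GeoJSON Polygon ring must be closed (its first and last positions are equal), while a LineString is just an open list of positions.

```python
import json

# Illustrative (lng, lat) vertices, rounded; not the exact values from the question
coords = [(-62.348, -34.930), (-62.361, -34.930), (-62.361, -34.935), (-62.348, -34.936)]

def as_linestring(points):
    """Build a GeoJSON LineString geometry: an open list of positions."""
    return {"type": "LineString", "coordinates": [list(p) for p in points]}

def as_polygon(points):
    """Build a GeoJSON Polygon geometry with a single closed exterior ring."""
    ring = [list(p) for p in points]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # close the ring by repeating the first position
    return {"type": "Polygon", "coordinates": [ring]}

line = as_linestring(coords)
poly = as_polygon(coords)
print(json.dumps(line))
print(json.dumps(poly))
```

Shapely's `Polygon` closes the ring for you, so this explicit step is only needed when the GeoJSON is assembled manually.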

Answer 2: (score: 0)

I worked this out for you.

import json
lat_lon_str = '''[{"lat": -32.436756736154024, "lng": -62.17932943721189},
               {"lat": -32.445847463649905, "lng": -62.18160395045652},
               {"lat": -32.44686151186612, "lng": -62.176711601213356},
               {"lat": -32.44721472434227, "lng": -62.17625005841933},
               {"lat": -32.44387381345414, "lng": -62.17003797011375},
               {"lat": -32.44158302782885, "lng": -62.16614345534663},
               {"lat": -32.43979915340108, "lng": -62.16164831538572}]'''

lat_lon_json = json.loads(lat_lon_str)
coords = ["POINT({} {})".format(round(line['lat'], 2), round(line['lng'], 2)) for line in lat_lon_json]
print(coords)

Result: (shown as an image in the original post)

After testing, please let me know whether this is the result you wanted.
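Note that json.loads only works here because lat_lon_str is already valid JSON, whereas the polyline values in the question are not (missing quotes and commas). One possible way to bridge the gap is to repair the string first. The following is a rough sketch, assuming every object contains exactly the keys lat and lng in that order, as in the sample rows; `repair_to_json` is a made-up helper name, not part of any library.

```python
import json
import re

# One malformed value in the style of the question (shortened)
raw = '[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}]"'

def repair_to_json(text):
    """Rewrite the malformed string as valid JSON (assumes only lat/lng keys)."""
    text = text.replace('"', '')                         # remove the stray quotes
    text = re.sub(r'\b(lat|lng):', r'"\1":', text)       # put quotes around the keys
    text = re.sub(r'(\d)\s+"lng"', r'\1, "lng"', text)   # comma between lat and lng
    text = re.sub(r'\}\s*\{', '}, {', text)              # comma between objects
    return text.strip()

points = json.loads(repair_to_json(raw))
print(points[0]["lat"], points[0]["lng"])
```

Once the value parses, each dict exposes `lat` and `lng` as floats, so the list comprehension above can be applied to it unchanged.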