I have a pandas DataFrame:
col_a col_b col_c lat lon polyline
0 2.2 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
0 3.3 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
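I want to convert it to a geopandas GeoDataFrame (with geometry built from the polyline), but the polyline column is not in a standard format. How can I solve this?
Answer 0 (score: 4)
IIUC, if the original dataframe is a pandas DataFrame, you can use Series.str.translate to remove all double quotes and Series.str.findall to collect every lat/lng pair into a list of tuples, then use those coordinates to create a Polygon (note that the captured lat/lng strings are converted from str to float):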
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
df['coords'] = df.polyline \
.str.translate(str.maketrans({'"':''})) \
.str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
geometry = [ Polygon([(float(x), float(y)) for x,y in e]) for e in df['coords'] ]
gdf = gpd.GeoDataFrame(df.drop(['coords','polyline'], axis=1), geometry=geometry)
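EDIT: If the methods under pandas.Series.str are not available, you can do the same thing with Python's re module, for example (assuming the original dataframe is a GeoDataFrame named gdf):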
import re
ptn = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
# e is the raw polyline string of one row; x is one (lat, lng) match from it
geometry = [ Polygon(tuple(map(float,x)) for x in re.findall(ptn, e.replace('"',''))) for e in gdf["polyline"] ]
gdf_new = gpd.GeoDataFrame(gdf, geometry=geometry)
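One caveat with both variants above: they pass (lat, lng) straight through as (x, y), whereas Shapely and GeoPandas treat x as longitude and y as latitude for geographic coordinates. Below is a minimal sketch of the parsing step with the order swapped, using only the standard library. The names parse_polyline and VERTEX are hypothetical and not part of the original answer:

```python
import re

# Matches one vertex such as "lat:-34.93 lng:-62.36" once the quotes are stripped
VERTEX = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

def parse_polyline(text):
    """Return a list of (lng, lat) float tuples from one raw polyline string."""
    cleaned = text.replace('"', '')
    # Swap to (lng, lat) so that x is longitude and y is latitude
    return [(float(lng), float(lat)) for lat, lng in VERTEX.findall(cleaned)]

sample = ('[{lat":-34.92967677667683 lng:-62.34831333160395} '
          '{"lat":-34.93002861969753 lng:-62.360866069793644}]"')
points = parse_polyline(sample)
print(points)
```

The resulting list can be passed to Polygon or LineString in place of the tuples built above, if the swapped order is what you need.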
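Answer 1 (score: 1)
Because GeoPandas supports the pandas string methods, the code suggested by @jxc also works if the data is already in a GeoDataFrame.
Here is a snippet to recreate the GeoDataFrame: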
from io import StringIO #Python 3
import pandas as pd
import geopandas as gpd
df_string="""0;2.2;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
0;3.3;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
"""
df_io = StringIO(df_string)
df = pd.read_csv(df_io, sep=";", names=["col_a","col_b","col_c","lat","lon","polyline"])
gdf = gpd.GeoDataFrame(df)
Result:
gdf
col_a col_b col_c lat lon polyline
0 0 2.2 3/27/2017 17:45 -34.92967678 -62.34831333 "[{lat"":-34.92967677667683 lng:-62.34831333160395} {""lat"":-34.93002861969753 lng:-62.360866069793644} {""lat"":-34.93526211379422 lng:-62.36063016609785} {""lat"":-34.93571078689853 lng:-62.35996507775451} {""lat"":-34.935798629937075 lng:-62.34816312789911} {""lat"":-34.9333358703344 lng:-62.34824895858759} {""lat"":-34.9320340961022 lng:-62.348334789276066}]"" "
1 0 3.3 3/27/2017 17:45 -34.92967678 -62.34831333 "[{lat"":-34.92967677667683 lng:-62.34831333160395} {""lat"":-34.93002861969753 lng:-62.360866069793644} {""lat"":-34.93526211379422 lng:-62.36063016609785} {""lat"":-34.93571078689853 lng:-62.35996507775451} {""lat"":-34.935798629937075 lng:-62.34816312789911} {""lat"":-34.9333358703344 lng:-62.34824895858759} {""lat"":-34.9320340961022 lng:-62.348334789276066}]"""
Then, if the geometry is a line, as the polyline column name suggests, you should use Shapely's LineString class rather than Polygon:
from shapely.geometry import LineString
coords = gdf.polyline \
.str.translate(str.maketrans({'"':''})) \
.str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
gdf.geometry = [ LineString([(float(x), float(y)) for x,y in e]) for e in coords ]
Since the two geometries are identical, we can plot just the first one:
gdf[0:1].plot()
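On the LineString-versus-Polygon choice: one hint the data itself can give is whether the path is closed, meaning its first and last vertices coincide. Shapely's Polygon closes an open ring implicitly, so an open path still yields a valid-looking polygon even when a line was intended. The following is only a rough sketch of such a check; the helper name is_closed_ring is hypothetical, and the coordinates are three vertices copied from the sample row:

```python
def is_closed_ring(coords):
    """True when there are at least 4 vertices and the first equals the last."""
    return len(coords) >= 4 and coords[0] == coords[-1]

# Three vertices taken from the sample polyline, as (lat, lng) pairs
open_path = [(-34.92967677667683, -62.34831333160395),
             (-34.93002861969753, -62.360866069793644),
             (-34.9320340961022, -62.348334789276066)]

# The same path with the first vertex repeated at the end
closed_path = open_path + [open_path[0]]

print(is_closed_ring(open_path), is_closed_ring(closed_path))
```

The sample polyline does not repeat its first vertex, which is consistent with treating it as a LineString, although an open path alone does not prove that a polygon was not meant.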
Answer 2 (score: 0)
I worked this out for you:
import json
lat_lon_str = '''[{"lat": -32.436756736154024, "lng": -62.17932943721189},
{"lat": -32.445847463649905, "lng": -62.18160395045652},
{"lat": -32.44686151186612, "lng": -62.176711601213356},
{"lat": -32.44721472434227, "lng": -62.17625005841933},
{"lat": -32.44387381345414, "lng": -62.17003797011375},
{"lat": -32.44158302782885, "lng": -62.16614345534663},
{"lat": -32.43979915340108, "lng": -62.16164831538572}]'''
lat_lon_json = json.loads(lat_lon_str)
coords = ["POINT({} {})".format(round(line['lat'], 2), round(line['lng'], 2)) for line in lat_lon_json]
print(coords)
Result: