Question

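I am working with a dataset that gives me the GPS coordinates (latitude and longitude) of the path covered by a tractor, in .csv format. I want to separate the field from the path in the data (see the image below).

Sample dataset: https://drive.google.com/open?id=1rVNbkuJuPmcGUzQI9NhKwYJPgcEeypq3

Here is the code that reads the CSV and plots it: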

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = r"data_stackoverflow.csv"  # path to the exported tractor log
df = pd.read_csv(path)  # read the .csv into a pandas DataFrame

# Stack the Latitude and Longitude columns into an (n, 2) NumPy array of points
arr = df[["Latitude", "Longitude"]].to_numpy()
x = arr[:, 0]  # latitude of every point
y = arr[:, 1]  # longitude of every point

plt.title("GPS Data Visualized")
plt.xlabel("Latitude")
plt.ylabel("Longitude")

plt.plot(x, y)     # the track as a connected line
plt.scatter(x, y)  # the individual GPS fixes
plt.show()

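My question

How can I separate the path from the field? Is there a specific algorithm for this?

I tried applying DBSCAN to the dataset, but the results are not always accurate.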
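One possible reason DBSCAN gives inconsistent results on raw coordinates is that `eps` is then measured in degrees, and away from the equator a degree of longitude covers less ground than a degree of latitude. The sketch below is a minimal illustration, not a tested solution for this dataset: it assumes the field is the densest part of the track (rows driven close together) while the path is sampled more sparsely, runs scikit-learn's `DBSCAN` with `metric="haversine"` so that `eps` can be stated in metres, and keeps the largest cluster as the field. The helpers `field_points_dbscan` and `make_synthetic_track`, the `eps_m`/`min_samples` values, and the synthetic track are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def field_points_dbscan(df, eps_m=5.5, min_samples=10):
    """Return the rows of the largest DBSCAN cluster, assumed to be the field."""
    # The haversine metric expects [latitude, longitude] in radians and
    # returns great-circle distances in radians, so eps is given in metres
    # and converted by dividing by the Earth's radius.
    coords = np.radians(df[["Latitude", "Longitude"]].to_numpy())
    db = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=min_samples,
                metric="haversine", algorithm="ball_tree")
    labels = db.fit_predict(coords)  # -1 marks noise points
    clustered = labels[labels >= 0]
    if clustered.size == 0:
        return df.iloc[0:0]  # every point was noise: no field found
    field_label = np.bincount(clustered).argmax()  # most populated cluster
    return df[labels == field_label]


def make_synthetic_track(lat0=48.0, lon0=11.0):
    """Hypothetical demo data: a dense field pattern plus a sparse access path."""
    deg_per_m_lat = np.degrees(1.0 / EARTH_RADIUS_M)
    deg_per_m_lon = deg_per_m_lat / np.cos(np.radians(lat0))
    # Field: 21 rows 3 m apart, one fix per metre along each 100 m row.
    fx, fy = np.meshgrid(np.arange(0, 101, 1.0), np.arange(0, 61, 3.0))
    # Path: a straight approach with one fix every 15 m, ending 15 m from the field.
    px = np.arange(-300, -10, 15.0)
    py = np.full_like(px, 30.0)
    x = np.concatenate([fx.ravel(), px])
    y = np.concatenate([fy.ravel(), py])
    return pd.DataFrame({"Latitude": lat0 + y * deg_per_m_lat,
                         "Longitude": lon0 + x * deg_per_m_lon})
```

With these values, `field_points_dbscan(make_synthetic_track())` keeps the 2121 field fixes and drops all 20 path fixes, because every path fix is 15 m from its nearest neighbour, well beyond `eps_m`. On real data, `eps_m` would need to be somewhat larger than the spacing between adjacent rows in the field and smaller than the spacing between consecutive fixes on the path. If the tractor logs at a fixed time interval, the path is only sparser when the tractor drives faster there, which is an assumption about this dataset.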

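What the result should be

I want the result as a DataFrame that contains only the field data points.

My resulting plot should look like this (field only):

[Image: Sample Result]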

Answer 1

I think we can treat the points that belong to the path (rather than the field) as outliers.

Demo:

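import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

df = pd.read_csv("data_stackoverflow.csv")

# The `behaviour` argument was deprecated in scikit-learn 0.22 and later removed,
# so it is omitted here. fit_predict returns 1 for inliers and -1 for outliers.
out = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
df["x"] = out.fit_predict(df[["Latitude", "Longitude"]])

mask = df["x"] == 1  # inliers, i.e. the points treated as the field

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, sharey=True, figsize=(10, 10))
ax1.plot(df["Longitude"], df["Latitude"], linewidth=1)  # full track
ax2.plot(df.loc[mask, "Longitude"], df.loc[mask, "Latitude"], linewidth=1)  # field only
plt.show()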
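The answer's inlier mask can be turned directly into the DataFrame the question asks for (field points only). A minimal sketch, wrapping the same `IsolationForest` call in a hypothetical helper `field_points_isolation_forest` that returns only the rows labelled as inliers; `random_state=0` is an added assumption so that repeated runs give the same split. Whether the rows it drops coincide with the path depends on how isolated the path points are in latitude/longitude space, so the result should still be checked against a plot.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def field_points_isolation_forest(df, n_estimators=200, random_state=0):
    """Return only the rows IsolationForest labels as inliers (assumed field points)."""
    model = IsolationForest(n_estimators=n_estimators, contamination="auto",
                            random_state=random_state)
    # fit_predict returns 1 for inliers and -1 for outliers.
    labels = model.fit_predict(df[["Latitude", "Longitude"]])
    # Boolean indexing keeps the original index, so rows can be traced back.
    return df[labels == 1].copy()
```

On the question's data this would be `field_df = field_points_isolation_forest(pd.read_csv("data_stackoverflow.csv"))`, which leaves the input DataFrame unmodified instead of adding a helper column to it.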

How to separate field and path GPS coordinates in Python?
