Question

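I have a dataset of river current measurements taken at specific geographic locations given by latitude and longitude. The data are quite noisy, but they include multiple passes up and down the river. I am trying to compute the mean current on a regular grid over the river. Here is what I am doing: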

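Read the data into a dataframe and round latitude and longitude to roughly 10 m resolution (0.0001 degrees).
Generate a 10 m meshgrid (X, Y) spanning the minimum to maximum latitude and longitude.
Compute the mean current (Z) for each point on the grid. <--- This is where the problem is.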

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import descartes
import geopandas as gpd
from shapely.geometry import Point, Polygon
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline

# skip the 28 header lines (rows 0-27) and the units row (row 29); row 28 holds the column names
df = pd.read_csv('./Downloads/current_test/SpdCoach 2182533 20190506 0637AM.csv', skiprows=list(range(28)) + [29], na_values='---')
# forward-fill short GPS gaps (up to 50 rows), then round to 4 decimals (about 10 m)
df['GPS Lon.'] = df['GPS Lon.'].ffill(limit=50).round(4)
df['GPS Lat.'] = df['GPS Lat.'].ffill(limit=50).round(4)
# use the difference in latitude and longitude to determine direction of travel
df['del_lon']=df['GPS Lon.'].diff()
df['del_lat']=df['GPS Lat.'].diff()
df['dir']=np.arctan2(df['del_lat'], df['del_lon'])
# use difference between impeller and GPS speed to estimate current
df['current']=np.abs(df['Speed (IMP)'] -df['Speed (GPS)'])

# Set the limits of the meshgrid
x = np.arange(min(df['GPS Lon.']),max(df['GPS Lon.']),0.0001)
y = np.arange(min(df['GPS Lat.']),max(df['GPS Lat.']),0.0001)

X, Y = np.meshgrid(x, y)
Z = np.mean(df.current[df['GPS Lon.']==X][df['GPS Lat.']==Y])

I tried the statement np.mean(df['current'][df['GPS Lon.']==-71.248][df['GPS Lat.']==42.362]) and it correctly computed the mean current for that single lat/lon pair.

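I expected that when I replaced the numeric values with X and Y, the statement would be evaluated for every point in the grid and compute a mean for each one. Most of the values would be zero except where the file contains actual data.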
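To make the intended result concrete, the per-point evaluation described above can be written as an explicit loop over grid cells. This is only a sketch on made-up data: the four-row dataframe is a hypothetical stand-in for the real CSV, and the integer cell indices `ix` and `iy` are an assumption introduced here so that floats are never compared for exact equality. Cells without data are left as NaN rather than zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: a few rounded positions with currents.
df = pd.DataFrame({
    'GPS Lon.': [-71.2480, -71.2480, -71.2479, -71.2478],
    'GPS Lat.': [42.3620, 42.3620, 42.3621, 42.3620],
    'current':  [0.2, 0.4, 0.1, 0.3],
})

step = 0.0001  # grid spacing in degrees, as in the question
lon0, lat0 = df['GPS Lon.'].min(), df['GPS Lat.'].min()

# Integer cell indices avoid comparing floats for exact equality.
ix = np.rint((df['GPS Lon.'] - lon0) / step).astype(int)
iy = np.rint((df['GPS Lat.'] - lat0) / step).astype(int)

nx, ny = ix.max() + 1, iy.max() + 1
Z = np.full((ny, nx), np.nan)  # rows follow latitude, columns follow longitude

# One comparison per grid cell, mirroring the single-point test.
for j in range(ny):
    for i in range(nx):
        in_cell = (ix == i) & (iy == j)
        if in_cell.any():
            Z[j, i] = df.loc[in_cell, 'current'].mean()
```

The double loop is slow on a 282 x 231 grid, but it states exactly what each entry of Z should contain.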

I get the following error.

~/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1743             # as it will broadcast
   1744             if other.ndim != 0 and len(self) != len(other):
-> 1745                 raise ValueError('Lengths must match to compare')
   1746 
   1747             res_values = na_op(self.values, np.asarray(other))

ValueError: Lengths must match to compare

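I am not experienced with this style of indexing, so I am probably doing something wrong. I would appreciate it if someone could tell me what it is.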

Thanks

Full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-9430b088fa85> in <module>
      5 
      6 X, Y = np.meshgrid(x, y)
----> 7 Z = np.mean(df.current[df['GPS Lon.']==X][df['GPS Lat.']==Y])
      8 
      9 cm = plt.cm.get_cmap('RdYlBu')

~/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1743             # as it will broadcast
   1744             if other.ndim != 0 and len(self) != len(other):
-> 1745                 raise ValueError('Lengths must match to compare')
   1746 
   1747             res_values = na_op(self.values, np.asarray(other))

ValueError: Lengths must match to compare
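The check shown at line 1744 of the traceback, `other.ndim != 0 and len(self) != len(other)`, suggests why the scalar test works and the grid version does not. The sketch below reproduces just that condition with small hypothetical arrays (five measurements, a grid of 3 longitudes by 2 latitudes); it does not call into pandas internals.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes: 5 measurements, a grid of 3 longitudes by 2 latitudes.
lon = pd.Series([-71.2480, -71.2480, -71.2479, -71.2478, -71.2478])
x = np.array([-71.2480, -71.2479, -71.2478])
y = np.array([42.3620, 42.3621])
X, Y = np.meshgrid(x, y)  # both have shape (len(y), len(x))

# A scalar has ndim == 0, so it is compared against every row of the Series.
scalar_mask = lon == -71.2480

# The grid is 2D and its length (number of rows) differs from the Series
# length, which is the condition tested in the traceback.
lengths_differ = X.ndim != 0 and len(lon) != len(X)
```

In other words, the comparison is element by element along the Series, not one comparison per grid point, so the number of grid rows would have to equal the number of measurements.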

The dataframe is defined by the CSV file. The column names in the dataframe are:

['Interval',
 'Distance (GPS)',
 'Distance (IMP)',
 'Elapsed Time',
 'Split (GPS)',
 'Speed (GPS)',
 'Split (IMP)',
 'Speed (IMP)',
 'Stroke Rate',
 'Total Strokes',
 'Distance/Stroke (GPS)',
 'Distance/Stroke (IMP)',
 'Heart Rate',
 'Power',
 'Catch',
 'Slip',
 'Finish',
 'Wash',
 'Force Avg',
 'Work',
 'Force Max',
 'Max Force Angle',
 'GPS Lat.',
 'GPS Lon.',
 'del_lon',
 'del_lat',
 'dir',
 'current',
 'geometry']

The purpose of the line that raises the error is to use the lat/lon pairs defined in the meshgrid given by X, Y. X and Y are numpy arrays with dimensions 282 x 231.

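My limited understanding is that this style of indexing would step through each X, Y pair and find the matching values in the dataframe columns 'GPS Lat.' and 'GPS Lon.'.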
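One possible way to get a mean per lat/lon pair without indexing by the meshgrid at all is to group the rows by their rounded coordinates. The sketch below assumes the column names listed above and uses a hypothetical four-row dataframe in place of the real file; `cell_mean` and `Z_table` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV, already rounded to 4 decimals.
df = pd.DataFrame({
    'GPS Lon.': [-71.2480, -71.2480, -71.2479, -71.2478],
    'GPS Lat.': [42.3620, 42.3620, 42.3621, 42.3620],
    'current':  [0.2, 0.4, 0.1, 0.3],
})

# Mean current for every (lat, lon) pair that actually occurs in the data.
cell_mean = df.groupby(['GPS Lat.', 'GPS Lon.'])['current'].mean()

# Spread into a 2D table: rows are latitudes, columns are longitudes.
# Pairs with no measurements come out as NaN.
Z_table = cell_mean.unstack('GPS Lon.')
```

Note that `Z_table` only has rows and columns for coordinates that appear in the data, so it may be smaller than the full X, Y grid; latitudes or longitudes that were never visited would still need to be filled in.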

Finding the mean at geographic coordinates
