Greetings, Stack Overflow community,
I am trying to read a .csv file that contains 1370 rows and two columns, Time and Speed:
Time Speed
0 1
1 4
2 7
3 8
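I want to compute the difference in Speed between consecutive time steps (e.g. between Time 2 and Time 1, Time 3 and Time 2, and so on) over the whole length of the data, and add a new column dS containing those differences. The data would then look like: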
Time Speed dS
0 1 NaN
1 4 3
2 7 3
3 8 1
The code I am using is:
import pandas as pd
from pandas import read_csv
df2 = pd.read_csv ('speed.csv')
dVV = []
for i, row in df2.iterrows():
    dVV.append(df2.iloc[i+1,1] - df2.iloc[i,1])
    break
df2['dVV']=dVV
The error I get is:
ValueError Traceback (most recent call last)
<ipython-input-29-4ed9fde37ff9> in <module>()
14 break
15
---> 16 df2['dVV']=dVV
17
18 #df2.to_csv('udds_test.csv', index=False, header=True)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
2517 else:
2518 # set column
-> 2519 self._set_item(key, value)
2520
2521 def _setitem_slice(self, key, value):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
2583
2584 self._ensure_valid_index(value)
-> 2585 value = self._sanitize_column(key, value)
2586 NDFrame._set_item(self, key, value)
2587
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
2758
2759 # turn me into an ndarray
-> 2760 value = _sanitize_index(value, self.index, copy=False)
2761 if not isinstance(value, (np.ndarray, Index)):
2762 if isinstance(value, list) and len(value) > 0:
~\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
3119
3120 if len(data) != len(index):
-> 3121 raise ValueError('Length of values does not match length of '
'index')
3122
3123 if isinstance(data, PeriodIndex):
ValueError: Length of values does not match length of index
My guess is that the code breaks after the last row (row 1370). How can I fix this?
Answer 0 (score: 3)
You can use pd.Series.diff:
df['ds'] = df['Speed'].diff()
print(df)
Time Speed ds
0 0 1 NaN
1 1 4 3.0
2 2 7 3.0
3 3 8 1.0
A loop like the one you tried is not recommended when a vectorised solution such as pd.Series.diff is available.
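As for why the original loop fails: the `break` sits inside the `for` body (a `break` outside a loop would be a SyntaxError), so the loop stops after its first iteration and `dVV` ends up with a single element instead of one per row, which is exactly what "Length of values does not match length of index" reports. Even without the `break`, `df2.iloc[i+1, 1]` would raise an IndexError on the last row. The sketch below shows a corrected loop next to `Series.diff()`, assuming speed.csv is comma-separated with a `Time,Speed` header; the inline CSV text is only a stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for speed.csv (assumption: comma-separated with a Time,Speed header).
csv_text = "Time,Speed\n0,1\n1,4\n2,7\n3,8\n"
df2 = pd.read_csv(io.StringIO(csv_text))

# Corrected loop: no `break`, look backwards instead of forwards so the last
# row never indexes past the end, and pad the first row with NaN so the list
# has exactly one value per row.
dVV = [np.nan]
for i in range(1, len(df2)):
    dVV.append(df2.iloc[i, 1] - df2.iloc[i - 1, 1])  # column 1 is Speed
df2['dVV'] = dVV

# Vectorised equivalent: Series.diff() subtracts the previous row's value.
df2['dS'] = df2['Speed'].diff()

print(df2)
```

Both columns hold the same values; the `diff()` version is shorter and avoids a Python-level loop over all 1370 rows.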
Answer 1 (score: 0)
Use:
df['Speed_avg'] = df['Speed'].rolling(2, min_periods=2).mean()
df['ds'] = df['Speed'].diff()
Output:
Time Speed Speed_avg ds
0 0 1 NaN NaN
1 1 4 2.5 3.0
2 2 7 5.5 3.0
3 3 8 7.5 1.0
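For reference, both one-liners can be expressed with `Series.shift`, which makes explicit which rows are combined: `diff()` subtracts the previous row, and a right-aligned rolling window of 2 (the default `center=False`) averages each row with the one before it. This is a minimal sketch using only the four sample values from the question.

```python
import pandas as pd

s = pd.Series([1, 4, 7, 8], name='Speed')

# diff() with the default periods=1 is "current value minus previous value".
diff_manual = s - s.shift(1)
# rolling(2, min_periods=2).mean() averages the current and previous values;
# the first row has no predecessor, so it is NaN.
avg_manual = (s + s.shift(1)) / 2

# Confirm the shift-based versions match the built-in methods.
pd.testing.assert_series_equal(s.diff(), diff_manual)
pd.testing.assert_series_equal(s.rolling(2, min_periods=2).mean(), avg_manual)

print(pd.DataFrame({'Speed': s, 'Speed_avg': avg_manual, 'ds': diff_manual}))
```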