I'm working with the following code:
```python
import matplotlib.pyplot as plt

# `data` is a DataFrame with a DatetimeIndex and an 'Ozone' column

# Resample, interpolate and inspect ozone data here
data = data.resample('D').interpolate()
data.info()

# Create the rolling window
rolling = data.rolling(360)['Ozone']  # (***)

# Insert the rolling quantiles into the DataFrame
data['q10'] = rolling.quantile(.1)
data['q50'] = rolling.quantile(.5)
data['q90'] = rolling.quantile(.9)

# Plot the data
data.plot()
plt.show()
```
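For the line marked with asterisks (***), can I use the following instead?

`data['Ozone'].rolling(360)`

And why does the following expression evaluate to `False`?

`data.rolling(360)['Ozone'] == data['Ozone'].rolling(360)`

What is the difference between them?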
Answer (score: 1)
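`data.rolling(360)['Ozone']` and `data['Ozone'].rolling(360)` can be used interchangeably. To compare them, however, you first need to apply an aggregation method such as `.mean`, and then compare the results with `equals` (`pandas.DataFrame.equals`, or `pandas.Series.equals` for Series results like these) instead of `==`.

The `.rolling` method requires a `window`, the number of observations used for each calculation. In the example below the `window` is 10, so the first 9 rows are NaN: a full window of 10 observations is not available until the 10th row.

Both `df.rolling(10)['A']` (`pandas.DataFrame.rolling` followed by a column selection) and `df['A'].rolling(10)` (`pandas.Series.rolling`) are objects of type `pandas.core.window.rolling.Rolling`, and they do not compare as equal with `==`.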
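Applied to the question's ozone code, the check described above might look like the minimal sketch below. The real `data` is not shown in the question, so the DataFrame here is a hypothetical stand-in (800 days of random values in an `Ozone` column); the point is only that both expressions yield identical rolling quantiles once they are aggregated.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `data`: 800 days of random "ozone" values
idx = pd.date_range("2000-01-01", periods=800, freq="D")
data = pd.DataFrame({"Ozone": np.random.default_rng(0).random(800)}, index=idx)

# Form used in the question: select the column from the DataFrame's Rolling object
q90_frame = data.rolling(360)["Ozone"].quantile(0.9)

# Proposed alternative: select the column first, then call Series.rolling
q90_series = data["Ozone"].rolling(360).quantile(0.9)

# Compare the aggregated results, not the Rolling objects themselves
print(q90_frame.equals(q90_series))  # True
print(q90_frame.isna().sum())        # 359: every window before day 360 is incomplete
```

So the line marked (***) can be swapped for `data['Ozone'].rolling(360)` without changing the `q10`/`q50`/`q90` columns.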
For more details on how `.rolling` works, see the pandas documentation and the question "How do pandas Rolling objects work?".

```python
import pandas as pd
import numpy as np

# test data and dataframe
np.random.seed(10)
df = pd.DataFrame(np.random.randint(20, size=(20, 1)), columns=['A'])

# this is pandas.DataFrame.rolling with a column selection
df.rolling(10)['A']
[out]:
Rolling [window=10,center=False,axis=0]

# this is pandas.Series.rolling
df['A'].rolling(10)
[out]:
Rolling [window=10,center=False,axis=0]

# the type is the same: pandas.core.window.rolling.Rolling
type(df.rolling(10)['A']) == type(df['A'].rolling(10))
[out]:
True

# comparing the two Rolling objects with == evaluates to False
df.rolling(10)['A'] == df['A'].rolling(10)
[out]:
False
```
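Why is `==` `False` when the types match? As far as I can tell (this goes beyond what the answer itself states), `Rolling` objects do not define their own equality, so `==` falls back to Python's default comparison, which is object identity. Two separately created `Rolling` objects are therefore never equal, whatever their settings. A small sketch on a hypothetical `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(20)})

r1 = df.rolling(10)["A"]  # DataFrame.rolling, then column selection
r2 = df["A"].rolling(10)  # Series.rolling

print(type(r1) is type(r2))         # True: both are pandas Rolling objects
print(r1 == r2)                     # False: two distinct objects
print(r1 == r1)                     # True: an object compared with itself
print(r1.mean().equals(r2.mean()))  # True: the aggregated results match
```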
Applying `.mean` shows that the first 9 values (one fewer than the `window`) are NaN. `df.rolling(10)['A'].mean()` and `df['A'].rolling(10).mean()` are both of type `pandas.core.series.Series`, so they can be compared.

```python
df.rolling(10)['A'].mean()
[out]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 12.3
10 12.2
11 12.1
12 12.3
13 11.1
14 12.1
15 12.3
16 12.3
17 12.0
18 11.5
19 11.9
Name: A, dtype: float64

df['A'].rolling(10).mean()
[out]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 12.3
10 12.2
11 12.1
12 12.3
13 11.1
14 12.1
15 12.3
16 12.3
17 12.0
18 11.5
19 11.9
Name: A, dtype: float64
```
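The nine leading NaN values in both outputs come from `min_periods`, which for an integer `window` defaults to the window size, so a result is only produced once a full window is available. As an aside not used in the answer, passing a smaller `min_periods` fills those leading rows with partial-window results; a minimal sketch on a toy Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: min_periods equals the window size, so the first window - 1 results are NaN
print(s.rolling(3).mean().tolist())                 # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1: average whatever observations the window holds so far
print(s.rolling(3, min_periods=1).mean().tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```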
Because `np.nan == np.nan` is `False`, the two results do not compare as equal with `==`. The values are essentially the same, but when the two Series are compared with `==`, every row containing NaN evaluates to `False`. `equals` (here `pandas.Series.equals`; `pandas.DataFrame.equals` behaves the same way) treats NaN values in the same positions as equal.

```python
# row by row evaluation
df.rolling(10)['A'].mean() == df['A'].rolling(10).mean()
[out]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
Name: A, dtype: bool

# overall comparison
all(df.rolling(10)['A'].mean() == df['A'].rolling(10).mean())
[out]:
False

# using pandas.Series.equals (NaN in the same positions counts as equal)
df.rolling(10)['A'].mean().equals(df['A'].rolling(10).mean())
[out]:
True
```
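The NaN behaviour explained above can be reproduced on a three-element Series. The last step is an alternative not mentioned in the answer: an element-wise mask that also counts NaN-against-NaN as a match, which is handy when you want to see which rows differ rather than a single `True`/`False`.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 1.0, 2.0])
b = pd.Series([np.nan, 1.0, 2.0])

print(np.nan == np.nan)   # False: NaN never equals itself
print((a == b).tolist())  # [False, True, True]
print(a.equals(b))        # True: NaNs in the same position count as equal

# Element-wise comparison that also counts matching NaNs as equal
same = (a == b) | (a.isna() & b.isna())
print(same.tolist())      # [True, True, True]
```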