I have a dataset that is basically a list of lists:
data = [[(datetime.datetime(2018, 12, 6, 10, 0), Decimal('7.0000000000000000')), (datetime.datetime(2018, 12, 6, 11, 0), Decimal('2.0000000000000000')), (datetime.datetime(2018, 12, 6, 12, 0), Decimal('43.6666666666666667')), (datetime.datetime(2018, 12, 6, 14, 0), Decimal('8.0000000000000000')), (datetime.datetime(2018, 12, 7, 9, 0), Decimal('12.0000000000000000')), (datetime.datetime(2018, 12, 7, 10, 0), Decimal('2.0000000000000000')), (datetime.datetime(2018, 12, 7, 11, 0), Decimal('2.0000000000000000')), (datetime.datetime(2018, 12, 7, 17, 0), Decimal('2.0000000000000000'))], [(datetime.datetime(2018, 12, 6, 10, 0), 28.5), (datetime.datetime(2018, 12, 6, 11, 0), 12.75), (datetime.datetime(2018, 12, 6, 12, 0), 12.15), (datetime.datetime(2018, 12, 6, 14, 0), 12.75), (datetime.datetime(2018, 12, 7, 9, 0), 12.75), (datetime.datetime(2018, 12, 7, 10, 0), 12.75), (datetime.datetime(2018, 12, 7, 11, 0), 12.75), (datetime.datetime(2018, 12, 7, 17, 0), 12.75)]]
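It contains two lists, each with a date column and a metric column. I need to extract the metric values from each list and compute the correlation between them.
Note: both lists contain exactly the same dates.
So first I load each list into pandas and set the date as the index.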
import pandas as pd

data1 = data[0]
data2 = data[1]
df1 = pd.DataFrame(data1)
df1[0] = pd.to_datetime(df1[0], errors='coerce')
df1.set_index(0, inplace=True)
df2 = pd.DataFrame(data2)
df2[0] = pd.to_datetime(df2[0], errors='coerce')
df2.set_index(0, inplace=True)
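A more compact way to build the same two frames is sketched below. It assumes you are happy to convert the Decimal metrics to float right away and to name the columns yourself; the column names date, metric1 and metric2 and the helper to_frame are hypothetical, chosen only for illustration. The data is rebuilt with values equal to the question's.

```python
import datetime
from decimal import Decimal

import pandas as pd

# Rebuild the question's `data`: two lists of (datetime, metric) tuples
stamps = [datetime.datetime(2018, 12, d, h)
          for d, h in [(6, 10), (6, 11), (6, 12), (6, 14),
                       (7, 9), (7, 10), (7, 11), (7, 17)]]
x = [Decimal(v) for v in ['7', '2', '43.6666666666666667', '8',
                          '12', '2', '2', '2']]
y = [28.5, 12.75, 12.15, 12.75, 12.75, 12.75, 12.75, 12.75]
data = [list(zip(stamps, x)), list(zip(stamps, y))]


def to_frame(pairs, name):
    """Hypothetical helper: (datetime, value) pairs -> float column indexed by date."""
    df = pd.DataFrame(pairs, columns=['date', name])
    df['date'] = pd.to_datetime(df['date'])
    # Decimal values produce an object column; cast once so later maths is numeric
    df[name] = df[name].astype('float64')
    return df.set_index('date')


df1 = to_frame(data[0], 'metric1')
df2 = to_frame(data[1], 'metric2')
```

Casting at load time means the later astype('float64') calls are no longer needed.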
Next, I merge the two dataframes (they share exactly the same dates).
df = pd.merge(df1,df2, how='inner', left_index=True, right_index=True)
My dataframe now looks like this:
                                     1_x    1_y
0
2018-12-06 10:00:00   7.0000000000000000  28.50
2018-12-06 11:00:00   2.0000000000000000  12.75
2018-12-06 12:00:00  43.6666666666666667  12.15
2018-12-06 14:00:00   8.0000000000000000  12.75
2018-12-07 09:00:00  12.0000000000000000  12.75
2018-12-07 10:00:00   2.0000000000000000  12.75
2018-12-07 11:00:00   2.0000000000000000  12.75
2018-12-07 17:00:00   2.0000000000000000  12.75
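The 1_x / 1_y names come from merge: both frames have a column named 1, so merge appends its default suffixes. Because the frames share the same date index, the same inner join can also be written with pd.concat, which lets you choose the column names via keys. This is a sketch assuming float values; the names a and b are hypothetical.

```python
import datetime

import pandas as pd

stamps = pd.to_datetime([datetime.datetime(2018, 12, d, h)
                         for d, h in [(6, 10), (6, 11), (6, 12), (6, 14),
                                      (7, 9), (7, 10), (7, 11), (7, 17)]])
df1 = pd.DataFrame({1: [7.0, 2.0, 43.6666666666666667, 8.0,
                        12.0, 2.0, 2.0, 2.0]}, index=stamps)
df2 = pd.DataFrame({1: [28.5, 12.75, 12.15, 12.75,
                        12.75, 12.75, 12.75, 12.75]}, index=stamps)

# As in the question: both columns are named 1, so merge adds _x / _y suffixes
merged = pd.merge(df1, df2, how='inner', left_index=True, right_index=True)

# Same inner join on the date index, with explicit column names via `keys`
combined = pd.concat([df1[1], df2[1]], axis=1, join='inner', keys=['a', 'b'])
```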
Now I need the Pearson and Spearman correlation coefficients between the two metric columns (1_x and 1_y).
I can get the Pearson coefficient like this:
pearson_coeff = df.iloc[:,0].astype('float64').corr(df.iloc[:,1].astype('float64'))
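Series.corr uses Pearson by default, i.e. r = cov(x, y) / (std(x) * std(y)). As a sanity check, the sketch below computes r directly from that formula and compares it with pandas; it assumes the two columns have already been converted to plain floats (x and y are illustrative names).

```python
import numpy as np
import pandas as pd

x = pd.Series([7.0, 2.0, 43.6666666666666667, 8.0, 12.0, 2.0, 2.0, 2.0])
y = pd.Series([28.5, 12.75, 12.15, 12.75, 12.75, 12.75, 12.75, 12.75])

# Pearson r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx, dy = deviations from the mean
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = x.corr(y)  # method='pearson' is the default
```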
But to get the Spearman coefficient I did this:
spearman_coeff = df.iloc[:,0].astype('float64').corr(method="spearman", min_periods=1).df.iloc[-1]
and I get the following error:
Traceback (most recent call last):
File "/home/souvik/Music/UI_Server2/test61.py", line 85, in <module>
print(df.iloc[:,0].astype('float64').corr(method="spearman", min_periods=1).df.iloc[-1])
TypeError: corr() missing 1 required positional argument: 'other'
I followed the Stack Overflow post "TypeError: corr() missing 1 required positional argument: 'other'" and did what it says, but I still get this error.
What am I doing wrong?
Answer (score: 2)
You can use the same syntax as for Pearson:
spearman_coeff = df.iloc[:,0].astype('float64').corr(df.iloc[:,1].astype('float64'),method="spearman", min_periods=1)
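Spearman's rho is Pearson's r computed on the ranks of each column, with tied values sharing their average rank. The sketch below checks that against DataFrame.corr; it assumes the merged values have been converted to floats (in the question, 1_x actually holds Decimals).

```python
import pandas as pd

df = pd.DataFrame({
    '1_x': [7.0, 2.0, 43.6666666666666667, 8.0, 12.0, 2.0, 2.0, 2.0],
    '1_y': [28.5, 12.75, 12.15, 12.75, 12.75, 12.75, 12.75, 12.75],
})

# Rank each column; ties get the mean of the ranks they span
ranks = df.rank(method='average')
# Pearson on the ranks gives Spearman's rho
rho_from_ranks = ranks['1_x'].corr(ranks['1_y'])

# DataFrame.corr ranks internally, so it should agree with the manual version
rho_builtin = df.corr(method='spearman').loc['1_x', '1_y']
```

The four 2.0 values in 1_x and the six 12.75 values in 1_y are heavily tied, which is why the average-rank rule matters here.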
Or, more simply, if your columns already hold floats (and since min_periods effectively defaults to 1, it can be dropped as well):
# pearson_coeff = df['1_x'].corr(df['1_y'])
spearman_coeff = df['1_x'].corr(df['1_y'], method='spearman')
Output:
>>> spearman_coeff
-0.34874291623145787
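The original error occurs because Series.corr requires the other Series as an argument. The no-argument form from the linked post is DataFrame.corr, which correlates every pair of columns and returns a matrix. The sketch below takes the off-diagonal entry of that matrix; it assumes float columns named 1_x and 1_y as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    '1_x': [7.0, 2.0, 43.6666666666666667, 8.0, 12.0, 2.0, 2.0, 2.0],
    '1_y': [28.5, 12.75, 12.15, 12.75, 12.75, 12.75, 12.75, 12.75],
})

# DataFrame.corr needs no `other`: it returns a 2x2 matrix for two columns
matrix = df.corr(method='spearman')
spearman_coeff = matrix.iloc[0, 1]  # off-diagonal entry = corr(1_x, 1_y)

# Series.corr, by contrast, must be given the second Series explicitly
pearson_coeff = df['1_x'].corr(df['1_y'])
```

The matrix form is handy when you later add more metric columns, since every pairwise coefficient is computed in one call.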