Computing on segments of a DataFrame (in a loop)

Time: 2019-05-13 05:36:20

Tags: python pandas loops dataframe

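I have 2 DataFrames, one short and one long. I want to split the long one into several chunks and compare each chunk with the short one using the correlation coefficient.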

The splitting works fine. However, when I run the calculation on the chunks, it returns NaN.

import pandas as pd

data_a = {'ID': ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"], 
'Unit_Weight': [178,153,193,195,214,157,205,212,219,166,217,186,170,207,204]}

df_a = pd.DataFrame(data_a)

data_b = {'ID': ["b1","b2","b3","b4","b5"], 
'Unit_Weight': [128,123,123,125,204]}

df_b = pd.DataFrame(data_b)

size = 5      # chunk size: 5 rows per piece, the same length as df_b
list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].corr(df_b['Unit_Weight'])
    print(corr_e)

Output:

0.6797202605786716
nan
nan

What is going wrong, and how can I fix it? Thanks.

P.S.: These are the results I calculated by hand, followed by the three chunks the split produces:

0.6797202605786716
-0.5501914564062937
0.2653370297540246

   ID  Unit_Weight
0  a1          178
1  a2          153
2  a3          193
3  a4          195
4  a5          214
    ID  Unit_Weight
5   a6          157
6   a7          205
7   a8          212
8   a9          219
9  a10          166
     ID  Unit_Weight
10  a11          217
11  a12          186
12  a13          170
13  a14          207
14  a15          204
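For reference, the hand-calculated values can be reproduced with a plain Pearson formula that only looks at the positions of the values and never at a pandas index. This is a minimal sketch in pure Python; the `pearson` helper is hypothetical, not a pandas or NumPy function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, paired by position."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

long_weights = [178, 153, 193, 195, 214, 157, 205, 212, 219, 166,
                217, 186, 170, 207, 204]
short_weights = [128, 123, 123, 125, 204]
size = len(short_weights)

# Slice by position, so every chunk is paired element-by-element with short_weights
manual = [pearson(long_weights[i:i + size], short_weights)
          for i in range(0, len(long_weights), size)]
for r in manual:
    print(r)
```

Because the pairing is purely positional, all three chunks give a number, which is exactly what the pandas version fails to do.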

3 Answers:

Answer 0 (score: 1)

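Series.corr aligns the two Series on their index labels before computing. The first chunk has labels 0-4, the same as df_b, but the second and third chunks keep labels 5-9 and 10-14, so they share no labels with df_b and the result is NaN. Both Series need the same index, so use Series.reset_index with drop=True: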

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight'])
    print(corr_e)

0.6797202605786716
-0.5501914564062937
0.26533702975402457
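The index alignment behind the NaN can be checked directly. This sketch counts, for each chunk, how many index labels it shares with df_b and compares the raw result with the reset_index version; the names `overlaps`, `raw_corrs` and `fixed_corrs` are only for illustration.

```python
import pandas as pd

df_a = pd.DataFrame({'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212,
                                     219, 166, 217, 186, 170, 207, 204]})
df_b = pd.DataFrame({'Unit_Weight': [128, 123, 123, 125, 204]})

size = 5
chunks = [df_a.loc[i:i + size - 1, :] for i in range(0, len(df_a), size)]

overlaps, raw_corrs, fixed_corrs = [], [], []
for chunk in chunks:
    # Labels present in both indexes; Series.corr only pairs up these rows
    common = chunk.index.intersection(df_b.index)
    overlaps.append(len(common))
    raw_corrs.append(chunk['Unit_Weight'].corr(df_b['Unit_Weight']))
    # After reset_index the chunk is relabelled 0..size-1, matching df_b
    fixed_corrs.append(
        chunk['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight']))
    print(f"labels {chunk.index.min()}-{chunk.index.max()}: "
          f"{len(common)} shared, raw={raw_corrs[-1]}, reset={fixed_corrs[-1]}")
```

Only the first chunk shares any labels with df_b, which is why only its raw correlation is a number.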

Answer 1 (score: 1)

@jezrael has a good answer, but another approach is to change:

list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

to:

list_of_df_a = [df_a.loc[i:i+size-1,:].reset_index(drop=True) for i in range(0, len(df_a),size)]

Now your results will be:

0.6797202605786716
-0.5501914564062937
0.26533702975402457
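`df_a.loc[i:i+size-1, :]` slices by label and includes the end label, so it only lines up with row positions because df_a has the default 0..n-1 index. A minimal sketch of a position-based variant with `iloc`, whose end bound is exclusive, keeps working when the index is something else, here assumed to be the ID column (the name `df_a_by_id` is just for illustration).

```python
import pandas as pd

df_a = pd.DataFrame({
    'ID': [f"a{i}" for i in range(1, 16)],
    'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212,
                    219, 166, 217, 186, 170, 207, 204],
})
df_b = pd.DataFrame({'ID': [f"b{i}" for i in range(1, 6)],
                     'Unit_Weight': [128, 123, 123, 125, 204]})

size = len(df_b)  # chunk length matches the short DataFrame

# String labels as the index: integer label slicing with .loc would not work on it
df_a_by_id = df_a.set_index('ID')

# iloc slices by position (end exclusive); reset_index relabels each chunk 0..size-1
list_of_df_a = [df_a_by_id.iloc[i:i + size].reset_index(drop=True)
                for i in range(0, len(df_a_by_id), size)]

corrs = [chunk['Unit_Weight'].corr(df_b['Unit_Weight']) for chunk in list_of_df_a]
for r in corrs:
    print(r)
```

The `reset_index(drop=True)` is still required: `iloc` fixes which rows land in each chunk, but the correlation itself is still aligned by label.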

Answer 2 (score: 0)

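You can also use numpy.corrcoef, which avoids the index problem automatically because it works on the raw values and ignores the pandas index: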

import numpy as np

for each in list_of_df_a:
    corr_e = np.corrcoef(each['Unit_Weight'], df_b['Unit_Weight'])[0,1]
    print(corr_e)

0.6797202605786716
-0.5501914564062937
0.2653370297540246
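`np.corrcoef(x, y)` returns the full 2×2 correlation matrix `[[1, r], [r, 1]]`, which is why the answer takes `[0,1]`. The same function also accepts a 2-D array in which each row is one variable, so all chunks can be handled in a single call. A minimal sketch, assuming len(df_a) is an exact multiple of the chunk size (true here: 15 rows, chunks of 5).

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212,
                                     219, 166, 217, 186, 170, 207, 204]})
df_b = pd.DataFrame({'Unit_Weight': [128, 123, 123, 125, 204]})

size = len(df_b)
values = df_a['Unit_Weight'].to_numpy()

# Assumes len(values) is a multiple of size; reshape raises ValueError otherwise
chunks = values.reshape(-1, size)                              # shape (3, 5), one row per chunk

# Each row is one variable; append the short series as the last row
stacked = np.vstack([chunks, df_b['Unit_Weight'].to_numpy()])  # shape (4, 5)
matrix = np.corrcoef(stacked)                                  # shape (4, 4)

# Last row without its diagonal entry: correlation of each chunk with df_b
corrs = matrix[-1, :-1]
print(corrs)
```

If the long frame does not divide evenly, the incomplete final chunk would have to be dropped or handled separately before reshaping.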