Assign array values to a NaN DataFrame in pandas

Time: 2021-01-07 07:49:20

Tags: python-3.x pandas dataframe nan

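I am trying to fill a DataFrame that initially contains only NaN values with the same number of values taken from an array. All values in the dictionary leagueList (NFL, NBA, etc.) are individual DataFrames. Sorry, I can't include them here because the post would get too long.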

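The idea behind the loop below is to run a series of paired t-tests (p_value) between all leagues in the DataFrame, comparing them on a column named 'win_loss_ratio'. The resulting array has the same number of values as the empty DataFrame and should replace its NaN values, but I am stuck on this part. How can this be done?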

leagueList={'NFL':NFL,'NBA':NBA,'NHL':NHL,'MLB':MLB}

df = pd.DataFrame(columns = leagueList, index = leagueList)

print(df)
     NFL  NBA  NHL  MLB
NFL  NaN  NaN  NaN  NaN
NBA  NaN  NaN  NaN  NaN
NHL  NaN  NaN  NaN  NaN
MLB  NaN  NaN  NaN  NaN


# Double loop over all possible league combinations

for a in leagueList.values():
    for b in leagueList.values():

        df_comb = pd.merge(a, b, left_index=True, right_index=True, how='inner')

        teststat, p_value = stats.ttest_rel(df_comb[['win_loss_ratio_x']], df_comb[['win_loss_ratio_y']])

        print(p_value)

[nan]
[0.94179205]
[0.03088317]
[0.80206949]
[0.94179205]
[nan]
[0.02229705]
[0.95053998]
[0.03088317]
[0.02229705]
[nan]
[0.00070784]
[0.80206949]
[0.95053998]
[0.00070784]
[nan]

1 Answer:

Answer 0 (score: 0)

Collect the p-values in a list so you can use .fillna, or construct the DataFrame directly:

import pandas as pd
from scipy import stats

#some sample data
NFL = pd.DataFrame([.5,.6,.7], columns=['win_loss_ratio'])
NBA = pd.DataFrame([.7,.5,.3], columns=['win_loss_ratio'])
NHL = pd.DataFrame([.4,.3,.2], columns=['win_loss_ratio'])
MLB = pd.DataFrame([.9,.8,.9], columns=['win_loss_ratio'])

leagueList={'NFL':NFL,'NBA':NBA,'NHL':NHL,'MLB':MLB}


# Double loop over all possible league combinations
rows = []
for a in leagueList.values():
    for b in leagueList.values():

        df_comb = pd.merge(a, b, left_index=True, right_index=True, how='inner')

        # The double brackets pass one-column DataFrames, so p_value is a length-1 array
        teststat, p_value = stats.ttest_rel(df_comb[['win_loss_ratio_x']], df_comb[['win_loss_ratio_y']])
        rows.append(p_value[0])

# Split the flat list of p-values into n rows of n values each
n = len(leagueList)
data = [rows[i * n:(i + 1) * n] for i in range((len(rows) + n - 1) // n)]

df = pd.DataFrame(data, columns=leagueList, index=leagueList)

Output:

print (df.to_string())
          NFL       NBA      NHL       MLB
NFL       NaN  0.622036  0.12169  0.057191
NBA  0.622036       NaN  0.07418  0.092735
NHL  0.121690  0.074180      NaN  0.013560
MLB  0.057191  0.092735  0.01356       NaN
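
If you would rather keep the pre-allocated NaN DataFrame from the question and fill it in place, one option is to loop over the dictionary items so the league names are available as row and column labels. This is only a sketch: it reuses the made-up sample data above, and the names leagues and p_values are introduced here for illustration. It also skips the diagonal explicitly instead of testing a league against itself, which is a design choice and not something the original code does.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up sample data, same values as in the answer above
leagues = {
    'NFL': pd.DataFrame([.5, .6, .7], columns=['win_loss_ratio']),
    'NBA': pd.DataFrame([.7, .5, .3], columns=['win_loss_ratio']),
    'NHL': pd.DataFrame([.4, .3, .2], columns=['win_loss_ratio']),
    'MLB': pd.DataFrame([.9, .8, .9], columns=['win_loss_ratio']),
}

# Pre-allocate a float DataFrame full of NaN, labelled by league name
names = list(leagues)
p_values = pd.DataFrame(np.nan, index=names, columns=names, dtype=float)

for name_a, a in leagues.items():
    for name_b, b in leagues.items():
        if name_a == name_b:
            continue  # leave the diagonal as NaN (assumption: self-comparison is not wanted)
        merged = pd.merge(a, b, left_index=True, right_index=True, how='inner')
        # Single brackets pass Series, so the p-value comes back as a scalar
        result = stats.ttest_rel(merged['win_loss_ratio_x'], merged['win_loss_ratio_y'])
        p_values.loc[name_a, name_b] = result.pvalue

print(p_values.to_string())
```

Writing through .loc with the league names avoids having to keep the flat list of p-values in the same order as the rows and columns.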