Does the Python pandas Fisher exact test function accept NaN or empty values?

Asked: 2019-04-11 20:20:51

Tags: python pandas nan

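I know this question has been answered here: Fisher's exact test on values from large dataframe and bypassing errors, but I believe that answer is for R rather than for Python pandas. My data contains NaN values. My question: how can I run the Fisher exact test on data that has NaN or blank values?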

The error I get:

 ValueError: The input `table` must be of shape (2, 2).

My input data looks like this:

ID  G1  G2  G3  G4  G5  G6
A   1   1   0   0   0   1
B   1   0   1   0   NaN NaN
C   1   0   1   0   0   0
D   1   0   1   0   1   1
E   1   NaN NaN NaN NaN 0
F   1   0   1   0   NaN 0
G   1   1   0   0   NaN 0
H   0   0   0   0   NaN 0
I   0   0   1   0   NaN 0
J   0   0   0   1   NaN 0
K   0   0   1   0   NaN 0
L   1   0   0   0   NaN 1
M   1   0   0   0   NaN 1
N   1   0   1   0   NaN 1
O   1   0   0   0   NaN 0
P   1   1   0   0   NaN 0
Q   1   0   0   0   NaN 0
R   1   0   1   0   NaN 0
S   1   0   1   0   NaN 1
T   1   0   0   0   NaN 0
U   1   0   0   0   NaN 0
V   NaN NaN NaN NaN NaN NaN
W   1   0   0   0   0   0
X   1   0   0   0   0   0
Y   1   1   0   0   NaN 0
Z   1   0   0   0   0   0
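One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: `pd.crosstab` leaves out rows where either column is NaN, so a pair of columns can end up with only one distinct value on one side. In the data above, every row where both G4 and G5 are present has G4 equal to 0. The snippet below uses a hand-picked subset of those two columns (rows A, C, D, J, W); the `demo` frame is an illustration, not the full input.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the G4 and G5 columns (rows A, C, D, J, W).
demo = pd.DataFrame(
    {"G4": [0, 0, 0, 1, 0], "G5": [0, 0, 1, np.nan, 0]},
    index=list("ACDJW"),
)

# Row J is the only one with G4 == 1, but its G5 is NaN, so it is dropped.
tab = pd.crosstab(demo["G4"], demo["G5"])
print(tab)
print(tab.shape)  # fewer than 2 rows means the table is not 2x2
```

A table of that shape is what a `fisher_exact` that only accepts 2x2 input would reject.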

Here is my code:

import os

import pandas as pd
from scipy.stats import fisher_exact

dirpath = "..."
df = pd.read_table("...")
df.set_index("ID", inplace=True)

# Square matrix to hold the p-value for every pair of columns.
my_df = pd.DataFrame(index=df.columns, columns=df.columns)

for colout in df.columns:
    for colinner in df.columns:
        if colout == colinner:
            my_df.at[colout, colinner] = 0
        else:
            # crosstab leaves out rows where either column is NaN.
            tab = pd.crosstab(df[colout], df[colinner])
            # fisher_exact raises the ValueError if tab is not 2x2.
            fish_vals = fisher_exact(tab)
            my_df.at[colout, colinner] = fish_vals[1]

my_df.to_csv(os.path.join(dirpath, 'pvals.txt'), sep='\t', encoding='utf-8', quoting=0, index=True)
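A minimal sketch of one way around the error, assuming every column holds only 0, 1, or NaN: count the four 0/1 combinations directly so the table is always 2x2, even when one value never occurs after the NaN rows are dropped. The helper names `two_by_two` and `pairwise_fisher_pvalues` are hypothetical, and the `demo` frame is an illustrative subset rather than the real input.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact


def two_by_two(x, y):
    """Count 0/1 co-occurrences of two columns, skipping rows with NaN."""
    pair = pd.concat([x, y], axis=1).dropna()
    a = pair.iloc[:, 0].to_numpy() == 1
    b = pair.iloc[:, 1].to_numpy() == 1
    # Rows are x == 0, x == 1; columns are y == 0, y == 1.
    return np.array([
        [np.sum(~a & ~b), np.sum(~a & b)],
        [np.sum(a & ~b), np.sum(a & b)],
    ])


def pairwise_fisher_pvalues(df):
    """Return a square DataFrame of Fisher exact p-values for all column pairs."""
    cols = df.columns
    pvals = pd.DataFrame(np.nan, index=cols, columns=cols)
    for i in cols:
        for j in cols:
            if i == j:
                pvals.at[i, j] = 0.0  # diagonal set to 0, as in the question
            else:
                pvals.at[i, j] = fisher_exact(two_by_two(df[i], df[j]))[1]
    return pvals


# Illustrative subset of the G4 and G5 columns (rows A, C, D, J, W).
demo = pd.DataFrame(
    {"G4": [0, 0, 0, 1, 0], "G5": [0, 0, 1, np.nan, 0]},
    index=list("ACDJW"),
)
print(two_by_two(demo["G4"], demo["G5"]))
print(pairwise_fisher_pvalues(demo))
```

When a whole row or column of the table is zero, the test carries no information about the association, so it may be preferable to record NaN for such pairs instead of the p-value SciPy returns.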

0 answers:

No answers