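I know a similar question has been answered here: Fisher's exact test on values from large dataframe and bypassing errors. However, I believe that answer is for R rather than for Python/pandas. My data contains NaN values. My question: how can I run Fisher's exact test on columns that contain NaN or blank values?
The error I get: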
ValueError: The input `table` must be of shape (2, 2).
My input data looks like this:
ID G1 G2 G3 G4 G5 G6
A 1 1 0 0 0 1
B 1 0 1 0 NaN NaN
C 1 0 1 0 0 0
D 1 0 1 0 1 1
E 1 NaN NaN NaN NaN 0
F 1 0 1 0 NaN 0
G 1 1 0 0 NaN 0
H 0 0 0 0 NaN 0
I 0 0 1 0 NaN 0
J 0 0 0 1 NaN 0
K 0 0 1 0 NaN 0
L 1 0 0 0 NaN 1
M 1 0 0 0 NaN 1
N 1 0 1 0 NaN 1
O 1 0 0 0 NaN 0
P 1 1 0 0 NaN 0
Q 1 0 0 0 NaN 0
R 1 0 1 0 NaN 0
S 1 0 1 0 NaN 1
T 1 0 0 0 NaN 0
U 1 0 0 0 NaN 0
V NaN NaN NaN NaN NaN NaN
W 1 0 0 0 0 0
X 1 0 0 0 0 0
Y 1 1 0 0 NaN 0
Z 1 0 0 0 0 0
Here is my code:
import pandas as pd
import os
from scipy.stats import fisher_exact
dirpath="..."
df = pd.read_table("...")
df.set_index("ID", inplace=True)
result = sum(range(len(df.columns), 0, -1))
my_df = pd.DataFrame(index=df.columns, columns=df.columns)
for colout in df.columns:
    for colinner in df.columns:
        if colout == colinner:
            my_df.at[colout, colinner] = 0
        else:
            tab = pd.crosstab(df[colout], df[colinner])
            fish_vals = fisher_exact(tab)
            my_df.at[colout, colinner] = fish_vals[1]
my_df.to_csv(os.path.join(dirpath,'pvals.txt'), sep='\t', encoding='utf-8',quoting=0, index=True)
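For context on the error: pd.crosstab drops rows where either column is NaN, so a pair such as G1 and G5 can end up with only one distinct value on one side, and the resulting table is no longer 2x2. Below is a minimal sketch of one possible workaround, assuming every column is binary (0/1). The function name pairwise_fisher and the levels parameter are hypothetical, introduced only for illustration. Unlike the code above, it leaves the diagonal as NaN instead of 0.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact


def pairwise_fisher(df, levels=(0, 1)):
    """Fisher exact p-values for every pair of binary columns (hypothetical helper)."""
    levels = list(levels)
    pvals = pd.DataFrame(np.nan, index=df.columns, columns=df.columns)
    for colout in df.columns:
        for colinner in df.columns:
            if colout == colinner:
                continue  # diagonal is left as NaN
            # crosstab ignores rows where either column is NaN
            tab = pd.crosstab(df[colout], df[colinner])
            # Force a 2x2 shape: levels absent after NaN removal get zero counts
            tab = tab.reindex(index=levels, columns=levels, fill_value=0)
            counts = tab.to_numpy()
            if counts.sum() == 0:
                continue  # no complete pairs at all; leave as NaN
            pvals.at[colout, colinner] = fisher_exact(counts)[1]
    return pvals
```

With the data above this would be called as pairwise_fisher(df) after set_index("ID"). Note that a table padded with a zero row or column carries little information, so those p-values should be read with caution.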