Question

pandas has a function, corr, that builds a table of pairwise correlation coefficients even when the data is sparse. How can I compute the number of mutual occurrences (positions where both series have a value) instead of the correlation coefficient?

For example:

A = [NaN, NaN, 3]

B = [NaN, NaN, 8]

F(A,B) = 1

A = [1, NaN, NaN]

B = [NaN, NaN, 8]

F(A,B) = 0

I need pandas.DataFrame([A,B]).<function>() -> matrix of occurrences

Answer 1

In pandas, you may want to use dropna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

You can do something like

co_occur = df.dropna(how = "any")
the_count = co_occur.shape[0] # number of remaining rows

This will drop all rows where there is any NaN (thereby leaving you only with rows that contain values for every variable) and then count the number of remaining rows.
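As a minimal sketch on the question's first example: this assumes each variable is a column and each position is a row, which is the transpose of the pandas.DataFrame([A,B]) layout written in the question.

```python
import numpy as np
import pandas as pd

A = [np.nan, np.nan, 3]
B = [np.nan, np.nan, 8]

# One column per variable, one row per observation (assumed layout).
df = pd.DataFrame({"A": A, "B": B})

# Keep only the rows where every column has a value, then count them.
co_occur = df.dropna(how="any")
the_count = co_occur.shape[0]
print(the_count)
```

If the frame was built as pandas.DataFrame([A,B]), transposing it with .T first should give the same layout.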

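Alternatively, you could do it with plain lists (as in your example), assuming the lists are the same length. Note that NaN is truthy in Python, so a bare "if A[i] and B[i]" test would count NaN entries and skip zeros; test for NaN explicitly:

from math import isnan, nan

A = [nan, nan, 3]
B = [nan, nan, 8]

# count the positions where both lists hold a real (non-NaN) value
co_occur = sum(not isnan(a) and not isnan(b) for a, b in zip(A, B))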

Answer 2

Using numpy (imported as np):

sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[335]: 1

For your second case:

sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[337]: 0

Answer 3

Using pandas:

(df.A.notnull() & df.B.notnull()).sum()

Or:

df.notnull().all(axis=1).sum()
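The question asks for a full matrix of pairwise counts, analogous to what corr returns. One possible sketch, not taken from the answers above, is to turn the not-null mask into 0/1 integers and multiply it by its own transpose. The third column C here is a hypothetical addition to make the matrix more interesting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, np.nan, 3],
    "B": [np.nan, np.nan, 8],
    "C": [1, np.nan, np.nan],  # hypothetical extra column for illustration
})

# 1 where a value is present, 0 where it is NaN.
present = df.notna().astype(int)

# Entry (i, j) counts the rows where both column i and column j have a value.
co_occurrence = present.T.dot(present)
print(co_occurrence)
```

The diagonal holds the number of non-null values in each column, and the matrix is symmetric, much like the output of corr.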

pandas simple pairwise occurrence
