pandas simple pairwise occurrence

Time: 2018-06-04 16:58:57

Tags: python pandas

pandas has a corr function that builds a table of pairwise correlation coefficients and copes with sparse data. But how can I calculate the number of mutual occurrences in the data instead of the correlation coefficient?

For example:

A = [NaN, NaN, 3]

B = [NaN, NaN, 8]

F(A,B) = 1

A = [1, NaN, NaN]

B = [NaN, NaN, 8]

F(A,B) = 0

I need pandas.DataFrame([A,B]).<function>() -> matrix of occurrences

3 answers:

Answer 0 (score: 0)

In pandas, you may want to use dropna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

You can do something like

co_occur = df.dropna(how = "any")
the_count = co_occur.shape[0] # number of remaining rows

This will drop all rows where there is any NaN (thereby leaving you only with rows that contain values for every variable) and then count the number of remaining rows.
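As a small worked sketch of the dropna approach, assuming the two lists from the question are stored as columns A and B of a DataFrame (this column-per-variable layout is an assumption; the question builds the frame row-wise):

```python
import numpy as np
import pandas as pd

# Assumed layout: one column per variable, one row per observation.
df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [np.nan, np.nan, 8]})

# Keep only the rows where every column has a value, then count them.
co_occur = df.dropna(how="any")
the_count = co_occur.shape[0]
print(the_count)  # 1

# With more than two columns, subset= restricts the check to one pair.
pair_count = df.dropna(subset=["A", "B"]).shape[0]
```

Note that without subset= the count covers all columns at once, so it only answers the pairwise question directly when the frame has exactly two columns.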

Alternatively, you could do it with lists (as in your code above) assuming the lists are the same length:

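from math import isnan, nan as NaN

A = [NaN, NaN, 3]
B = [NaN, NaN, 8]

# NaN is truthy in Python, so test for it explicitly with isnan.
co_occur = sum(not isnan(a) and not isnan(b) for a, b in zip(A, B))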

Answer 1 (score: 0)

Using numpy (imported as np):

sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[335]: 1

For your second case

sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[337]: 0

Answer 2 (score: 0)

Using pandas

(df.A.notnull() & df.B.notnull()).sum()

Or

df.notnull().all(axis=1).sum()
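The two expressions above give a single count. Since the question asks for a matrix of occurrences, one possible sketch is to turn the notnull mask into 0/1 integers and multiply it by its own transpose. The sample data and the third column C are made up for illustration, and the column-per-variable layout is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one column per variable.
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [np.nan, np.nan, 8],
    "C": [5, 6, np.nan],
})

# 1 where a value is present, 0 where it is NaN.
present = df.notnull().astype(int)

# Entry (i, j) counts the rows where both column i and column j have values;
# the diagonal holds each column's own non-null count.
occurrences = present.T.dot(present)
print(occurrences)
```

Here occurrences.loc["A", "B"] is 1 and occurrences.loc["B", "C"] is 0, matching the pairwise count from (df.A.notnull() & df.B.notnull()).sum() for each pair.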