pandas has a function, corr, that builds a table of pairwise correlation coefficients in the presence of sparse data. How can I calculate the number of mutual occurrences in the data instead of the correlation coefficient?
For example:
A = [NaN, NaN, 3]
B = [NaN, NaN, 8]
F(A,B) = 1
A = [1, NaN, NaN]
B = [NaN, NaN, 8]
F(A,B) = 0
I need pandas.DataFrame([A,B]).<function>()
-> matrix of occurrences
Answer 0 (score: 0)
In pandas, you may want to use dropna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
You can do something like
co_occur = df.dropna(how = "any")
the_count = co_occur.shape[0] # number of remaining rows
This will drop all rows where there is any NaN (thereby leaving you only with rows that contain values for every variable) and then count the number of remaining rows.
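As a small illustration, here is a sketch using the data from the question. It assumes the variables are laid out as columns named A and B, and adds a hypothetical third column C (not in the original question) to show how the `subset` parameter of `dropna` restricts the check to one pair of columns:

```python
import numpy as np
import pandas as pd

# Assumed layout: one column per variable, one row per observation.
df = pd.DataFrame({
    "A": [np.nan, np.nan, 3],
    "B": [np.nan, np.nan, 8],
    "C": [1, np.nan, np.nan],  # hypothetical extra column for illustration
})

# Rows in which every column has a value.
all_count = df.dropna(how="any").shape[0]

# Rows in which A and B both have a value, ignoring the other columns.
pair_count = df.dropna(subset=["A", "B"]).shape[0]
```

Without `subset`, a NaN in any other column also removes the row, so the count is only a pairwise count when the frame holds exactly the two columns of interest.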
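Alternatively, you could do it with lists (as in your code above), assuming the lists are the same length. Note that NaN is truthy in Python, so the check has to use math.isnan rather than the truth value of each element:
from math import isnan, nan
A = [nan, nan, 3]
B = [nan, nan, 8]
# count positions where both lists hold a non-NaN value
co_occur = sum(1 for a, b in zip(A, B) if not isnan(a) and not isnan(b))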
Answer 1 (score: 0)
Using numpy (with the NaN entries of A and B written as np.nan):
import numpy as np
sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[335]: 1
For your second case:
sum(np.sum(~np.isnan(np.array([A,B])),0)==2)
Out[337]: 0
Answer 2 (score: 0)
Using pandas:
(df.A.notnull() & df.B.notnull()).sum()
Or:
df.notnull().all(axis=1).sum()
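The expressions above return a single count. To get the full matrix of pairwise occurrences that the question asks for, one possible approach (a sketch, not a built-in pandas function) is to turn the non-null mask into 0/1 integers and multiply it by its own transpose. The example assumes variables are stored as columns; the column C is hypothetical and only there to make the matrix larger than 2x2:

```python
import numpy as np
import pandas as pd

# Assumed layout: one column per variable, one row per observation.
df = pd.DataFrame({
    "A": [np.nan, np.nan, 3],
    "B": [np.nan, np.nan, 8],
    "C": [1, np.nan, np.nan],  # hypothetical extra column for illustration
})

# 1 where a value is present, 0 where it is NaN.
present = df.notna().astype(int)

# Entry (i, j) is the number of rows in which columns i and j are both present.
co_occurrence = present.T.dot(present)
```

The diagonal of `co_occurrence` holds the number of non-NaN values in each column. If the data is built as in the question with `pandas.DataFrame([A, B])`, where each list becomes a row, transpose the frame first.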