Spark DataFrame: count rows only when at least one column is not null?

Asked: 2017-03-06 10:26:49

Tags: apache-spark dataframe apache-spark-sql bigdata

Consider the data below as a DataFrame. If I count the DataFrame, it gives 6. How do I count only the rows in which at least one column is not null?

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

X = pd.DataFrame(np.random.rand(10, 5))
model = KMeans(n_clusters=3)
# as_matrix() was removed in pandas 1.0; use to_numpy() instead
clusassign = model.fit_predict(X.to_numpy())
min_dist = np.min(cdist(X.to_numpy(), model.cluster_centers_, 'euclidean'), axis=1)
Y = pd.DataFrame(min_dist, index=X.index, columns=['Center_euclidean_dist'])
Z = pd.DataFrame(clusassign, index=X.index, columns=['cluster_ID'])
PAP = pd.concat([Y, Z], axis=1)
grouped = PAP.groupby(['cluster_ID'])
grouped.idxmin()
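The count the question asks for can be sketched in pandas (the same idea carries over to a Spark DataFrame via `dropna(how='all')`): drop rows where every column is null, then count what remains. The small DataFrame below is a made-up example, not the asker's data.

```python
import pandas as pd

# Toy DataFrame: three rows, one of them entirely null
df = pd.DataFrame({"a": [1, None, None], "b": [None, 2, None]})

total = len(df)                        # counts every row, nulls included
non_empty = len(df.dropna(how="all"))  # keeps rows with at least one non-null column

print(total, non_empty)  # 3 2
```

`how="all"` is the key: it only drops a row when all of its columns are null, which is exactly the complement of "at least one column is not null".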

2 Answers:

Answer 0 (score: 0)

How about:

/* if BUTTON 2 is clicked AND the INCH class is active on SIZE, create an <li> element in the diagram ul */
<ul class="diagram">
  <li id="foot">PQR</li>
  <li id="hair">VWX</li>
</ul>

Answer 1 (score: 0)

The following will give the result:

-- COALESCE does not accept *; list the columns explicitly
-- (col1, col2, col3 are placeholders for your actual column names)
select count(*) from 
(select coalesce(col1, col2, col3) as x from table_name) tmp 
 where x is not null
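The COALESCE approach above can be exercised end-to-end with SQLite (the table layout, column names a and b, and the sample rows are assumptions for illustration; in Spark SQL the columns must likewise be listed explicitly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table table_name (a integer, b integer)")
con.executemany("insert into table_name values (?, ?)",
                [(1, None), (None, 2), (None, None)])

# coalesce(a, b) is null only when every listed column is null,
# so the outer filter keeps rows with at least one non-null value
(count,) = con.execute(
    "select count(*) from "
    "(select coalesce(a, b) as x from table_name) tmp "
    "where x is not null"
).fetchone()

print(count)  # 2 -- the all-null row is excluded
con.close()
```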