Suppose I have the following PySpark data frame:
>>> df = spark.createDataFrame([('A', 'Amsterdam', 3.4), ('B', 'London', None), ('C', None, None), ('D', None, 11.1)], ['c1', 'c2', 'c3'])
>>> df.show()
+---+---------+----+
| c1| c2| c3|
+---+---------+----+
| A|Amsterdam| 3.4|
| B| London|null|
| C| null|null|
| D| null|11.1|
+---+---------+----+
How can I now select or filter all rows that contain at least one null value, like so?:
>>> df.SOME-COMMAND-HERE.show()
+---+---------+----+
| c1| c2| c3|
+---+---------+----+
| B| London|null|
| C| null|null|
| D| null|11.1|
+---+---------+----+
Answer 0 (score: 2)
Create an intermediate data frame from the original by dropping the desired rows, then "subtract" it from the original:
# Create the data frame
df = spark.createDataFrame([('A', 'Amsterdam', 3.4), ('B', 'London', None), ('C', None, None), ('D', None, 11.1)], ['c1', 'c2', 'c3'])
df.show()
+---+---------+----+
| c1| c2| c3|
+---+---------+----+
| A|Amsterdam| 3.4|
| B| London|null|
| C| null|null|
| D| null|11.1|
+---+---------+----+
# Construct an intermediate dataframe without the desired rows
df_drop = df.dropna('any')
df_drop.show()
+---+---------+---+
| c1| c2| c3|
+---+---------+---+
| A|Amsterdam|3.4|
+---+---------+---+
# Then subtract it from the original to reveal the desired rows
df.subtract(df_drop).show()
+---+------+----+
| c1| c2| c3|
+---+------+----+
| B|London|null|
| C| null|null|
| D| null|11.1|
+---+------+----+
Answer 1 (score: 0)
Construct an appropriate raw SQL query and apply it: