How to apply a condition on multiple columns and display an error message in PySpark

Time: 2018-03-27 09:44:54

Tags: python pyspark spark-dataframe

I have a DataFrame like the one below.

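I want to apply a condition: if any column's value is greater than zero, display an error message saying that the count for this column is greater than zero.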

+---+---+---+---+
|  x|  y|  z|  w|
+---+---+---+---+
|  0|  4|  4|  4|
+---+---+---+---+

Likewise, the code should display the message for every column that has a value greater than zero.

Please help me write this in Python using PySpark; I am very new to the platform.

1 Answer

Answer 0 (score: 0)

You can iterate over each column, filter with the condition you specified, and check whether the resulting DataFrame has more than 0 rows.

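from pyspark.sql.functions import col

# For each column, keep only the rows where that column is > 0;
# a non-empty result means the column has at least one positive value.
# Note: this runs one Spark job (count) per column.
for c in df.columns:
    if df.where(col(c) > 0).count() > 0:
        print("your count is more than zero for column {col}".format(col=c))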

For the example you provided, the output will be:

your count is more than zero for column y
your count is more than zero for column z
your count is more than zero for column w