pyspark: distinctCount - AnalysisException: u"cannot resolve 'x' given input columns"

Time: 2016-06-24 20:16:25

Tags: python apache-spark pyspark spark-dataframe

I have the following DataFrame:

Id | field_A | field_B | field_C | field_D
 1 |   cat   |  12     |   black | 11
 1 |   dog   | 128     |   white | 19
 2 |   dog   |  35     |  yellow | 20
 2 |   dog   |  21     |   brown |  4
 3 |  bird   |  10     |    blue |  7
 4 |   cow   |  99     |   brown | 34

I only want to keep the rows for Ids where distinctCount(field_A) = 1 (that is, only Ids with a single type of animal). The final result should be:

Id | field_A | field_B | field_C | field_D
 2 |   dog   |  35     |  yellow | 20
 2 |   dog   |  21     |   brown |  4
 3 |  bird   |  10     |    blue |  7
 4 |   cow   |  99     |   brown | 34

I started with the approach below:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).alias('distinct_A_count').filter('distinct_A_count = 1').show(20,False)

Then I get the following error:

AnalysisException: u"cannot resolve 'x' given input columns"

Does anyone know what I am doing wrong? Thanks!

1 Answer:

Answer 0: (score: 0)

I got it working by using withColumnRenamed instead of alias:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).withColumnRenamed('count(field_A)','distinct_A_count').filter('distinct_A_count = 1').show(20,False)