I'm testing a Spark function with pytest, creating the test data as RDDs. The function is as follows:
def joinDMS(dms, df):
    return dms.join(df, (df.mac == dms.mac) & (df.ch == dms.ch), "right_outer")
My test function is as follows:
def test_joinDMS(spark_context, hive_context):
    input_rdd_data = [
        ["0004", 46]
    ]
    input_rdd = spark_context.parallelize(input_rdd_data)
    df_input = hive_context.createDataFrame(input_rdd, ['mac', 'ch'])

    input_dms_data = [
        ["0004", "gotv", 46]
    ]
    input_dms = spark_context.parallelize(input_dms_data)
    df_dms = hive_context.createDataFrame(input_dms, ['mac', 'tech', 'ch'])

    expected_results = [
        ["0004", "gotv", 46, "0004", 46]
    ]
    input_exp_results = spark_context.parallelize(expected_results)
    df_exp_results = hive_context.createDataFrame(input_exp_results, ['mac', 'tech', 'ch', 'mac', 'ch'])

    results = tv_functions.joinDMS(df_dms, df_input)
    assert results == df_exp_results
The DataFrames results and df_exp_results contain the same data, but the test status is "failed". This is the error:
---------------------- Captured stderr call -----------------------
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
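
The warning itself is easy to reproduce in plain Python 2, so I assume Spark is hitting the same kind of comparison internally somewhere; this snippet is just my attempt to isolate it:

# Minimal Python 2 reproduction: comparing a non-ASCII byte string to a
# unicode string fails to decode the bytes, so Python emits the same
# UnicodeWarning and treats the two values as unequal.
'\xe4' == u'\xe4'  # UnicodeWarning ... interpreting them as being unequal; evaluates to False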
Does anyone know what is causing this? I think it might be a string problem, but I'm not sure.
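
In case the assertion itself is the problem: a row-level comparison along these lines is what I would try instead of comparing the DataFrame objects directly (a sketch, assuming both frames are small enough to collect to the driver; assert_dataframes_equal is a helper name I made up):

def assert_dataframes_equal(actual_df, expected_df):
    # Collect both frames to the driver and compare sorted row tuples,
    # so equality is checked on the data rather than on object identity.
    actual_rows = sorted(tuple(r) for r in actual_df.collect())
    expected_rows = sorted(tuple(r) for r in expected_df.collect())
    assert actual_rows == expected_rows

assert_dataframes_equal(results, df_exp_results)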