I'm testing a Spark function with pytest, creating the test data as RDDs. The function is as follows:
def joinDMS(dms, df):
    return dms.join(df, (df.mac == dms.mac) & (df.ch == dms.ch), "right_outer")
My test function is as follows:
def test_joinDMS(spark_context, hive_context):
    input_rdd_data = [
        ["0004", 46]
    ]
    input_rdd = spark_context.parallelize(input_rdd_data)
    df_input = hive_context.createDataFrame(input_rdd, ['mac', 'ch'])

    input_dms_data = [
        ["0004", "gotv", 46]
    ]
    input_dms = spark_context.parallelize(input_dms_data)
    df_dms = hive_context.createDataFrame(input_dms, ['mac', 'tech', 'ch'])

    expected_results = [
        ["0004", "gotv", 46, "0004", 46]
    ]
    input_exp_results = spark_context.parallelize(expected_results)
    df_exp_results = hive_context.createDataFrame(input_exp_results, ['mac', 'tech', 'ch', 'mac', 'ch'])

    results = tv_functions.joinDMS(df_dms, df_input)
    assert results == df_exp_results
The DataFrames results and df_exp_results contain the same data, but the test status is "failed". This is the error:
---------------------- Captured stderr call -----------------------
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
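
The warning itself is easy to reproduce in plain Python 2, so I assume Spark is hitting the same kind of comparison internally somewhere; this snippet is just my attempt to isolate it:

# Minimal Python 2 reproduction: comparing a non-ASCII byte string to a
# unicode string fails to decode the bytes, so Python emits the same
# UnicodeWarning and treats the two values as unequal.
'\xe4' == u'\xe4'  # UnicodeWarning ... interpreting them as being unequal; evaluates to False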
Does anyone know what is causing this? I think it might be a string problem, but I'm not sure.
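
In case the assertion itself is the problem: a row-level comparison along these lines is what I would try instead of comparing the DataFrame objects directly (a sketch, assuming both frames are small enough to collect to the driver; assert_dataframes_equal is a helper name I made up):

def assert_dataframes_equal(actual_df, expected_df):
    # Collect both frames to the driver and compare sorted row tuples,
    # so equality is checked on the data rather than on object identity.
    actual_rows = sorted(tuple(r) for r in actual_df.collect())
    expected_rows = sorted(tuple(r) for r in expected_df.collect())
    assert actual_rows == expected_rows

assert_dataframes_equal(results, df_exp_results)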