I ran into the following problem when profiling a Spark DataFrame with spark_df_profiling_optimus:
report=spark_df_profiling_optimus.ProfileReport(spark_df)
The call fails with the following error:
report=spark_df_profiling_optimus.ProfileReport(spark_df)
File "/home/dmp_admin/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/__init__.py", line 19, in __init__
description_set = describe(df, bins=bins, corr_reject=corr_reject, **kwargs)
File "/home/dmp_admin/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 440, in describe
ldesc = {column: describe_1d(df, column, table_stats["n"]) for column in df.columns}
File "/home/dmp_admin/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 440, in <dictcomp>
ldesc = {column: describe_1d(df, column, table_stats["n"]) for column in df.columns}
File "/home/dmp_admin/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 406, in describe_1d
result = result.append(describe_integer_1d(df, column, result, nrows))
File "/home/dmp_admin/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 209, in describe_integer_1d
.format(col=column, n=x)).toPandas().ix[:,0]
File "/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 876, in selectExpr
jdf = self._jdf.selectExpr(self._jseq(expr))
File "/usr/hdp/2.5.0.0-1245/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'undefined function percentile;'
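
A possible cause, offered as a guess rather than a confirmed diagnosis: the paths in the traceback (py4j-0.9 under /usr/hdp/2.5.0.0-1245/spark) suggest Spark 1.6, where, as far as I know, `percentile` is a Hive UDAF that a plain `SQLContext` cannot resolve. A native `percentile` only arrived, I believe, in Spark 2.1. The sketch below shows one way to pick a Hive-enabled context before building `spark_df`. The helper names `percentile_needs_hive` and `make_sql_context` are made up for illustration and are not part of spark_df_profiling_optimus or PySpark.

```python
def percentile_needs_hive(spark_version):
    """Return True if `percentile` is assumed to be Hive-only in this Spark version.

    Assumption: Spark gained a native `percentile` aggregate in 2.1;
    earlier versions only expose it through Hive support.
    """
    major, minor = [int(part) for part in spark_version.split(".")[:2]]
    return (major, minor) < (2, 1)


def make_sql_context(sc):
    """Build a SQL context for an existing SparkContext `sc`.

    Hypothetical helper: returns a HiveContext when `percentile` is
    expected to be a Hive UDAF, otherwise a plain SQLContext.
    """
    # Imported lazily so the version check above works without PySpark installed.
    from pyspark.sql import HiveContext, SQLContext

    if percentile_needs_hive(sc.version):
        return HiveContext(sc)
    return SQLContext(sc)
```

If this guess is right, the DataFrame itself has to be created from the Hive-enabled context (for example `make_sql_context(sc).createDataFrame(...)`), because `selectExpr` resolves functions through the context the DataFrame belongs to. The Spark build also needs to include Hive support for `HiveContext` to work.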