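Error when profiling a CSV file
I am using the Databricks library to read a CSV file, and profiling the resulting DataFrame fails with the error below: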
report=spark_df_profiling_optimus.ProfileReport(spark_df)
File "/root/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/__init__.py", line 19, in __init__
description_set = describe(df, bins=bins, corr_reject=corr_reject, **kwargs)
File "/root/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 440, in describe
ldesc = {colum: describe_1d(df, colum, table_stats["n"]) for colum in df.columns}
File "/root/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 440, in <dictcomp>
ldesc = {colum: describe_1d(df, colum, table_stats["n"]) for colum in df.columns}
File "/root/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 406, in describe_1d
result = result.append(describe_integer_1d(df, column, result, nrows))
File "/root/anaconda2/lib/python2.7/site-packages/spark_df_profiling_optimus-0.1.1-py2.7.egg/spark_df_profiling_optimus/base.py", line 209, in describe_integer_1d
.format(col=column, n=x)).toPandas().ix[:,0]
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 780, in selectExpr
jdf = self._jdf.selectExpr(self._jseq(expr))
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
self.target_id, self.name)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 40, in deco
raise AnalysisException(s.split(': ', 1)[1])
AnalysisException: undefined function percentile;
Please help me resolve this issue.