Pyspark' tzinfo'使用Cassandra连接器时出错

时间:2016-03-17 10:41:13

标签: pyspark spark-cassandra-connector pyspark-sql

我正在使用

从Cassandra读书
a = sc.cassandraTable("my_keyspace", "my_table").select("timestamp", "vaue")

然后想将其转换为数据帧:

a.toDF()

并且架构被正确推理:

DataFrame[timestamp: timestamp, value: double]

但是在实现数据帧时我得到以下错误:

Py4JJavaError: An error occurred while calling o89372.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 285.0 (TID 5243, kepler8.cern.ch): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in toInternal
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in <genexpr>
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
    return self.dataType.toInternal(obj)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 190, in toInternal
    seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
AttributeError: 'str' object has no attribute 'tzinfo'

听起来像是string pyspark.sql.types.TimestampType

我怎样才能进一步调试?

0 个答案:

没有答案