Spark TypeError: LongType can not accept object u'Value' in type <type 'unicode'>

Date: 2017-02-01 05:00:14

Tags: python apache-spark spark-dataframe parquet

I am using Spark to convert a CSV file to Parquet format. The conversion fails with the following error:

17/02/01 04:54:13 WARN TaskSetManager: Lost task 49.0 in stage 0.0 (TID 49, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/spark/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/usr/spark/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 510, in prepare
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1350, in _verify_type
    _verify_type(v, f.dataType, f.nullable)
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1322, in _verify_type
    raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: LongType can not accept object u'4168630192959457162' in type <type 'unicode'>
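The message means the LongType column received a value that is still a unicode string; Spark's schema verifier does not coerce strings to numbers, so the cast has to happen before the row reaches the schema. A plain-Python sketch of that cast on one sample line (the field layout is assumed from the data sample below):

```python
# One raw line: a string-encoded 64-bit id, a comma, then a URL.
line = u'4168630192959457162,"http://127.0.0.1/account/login'

items = line.split(",", 1)          # split only on the first comma
record = (int(items[0]), items[1])  # cast the id; int() handles 64-bit values

print(record[0] + 1)  # 4168630192959457163 -- it is a number now, not unicode
```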

Execution halts with the error above. Here is a sample set from users2.txt:

8154738304329264826,"http://0.0.0.0/admin/events/event/
3118660108275961803,"http://127.0.0.1/browser/header/
9223372036854775807,"http://127.0.0.1/account/login
5950027385047304809,"http://127.0.0.1/dashboard/
809124421170478235,"http://127.0.0.1/events/

The number '4168630192959457162' should be a 64-bit integer.


I am new to Spark. What am I doing wrong here?

1 Answer:

Answer 0 (score: 0):

You need to cast the string-encoded number to a long:

def parse(line):
    items = line.split(",")
    return (long(items[0]), items[1])  # long() is Python 2; on Python 3 use int()

rdd = sc.textFile("hdfs://10.11.21.33:8020/users2.txt").map(parse)

I would also tell sc.textFile not to use unicode for your use case:

rdd = sc.textFile("hdfs://10.11.21.33:8020/users2.txt", use_unicode=False) #etc
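With use_unicode=False each line arrives as a byte string, so the parse function splits on a byte comma; int() accepts bytes as well. A plain-Python sketch of what map(parse) then does per line (maxsplit=1 added here so any commas inside a URL would survive):

```python
def parse(line):
    # line is bytes when use_unicode=False; split only on the first comma
    items = line.split(b",", 1)
    return (int(items[0]), items[1])  # int() also parses byte strings

rec = parse(b'8154738304329264826,"http://0.0.0.0/admin/events/event/')
print(rec[0])  # 8154738304329264826
```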