I am using Spark to convert a CSV file to Parquet format. Running my code produces the following output:
17/02/01 04:54:13 WARN TaskSetManager: Lost task 49.0 in stage 0.0 (TID 49, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/spark/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
process()
File "/usr/spark/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 510, in prepare
File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1350, in _verify_type
_verify_type(v, f.dataType, f.nullable)
File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1322, in _verify_type
raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: LongType can not accept object u'4168630192959457162' in type <type 'unicode'>
Execution stops with the error shown above. Here is a sample of the input data:
8154738304329264826,"http://0.0.0.0/admin/events/event/
3118660108275961803,"http://127.0.0.1/browser/header/
9223372036854775807,"http://127.0.0.1/account/login
5950027385047304809,"http://127.0.0.1/dashboard/
809124421170478235,"http://127.0.0.1/events/
The number '4168630192959457162' should fit in a 64-bit integer. The rows above are a sample from users2.txt.
I am new to Spark. What am I doing wrong here?
Answer 0 (score: 0)
You need to convert the string-encoded number to a long:
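def parse(line):
    # Each line has the form "<id>,<url>"; textFile yields both fields as strings.
    items = line.split(",")
    # Convert the id so that it matches the LongType column in the schema.
    return (long(items[0]), items[1])

rdd = sc.textFile("hdfs://10.11.21.33:8020/users2.txt").map(parse)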
I would also tell sc.textFile not to use unicode for your use case:
rdd = sc.textFile("hdfs://10.11.21.33:8020/users2.txt", use_unicode=False) #etc
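The conversion step can be tried without a cluster. The following is only a minimal sketch in plain Python 3, where `int` takes the place of Python 2's `long`. The range check, the `LONG_MIN`/`LONG_MAX` names, and splitting on the first comma only are assumptions added for illustration and are not part of the answer above; the two sample rows are copied from the question.

```python
# Plain-Python sketch of the parsing step (no Spark required).
# Assumption: Python 3, where int replaces Python 2's long.

# Signed 64-bit range, which is what Spark's LongType can hold.
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def parse(line):
    """Split one '<id>,<url>' line and convert the id to an integer."""
    # Assumption: split on the first comma only, so that any commas
    # inside the URL stay in the second field.
    raw_id, url = line.split(",", 1)
    user_id = int(raw_id)
    if not LONG_MIN <= user_id <= LONG_MAX:
        raise ValueError("id %d does not fit in a 64-bit integer" % user_id)
    return (user_id, url)


# Two rows taken from the users2.txt sample in the question.
sample = [
    '8154738304329264826,"http://0.0.0.0/admin/events/event/',
    '9223372036854775807,"http://127.0.0.1/account/login',
]
rows = [parse(line) for line in sample]
```

With Spark, the same function would be passed to `map` exactly as in the answer, for example `sc.textFile(path).map(parse)`, so that every id reaches the schema check as an integer rather than a string.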