This is my code:
from pyspark import SparkContext, SparkConf

sc = SparkContext("local", "Sampl_text")
try:
    rdd1 = sc.textFile("/home/yati/Hadoop/Practice/sample.txt")
    # print(rdd1)

    def rmheader(x):
        x.replace('=', ',')
        return x

    def column(x):
        pass

    test = rdd1.flatMap(lambda x: rmheader(x).split(',')[3::])
    repl = test.reduceByKey(lambda a, b: a + b)
    print(repl.collect())
except Exception as ex:
    print(ex)
sc.stop()
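To see what actually reaches reduceByKey, a few elements can first be pulled back to the driver. This is only a debugging sketch and assumes the rdd1 and rmheader defined above are in scope:

# Debugging sketch: peek at the first few elements produced by the
# flatMap. They come out as bare strings, not (key, value) 2-tuples.
preview = rdd1.flatMap(lambda x: rmheader(x).split(',')[3:]).take(5)
print(preview)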
Error message:
/usr/bin/python2.7 /home/yati/Hadoop/Practice/MapReduce_Pra.py
2018-08-28 16:15:02 WARN Utils:66 - Your hostname, ubuntu resolves to a loopback address: 127.0.1.1, but we couldn't find any external IP address!
2018-08-28 16:15:02 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-08-28 16:15:04 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:> (0 + 1) / 1]2018-08-28 16:15:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 1857, in combineLocally
  File "/home/yati/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues
    for k, v in iterator:
ValueError: too many values to unpack
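If I read the traceback right, the failure is inside reduceByKey: Spark unpacks each element as `for k, v in iterator`, so the RDD must contain 2-tuples, but the flatMap above yields bare strings, and unpacking a string longer than two characters raises ValueError: too many values to unpack. A side note: x.replace('=', ',') returns a new string rather than modifying x, so rmheader currently hands back its input unchanged. Below is a minimal sketch of a pair-based version; the assumption that the goal is to count each token after the third field is mine, since the format of sample.txt isn't shown:

from pyspark import SparkContext

sc = SparkContext("local", "Sampl_text")
try:
    rdd1 = sc.textFile("/home/yati/Hadoop/Practice/sample.txt")

    # replace() returns a new string, so use its result directly
    # instead of discarding it as rmheader did.
    tokens = rdd1.flatMap(lambda x: x.replace('=', ',').split(',')[3:])

    # reduceByKey needs an RDD of (key, value) pairs, so map each
    # token to (token, 1) before reducing. Summing the 1s gives a
    # per-token count (the assumed goal).
    counts = tokens.map(lambda t: (t, 1)).reduceByKey(lambda a, b: a + b)
    print(counts.collect())
except Exception as ex:
    print(ex)
sc.stop()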