What is wrong with my code?
idAndNumbers = ((1,(1,2,3)))
irRDD = sc.parallelize(idAndNumbers)
irLengthRDD = irRDD.map(lambda x:x[1].length).collect()
I get a bunch of errors, such as:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 88.0 failed 1 times, most recent failure: Lost task 0.0 in stage 88.0 (TID 88, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
Full traceback:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 88.0 failed 1 times, most recent failure: Lost task 0.0 in stage 88.0 (TID 88, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/pyspark/worker.py", line 101, in main
process()
File "/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/pyspark/worker.py", line 96, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/pyspark/serializers.py", line 236, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "<ipython-input-79-ef1d5a130db5>", line 12, in <lambda>
TypeError: 'int' object has no attribute '__getitem__'
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
It turns out that what I'm dealing with really is a nested tuple, like so: ((1,(1,2,3)))
Answer 0: (score: 0)
>>> ian = [(1,(1,2,3))]
>>> p = sc.parallelize(ian)
>>> l = p.map(lambda x: len(x[1]))
>>> print l.collect()
[3]
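A side note on why the input above is written as a list: in Python, extra parentheses alone do not create a one-element tuple, so ((1,(1,2,3))) is the same value as (1,(1,2,3)). Assuming sc.parallelize simply iterates over the collection it is given, the RDD in the question would hold two elements, 1 and (1,2,3), which would be consistent with the "'int' object has no attribute '__getitem__'" line in the traceback. A minimal plain-Python sketch (no Spark needed; the variable names are only illustrative):

```python
# Redundant parentheses do not add a nesting level.
id_and_numbers = ((1, (1, 2, 3)))
same_as_flat = (id_and_numbers == (1, (1, 2, 3)))

# Iterating over it yields the int 1 first, then the inner tuple,
# so x[1] would be attempted on an int.
elements = list(id_and_numbers)

# A list (or a tuple with a trailing comma) holds one (id, numbers) record.
records = [(1, (1, 2, 3))]
also_one_record = ((1, (1, 2, 3)),)

# len() is applied to the inner tuple of each record.
lengths = [len(x[1]) for x in records]
print(lengths)
```

With the input wrapped this way, each element is a full (id, numbers) pair and x[1] is the tuple whose length is wanted.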
You need to use len. A tuple doesn't have anything called length.
Answer 1: (score: 0)
I agree with ayan guha. You can type help(len) in the interpreter to read its documentation.
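To make the len point concrete, here is a small sketch in plain Python. It checks that a tuple has no length attribute and reads the docstring that help(len) displays; the variable names are hypothetical and chosen only for this illustration.

```python
numbers = (1, 2, 3)

# Tuples expose no `length` attribute, so x[1].length cannot work.
has_length_attr = hasattr(numbers, "length")

# The built-in len() is the way to count the items.
count = len(numbers)

# help(len) is built from this docstring; the exact wording
# differs between Python versions, so it is only printed here.
doc = len.__doc__
print(has_length_attr, count)
print(doc)
```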