假设我运行了一个python shell (file1.py),它将文本文件作为参数。我按以下方式运行它:
python file1.py textfile1.txt
在file1.py中包含以下代码
from pyspark import SparkContext
....
#I can read the file using the follwoing command
sc = SparkContext()
inputfile= sc.textFile(sys.argv[1])
我必须做些什么来使file1.py运行没有问题?
但是pyspark不适用于我,通常,我使用spark-submit!所以当在本地模式下使用spark-submit运行时,它会给我以下错误
Traceback (most recent call last):
File "/home/noorhadoop/Desktop/folder1/file1.py", line 4, in <module>
from pyspark import SparkContext
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 44, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 33, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 25, in <module>
File "/usr/lib/python3.6/platform.py", line 909, in <module>
"system node release version machine processor")
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 21, in <module>
from urllib.request import urlopen
File "/usr/lib/python3.6/urllib/request.py", line 88, in <module>
import http.client
File "/usr/lib/python3.6/http/client.py", line 71, in <module>
import email.parser
File "/usr/lib/python3.6/email/parser.py", line 12, in <module>
from email.feedparser import FeedParser, BytesFeedParser
File "/usr/lib/python3.6/email/feedparser.py", line 27, in <module>
from email._policybase import compat32
File "/usr/lib/python3.6/email/_policybase.py", line 9, in <module>
from email.utils import _has_surrogates
File "/usr/lib/python3.6/email/utils.py", line 31, in <module>
import urllib.parse
File "/usr/lib/python3.6/urllib/parse.py", line 227, in <module>
_DefragResultBase = namedtuple('DefragResult', 'url fragment')
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
Original exception was:
Traceback (most recent call last):
File "/home/noorhadoop/Desktop/folder1/file1.py", line 4, in <module>
from pyspark import SparkContext
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 44, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 33, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 25, in <module>
File "/usr/lib/python3.6/platform.py", line 909, in <module>
"system node release version machine processor")
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
hduser@noorhadoop-virtual-machine:/usr/local/spark$ ./bin/spark-submit --master local[3] /home/noorhadoop/Desktop/folder1/file1.py /home/noorhadoop/Desktop/folder1/simple1.txt
Traceback (most recent call last):
File "/home/noorhadoop/Desktop/folder1/file1.py", line 4, in <module>
from pyspark import SparkContext
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 44, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 33, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 25, in <module>
File "/usr/lib/python3.6/platform.py", line 909, in <module>
"system node release version machine processor")
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 21, in <module>
from urllib.request import urlopen
File "/usr/lib/python3.6/urllib/request.py", line 88, in <module>
import http.client
File "/usr/lib/python3.6/http/client.py", line 71, in <module>
import email.parser
File "/usr/lib/python3.6/email/parser.py", line 12, in <module>
from email.feedparser import FeedParser, BytesFeedParser
File "/usr/lib/python3.6/email/feedparser.py", line 27, in <module>
from email._policybase import compat32
File "/usr/lib/python3.6/email/_policybase.py", line 9, in <module>
from email.utils import _has_surrogates
File "/usr/lib/python3.6/email/utils.py", line 31, in <module>
import urllib.parse
File "/usr/lib/python3.6/urllib/parse.py", line 227, in <module>
_DefragResultBase = namedtuple('DefragResult', 'url fragment')
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
Original exception was:
Traceback (most recent call last):
File "/home/noorhadoop/Desktop/folder1/file1.py", line 4, in <module>
from pyspark import SparkContext
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 44, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 33, in <module>
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 25, in <module>
File "/usr/lib/python3.6/platform.py", line 909, in <module>
"system node release version machine processor")
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
谢谢,
答案 0 :(得分:1)
您没有发布错误消息,因此很难确切知道,但sc.textFile
期望HDFS或本地文件系统上的文件的完整路径。
例如,如果您在本地模式下运行spark,则必须使用spark-submit传递参数as -
spark-submit \
--master local[*] \
--/path/to/file1.py \
"file://path/to/textfile1.txt"
或者如果您在群集上运行,请将完整的hdfs路径作为参数
spark-submit \
--master spark://localhost:7077 \
--/path/to/file1.py \
"hdfs://localhost:9000/path/to/textfile1.txt"