I have installed Apache Spark and Apache Livy on my system. When I run my Python code, it fails with the error
u'java.lang.IllegalStateException: Session is in state starting'
By default, Apache Livy runs on port 8998. My Python code is:
import json, pprint, requests, textwrap
host = 'http://localhost:8998'
data = {'kind': 'pyspark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data),
                  headers=headers)
session_url = host + r.headers['location']
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""
        import random
        NUM_SAMPLES = 100000
        def sample(p):
            x, y = random.random(), random.random()
            return 1 if x*x + y*y < 1 else 0
        count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
        print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
        """)
}
r = requests.post(statements_url, data=json.dumps(data),
                  headers=headers)
pprint.pprint(r.json())
{u'id': 12,
 u'output': {u'data': {u'text/plain': u'Pi is roughly 3.136000'},
             u'execution_count': 12,
             u'status': u'ok'},
 u'state': u'running'}
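How can I resolve this error?
Answer 0 (score: 0)
Whenever you create a new Spark session in Livy, it takes some time to reach the idle state. You are posting code to the session immediately after creating it, while it is still in the starting state, and that is why the exception is thrown.
Try something like this: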
import json, pprint, requests, textwrap
host = 'http://192.168.0.56:8998'
data = {'kind': 'spark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
print(r.headers['location'])  # the Location header is on the response, not on the request headers dict
After getting the session path from the Location header, proceed as follows:
session_url = host + r.headers['location']  # Location is a path such as /sessions/<id>
r = requests.get(session_url, headers=headers)
statements_url = session_url + '/statements'
data = {'code': '1 + 1'}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
statement_url = session_url
r = requests.get(statement_url, headers=headers)
pprint.pprint(r.json())
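The snippet above checks the session only once, so the statement can still be posted too early. A minimal sketch of waiting for the idle state with a timeout is shown below. The function name wait_for_idle, the FAILED_STATES tuple, and the http, poll_interval and timeout parameters are assumptions made for illustration; they are not part of Livy or of the answer above. The http parameter only exists so that a stand-in object can replace requests when no Livy server is available.

```python
import time

# Session states from which a Livy session will not become idle on its own.
FAILED_STATES = ('error', 'dead', 'killed', 'shutting_down', 'success')


def wait_for_idle(session_url, http=None, poll_interval=1.0, timeout=120.0):
    """Poll GET <session_url> until the session reports state 'idle'.

    Hypothetical helper: raises RuntimeError if the session ends up in a
    failed state or does not become idle within `timeout` seconds.
    """
    if http is None:
        import requests  # same HTTP library as the snippets above
        http = requests
    headers = {'Content-Type': 'application/json'}
    deadline = time.time() + timeout
    while True:
        session = http.get(session_url, headers=headers).json()
        state = session['state']
        if state == 'idle':
            return session
        if state in FAILED_STATES:
            raise RuntimeError('session ended in state %r' % state)
        if time.time() > deadline:
            raise RuntimeError('session still %r after %.0f s' % (state, timeout))
        time.sleep(poll_interval)  # pause between polls instead of busy-waiting
```

With the variables from the snippet above, calling wait_for_idle(session_url) before posting to statements_url should avoid the "Session is in state starting" error, provided the session can start at all.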
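Answer 1 (score: 0)
After creating the session, you need to check the state it returns. Submit statements only after the session state becomes idle. The code below worked for me.
You can print the returned values to debug your setup. If the state stays at starting or becomes dead, there is a problem with your setup, perhaps a permission issue or some other problem with launching the Spark context through Livy. Check stderr in the Livy UI.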
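import json
import time
import requests

def create_session(self):
    # Method of a client class that keeps the Livy base URL in self.host.
    uri = "/sessions"
    data = {'kind': 'pyspark'}
    headers = {'Content-Type': 'application/json'}
    r = requests.post(url=self.host + uri, data=json.dumps(data),
                      headers=headers)
    response = r.json()
    sessionId = response['id']
    # Poll the session until it leaves the starting state and becomes idle.
    while response['state'] != "idle":
        time.sleep(1)  # pause between polls instead of busy-waiting
        r = requests.get(self.host + "/sessions/" + str(sessionId), headers=headers)
        response = r.json()
        # print(r.json())
    return response['id']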
The returned sessionId can now be used to submit statements. Simply call POST /sessions/{sessionId}/statements to submit your statement.
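As a rough sketch of that last step, the function below posts code to an idle session and then polls GET /sessions/{sessionId}/statements/{statementId} until the statement is no longer waiting or running. The name run_statement and the http, poll_interval and timeout parameters are assumptions made for illustration, not part of Livy; the http parameter only exists so that a stand-in object can replace requests.

```python
import json
import time

HEADERS = {'Content-Type': 'application/json'}


def run_statement(host, session_id, code, http=None, poll_interval=1.0, timeout=60.0):
    """Submit `code` to an idle Livy session and return the statement output.

    Hypothetical helper: raises RuntimeError if the statement has not
    finished within `timeout` seconds.
    """
    if http is None:
        import requests  # same HTTP library as the snippets above
        http = requests
    statements_url = '%s/sessions/%s/statements' % (host, session_id)
    r = http.post(statements_url, data=json.dumps({'code': code}), headers=HEADERS)
    statement = r.json()
    statement_url = '%s/%s' % (statements_url, statement['id'])
    deadline = time.time() + timeout
    # A new statement starts as 'waiting' or 'running'; stop on any other state.
    while statement['state'] in ('waiting', 'running'):
        if time.time() > deadline:
            raise RuntimeError('statement %s still %s after %.0f s'
                               % (statement['id'], statement['state'], timeout))
        time.sleep(poll_interval)
        statement = http.get(statement_url, headers=HEADERS).json()
    return statement['output']
```

For example, run_statement('http://localhost:8998', sessionId, '1 + 1') would return the output dictionary of the statement, whose 'status' field tells whether the code ran successfully.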