An error occurred while calling o227.run

Time: 2018-01-10 19:49:24

Tags: windows apache-spark pyspark spark-dataframe graphframes

I am fairly new to Spark. I am trying to create a GraphFrame and run some queries on it. Here is my code:

import pyspark
from pyspark.sql import SQLContext
from graphframes import *  # the package is named graphframes, not graphframe

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc)

# Vertex DataFrame: GraphFrame requires an "id" column
vertices = sqlContext.createDataFrame([
    ("1", "Alex", 28, "M", "MIPT"),
    ("2", "Emeli", 28, "F", "MIPT"),
    ("7", "Ilya", 29, "M", "MSU"),
], ["id", "name", "age", "gender", "university"])

# Edge DataFrame: GraphFrame requires "src" and "dst" columns
edges = sqlContext.createDataFrame([
    ("1", "2", "friend"),
], ["src", "dst", "type"])

g = GraphFrame(vertices, edges)
result = g.connectedComponents()

But the result shows the following error:

  

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ALI_PC\AppData\Local\Temp\spark-73d7bc01-3873-4423-ac2b-527e39608ece\userFiles-b2dd0ea9-9556-4bea-9931-915608bad9b0\graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar\graphframes\graphframe.py", line 279, in connectedComponents
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o249.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir().
        at org.graphframes.lib.ConnectedComponents$$anonfun$2.apply(ConnectedComponents.scala:280)
        at org.graphframes.lib.ConnectedComponents$$anonfun$2.apply(ConnectedComponents.scala:280)
        at scala.Option.getOrElse(Option.scala:121)
        at org.graphframes.lib.ConnectedComponents$.org$graphframes$lib$ConnectedComponents$$run(ConnectedComponents.scala:279)
        at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:139)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)

How can I solve this problem? Thanks!

1 answer:

Answer 0 (score: 0)

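As stated in the exception message:

Checkpoint directory is not set. Please set it first using sc.setCheckpointDir().

You have to set the checkpoint directory before calling connectedComponents():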

sc.setCheckpointDir(path_to_checkpoint_directory)
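For a local, single-machine session like the one in the question, the call could be wrapped roughly as follows. This is only a sketch: the helper name ensure_checkpoint_dir and the "spark-checkpoints-" prefix are made up for illustration, and creating the directory with os/tempfile assumes local mode (on a cluster the checkpoint path would normally be on a shared filesystem such as HDFS). The only Spark call used is sc.setCheckpointDir().

```python
import os
import tempfile


def ensure_checkpoint_dir(sc, path=None):
    """Set a checkpoint directory on the given SparkContext and return its path.

    Hypothetical helper for local mode: if no path is given, a fresh
    temporary directory is created; otherwise the given directory is
    created if it does not exist yet.
    """
    if path is None:
        path = tempfile.mkdtemp(prefix="spark-checkpoints-")
    else:
        os.makedirs(path, exist_ok=True)
    sc.setCheckpointDir(path)
    return path


# Usage with sc and g from the question (requires pyspark and graphframes):
# ensure_checkpoint_dir(sc)
# result = g.connectedComponents()
# result.show()
```

The checkpoint directory has to be set before connectedComponents() is called; setting it afterwards does not help the call that already failed.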