PySpark, GraphFrames, exception caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging

Date: 2016-09-21 23:18:12

Tags: pyspark graphframes

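I am trying to run the following code, which uses graphframes, and I am getting an error that I have not been able to resolve after several hours of searching on Google. It looks like a class cannot be loaded, but I really don't know what I should do about it.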

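Could someone take another look at the code and the error below? I followed the instructions here, and if you want to try it out quickly, you can find my dataset here.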

"""
Program:    RUNNING GRAPH ANALYTICS WITH SPARK GRAPH-FRAMES:
Author:     Dr. C. Hadjinikolis
Date:       14/09/2016
Description:    This is the application's core module from where everything is executed.
                The module is responsible for:
                1. Loading Spark
                2. Loading GraphFrames
                3. Running analytics by leveraging other modules in the package.
"""
# IMPORT OTHER LIBS -------------------------------------------------------------------------------#
import os
import sys
import pandas as pd

# IMPORT SPARK ------------------------------------------------------------------------------------#
# Path to Spark source folder
USER_FILE_PATH = "/Users/christoshadjinikolis"
SPARK_PATH = "/PycharmProjects/GenesAssociation"
SPARK_FILE = "/spark-2.0.0-bin-hadoop2.7"
SPARK_HOME = USER_FILE_PATH + SPARK_PATH + SPARK_FILE
os.environ['SPARK_HOME'] = SPARK_HOME

# Append pySpark to Python Path
sys.path.append(SPARK_HOME + "/python")
sys.path.append(SPARK_HOME + "/python" + "/lib/py4j-0.10.1-src.zip")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql import SQLContext
    from graphframes import *

except ImportError as ex:
    print "Can not import Spark Modules", ex
    sys.exit(1)

# GLOBAL VARIABLES --------------------------------------------------------------------------------#
# Configure spark properties
CONF = (SparkConf()
        .setMaster("local")
        .setAppName("My app")
        .set("spark.executor.memory", "10g")
        .set("spark.executor.instances", "4"))

# Instantiate SparkContext object
SC = SparkContext(conf=CONF)

# Instantiate SQL_SparkContext object
SQL_CONTEXT = SQLContext(SC)

# MAIN CODE ---------------------------------------------------------------------------------------#
if __name__ == "__main__":

    # Main Path to CSV files
    DATA_PATH = '/PycharmProjects/GenesAssociation/data/'
    FILE_NAME = 'gene_gene_associations_50k.csv'

    # LOAD DATA CSV USING  PANDAS -----------------------------------------------------------------#
    print "STEP 1: Loading Gene Nodes -------------------------------------------------------------"
    # Read csv file and load as df
    GENES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME,
                        usecols=['OFFICIAL_SYMBOL_A'],
                        low_memory=True,
                        iterator=True,
                        chunksize=1000)

    # Concatenate chunks into list & convert to dataFrame
    GENES_DF = pd.DataFrame(pd.concat(list(GENES), ignore_index=True))

    # Remove duplicates
    GENES_DF_CLEAN = GENES_DF.drop_duplicates(keep='first')

    # Name Columns
    GENES_DF_CLEAN.columns = ['id']

    # Output dataFrame
    print GENES_DF_CLEAN

    # Create vertices
    VERTICES = SQL_CONTEXT.createDataFrame(GENES_DF_CLEAN)

    # Show some vertices
    print VERTICES.take(5)

    print "STEP 2: Loading Gene Edges -------------------------------------------------------------"
    # Read csv file and load as df
    EDGES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME,
                        usecols=['OFFICIAL_SYMBOL_A', 'OFFICIAL_SYMBOL_B', 'EXPERIMENTAL_SYSTEM'],
                        low_memory=True,
                        iterator=True,
                        chunksize=1000)

    # Concatenate chunks into list & convert to dataFrame
    EDGES_DF = pd.DataFrame(pd.concat(list(EDGES), ignore_index=True))

    # Name Columns
    EDGES_DF.columns = ["src", "dst", "rel_type"]

    # Output dataFrame
    print EDGES_DF

    # Create edges
    EDGES = SQL_CONTEXT.createDataFrame(EDGES_DF)

    # Show some edges
    print EDGES.take(5)

    print "STEP 3: Generating the Graph -----------------------------------------------------------"

    GENES_GRAPH = GraphFrame(VERTICES, EDGES)

    print "STEP 4: Running Various Basic Analytics ------------------------------------------------"
    print "Vertex in-Degree -----------------------------------------------------------------------"
    GENES_GRAPH.inDegrees.sort('inDegree', ascending=False).show()
    print "Vertex out-Degree ----------------------------------------------------------------------"
    GENES_GRAPH.outDegrees.sort('outDegree', ascending=False).show()
    print "Vertex degree --------------------------------------------------------------------------"
    GENES_GRAPH.degrees.sort('degree', ascending=False).show()
    print "Triangle Count -------------------------------------------------------------------------"
    RESULTS = GENES_GRAPH.triangleCount()
    RESULTS.select("id", "count").show()
    print "Label Propagation ----------------------------------------------------------------------"
    GENES_GRAPH.labelPropagation(maxIter=10).show()     # Convergence is not guaranteed
    print "PageRank -------------------------------------------------------------------------------"
    GENES_GRAPH.pageRank(resetProbability=0.15, tol=0.01)\
        .vertices.sort('pagerank', ascending=False).show()

    print "STEP 5: Find Shortest Paths w.r.t. Landmarks -------------------------------------------"
    # Shortest paths
    SHORTEST_PATH = GENES_GRAPH.shortestPaths(landmarks=["ARF3", "MAP2K4"])
    SHORTEST_PATH.select("id", "distances").show()

    print "STEP 6: Save Vertices and Edges --------------------------------------------------------"
    # Save vertices and edges as Parquet to some location.
    # Note: You can't overwrite existing vertices and edges directories.
    GENES_GRAPH.vertices.write.parquet("vertices")
    GENES_GRAPH.edges.write.parquet("edges")

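    print "STEP 7: Load Vertices and Edges --------------------------------------------------------"
    # Load the vertices and edges back.
    # Note: the reader lives on the SQLContext, not on the GraphFrame.
    SAME_VERTICES = SQL_CONTEXT.read.parquet("vertices")
    SAME_EDGES = SQL_CONTEXT.read.parquet("edges")

    # Create an identical GraphFrame.
    SAME_GENES_GRAPH = GraphFrame(SAME_VERTICES, SAME_EDGES)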

# END OF FILE -------------------------------------------------------------------------------------#

This is the output:

Ivy Default Cache set to: /Users/username/.ivy2/cache
The jars for the packages stored in: /Users/username/.ivy2/jars
:: loading settings :: url = jar:file:/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found graphframes#graphframes;0.2.0-spark2.0-s_2.11 in list
    found com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 in list
    found com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 in list
    found org.scala-lang#scala-reflect;2.11.0 in list
    [2.11.0] org.scala-lang#scala-reflect;2.11.0
    found org.slf4j#slf4j-api;1.7.7 in list
:: resolution report :: resolve 391ms :: artifacts dl 14ms
    :: modules in use:
    com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 from list in [default]
    com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 from list in [default]
    graphframes#graphframes;0.2.0-spark2.0-s_2.11 from list in [default]
    org.scala-lang#scala-reflect;2.11.0 from list in [default]
    org.slf4j#slf4j-api;1.7.7 from list in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   5   |   0   |   0   |   0   ||   5   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 5 already retrieved (0kB/11ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/09/20 11:00:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
OK1
Traceback (most recent call last):
  File "/Users/username/PycharmProjects/GenesAssociation/main.py", line 32, in <module>
    g = GraphFrame(v, e)
  File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 62, in __init__
  File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 34, in _java_api
  File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib" \
       "/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o53.newInstance.
: java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.graphframes.GraphFrame$.<init>(GraphFrame.scala:677)
    at org.graphframes.GraphFrame$.<clinit>(GraphFrame.scala)
    at org.graphframes.GraphFramePythonAPI.<init>(GraphFramePythonAPI.scala:11)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 43 more


Process finished with exit code 1

1 answer:

Answer 0 (score: 4)

I ran into the same problem with spark / scala. I solved it by adding the following jar to the classpath:

spark-shell --jars scala-logging_2.12-3.5.0.jar

You can find the jar here: https://mvnrepository.com/artifact/com.typesafe.scala-logging/scala-logging_2.12/3.5.0
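
The command above is for spark-shell. For a PySpark script started from an IDE, as in the question, one possible equivalent is to pass the same options through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created. The following is only a sketch: the helper name build_submit_args and the /path/to/ location are hypothetical, the jar file name is copied from this answer, and the graphframes coordinates are copied from the Ivy log in the question. That log lists _2.11 artifacts, so the jar you actually need may have a different Scala suffix and version.

```python
import os


def build_submit_args(jars=(), packages=()):
    """Build a PYSPARK_SUBMIT_ARGS value (hypothetical helper, not part of PySpark)."""
    parts = []
    if jars:
        # --jars takes a comma-separated list of local jar paths
        parts.append("--jars " + ",".join(jars))
    if packages:
        # --packages takes comma-separated Maven coordinates
        parts.append("--packages " + ",".join(packages))
    # The value has to end with "pyspark-shell" when Spark is started from plain Python
    parts.append("pyspark-shell")
    return " ".join(parts)


# Assumed jar location; replace it with the path of the jar you downloaded.
SUBMIT_ARGS = build_submit_args(
    jars=["/path/to/scala-logging_2.12-3.5.0.jar"],
    packages=["graphframes:graphframes:0.2.0-spark2.0-s_2.11"],
)

# Set this before SparkContext(...) is instantiated: the JVM is launched
# once, and it reads the variable only at that point.
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
print(SUBMIT_ARGS)
```

In the script from the question, the assignment to os.environ would go next to the existing SPARK_HOME assignment, ahead of the line that creates the SparkContext.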

Source: https://github.com/graphframes/graphframes/issues/113
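
The NoClassDefFoundError in the question means the JVM could not find the class on its classpath, even though Ivy reported the dependencies as resolved. Before adding jars by hand, it may help to check which local jars actually contain the missing class. The sketch below uses only the Python standard library. The function name jars_containing is illustrative, and ~/.ivy2/jars is the jar directory reported by the Ivy log in the question; adjust it if yours differs.

```python
import glob
import os
import zipfile


def jars_containing(class_name, jar_dir):
    """Return the jars under jar_dir that contain the given Java class."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_path in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    hits.append(jar_path)
        except zipfile.BadZipfile:
            # Skip corrupt or partially downloaded jars
            continue
    return hits


if __name__ == "__main__":
    ivy_jars = os.path.expanduser("~/.ivy2/jars")  # assumed default location
    found = jars_containing("com.typesafe.scalalogging.slf4j.LazyLogging", ivy_jars)
    print(found if found else "class not found in any jar under " + ivy_jars)
```

If a jar is listed, the class exists locally and the remaining question is why that jar is not on the driver classpath. If nothing is listed, the resolved artifacts do not provide the class under that package name.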