Unable to load a local file into a PySpark DataFrame

Time: 2018-10-02 10:16:48

Tags: apache-spark pyspark local-files

I am a macOS user and I have just downloaded Apache Spark and placed it in /usr/local/spark. This is the content of my .bash_profile:

export SPARK_HOME="/usr/local/spark"
export PYSPARK_PYTHON=python3
export PATH=$PATH:$SPARK_HOME/bin
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

The problem is that when I type pyspark to enter the PySpark shell and then enter the following two lines:

spark = SparkSession.builder.appName("preprocessing").config("spark-master", "local").getOrCreate()
df = spark.read.format("csv").option("header","true").option("inferSchema", "true").option("delimiter",",").load("src/census-income.data")

this error occurs:

2018-10-02 19:55:24 ERROR PoolWatchThread:118 - Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.setReadOnly(Unknown Source)
    at com.jolbox.bonecp.ConnectionHandle.setReadOnly(ConnectionHandle.java:1324)
    at com.jolbox.bonecp.ConnectionHandle.<init>(ConnectionHandle.java:262)
    at com.jolbox.bonecp.PoolWatchThread.fillConnections(PoolWatchThread.java:115)
    at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: ERROR 25505: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.sql.conn.GenericAuthorizer.setReadOnlyConnection(Unknown Source)
    at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.setReadOnly(Unknown Source)
    ... 8 more
  • Spark version: 2.3.2
  • Python version: 3.7.0

2 answers:

Answer 0 (score: 1)

Could you try deleting the file metastore_db/dbex.lck from the current directory (SPARK_HOME)?

Source: https://github.com/bpn1/ingestion/wiki/Troubleshooting
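The cleanup suggested above can be sketched in Python. This is a minimal sketch, not part of the original answer: the helper name `remove_derby_locks` is hypothetical, and it assumes the embedded Derby metastore lives in a `metastore_db` directory under the directory where pyspark was started, with lock files ending in `.lck` (such as `dbex.lck`). Run it only while no other Spark shell or session is using that metastore, since the lock may belong to a live process.

```python
from pathlib import Path


def remove_derby_locks(base_dir="."):
    """Delete Derby lock files (*.lck) under base_dir/metastore_db.

    Hypothetical helper for illustration. Returns the list of removed paths.
    """
    metastore = Path(base_dir) / "metastore_db"
    removed = []
    if not metastore.is_dir():
        # No embedded metastore in this directory, so there is nothing to do.
        return removed
    for lock in sorted(metastore.glob("*.lck")):
        lock.unlink()
        removed.append(lock)
    return removed


# Example usage (the path comes from the question's SPARK_HOME):
# remove_derby_locks("/usr/local/spark")
```

If the function returns an empty list, no lock file was found in that directory, and the metastore that pyspark is using may have been created in a different working directory.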

Answer 1 (score: 0)

Spark is trying to load the file from HDFS. You apparently do not have Hadoop installed, so Spark cannot connect to HDFS. If you want to load from the local filesystem, you must say so explicitly, for instance by giving the path with a file:// scheme.
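Loading explicitly from the local filesystem can be sketched as follows. This is an illustration under assumptions rather than the answerer's own code: the helper names `to_local_uri` and `load_local_csv` are hypothetical, and `spark` is assumed to be the SparkSession that the pyspark shell already provides. The reader options are copied from the question.

```python
from pathlib import Path


def to_local_uri(path):
    """Return an absolute file:// URI for a local path (hypothetical helper)."""
    return Path(path).resolve().as_uri()


def load_local_csv(spark, path):
    """Read a local CSV with the same reader options as in the question.

    `spark` is assumed to be an existing SparkSession.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .option("delimiter", ",")
        .load(to_local_uri(path))
    )


# Example usage inside the pyspark shell:
# df = load_local_csv(spark, "src/census-income.data")
```

The relative path is resolved against the current working directory before the scheme is added, so the resulting URI does not depend on where Spark itself resolves relative paths. Paths containing spaces or other special characters are percent-encoded by `as_uri()` and may need separate handling.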