How to load two CSV files into the same DataFrame (files are in different directories)

Time: 2018-02-28 08:13:01

Tags: csv apache-spark pyspark apache-spark-sql

I want to create a DataFrame from two CSV files that have the same schema but are stored under different folder paths.

2 answers:

Answer 0: (score: 3)

In Spark 2.x:

  • A single DataFrame from CSV files stored in different directories:

    val df = spark.read.option("header", "true").option("inferSchema", "true").csv(path1,path2)


  • A single DataFrame from CSV files stored recursively under a directory (using wildcards):

    val df = spark.read.option("header", "true").option("inferSchema", "true").csv("parent-directory/*/*")
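Since the question is tagged pyspark, the two Scala calls above could be sketched in Python roughly as follows. This is a minimal sketch, not part of the original answer: the helper name `read_csv_paths` and its parameters are hypothetical, and `spark` is assumed to be an active SparkSession.

```python
from typing import Iterable


def read_csv_paths(spark, paths: Iterable[str], header: bool = True,
                   infer_schema: bool = True):
    """Read every CSV path (or glob pattern) in `paths` into one DataFrame.

    Hypothetical helper; `spark` is assumed to be an active SparkSession.
    """
    reader = (
        spark.read
        .option("header", header)
        .option("inferSchema", infer_schema)
    )
    # DataFrameReader.csv accepts a single path string or a list of them,
    # so all paths end up in one DataFrame.
    return reader.csv(list(paths))
```

With this helper, `read_csv_paths(spark, [path1, path2])` would mirror the first call, and `read_csv_paths(spark, ["parent-directory/*/*"])` the wildcard one.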


Answer 1: (score: 0)

You can pass a list of path strings to sqlContext when reading the CSV files:

    sqlContext.read.format("com.databricks.spark.csv").csv(["path1", "path2"]).show(truncate=False)

or use load:

    sqlContext.read.format("com.databricks.spark.csv").load(["path1", "path2"]).show(truncate=False)

You can also set other options, such as header, etc.
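As a rough illustration of passing such options together with a list of paths, here is a small sketch. The helper name `load_csv_paths` is hypothetical, `spark` is assumed to be an active SparkSession, and the short format name "csv" is assumed to refer to the CSV source built into Spark 2.x (used here in place of "com.databricks.spark.csv").

```python
def load_csv_paths(spark, paths, **options):
    """Load a list of CSV paths into one DataFrame via format(...).load(...).

    Hypothetical helper; `spark` is assumed to be an active SparkSession.
    Keyword arguments such as header="true" are passed through as reader options.
    """
    # Assumption: "csv" is the short name of the built-in CSV source in Spark 2.x.
    return spark.read.format("csv").options(**options).load(list(paths))
```

For example, `load_csv_paths(spark, ["path1", "path2"], header="true").show(truncate=False)` would follow the same pattern as the load call above.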