Apache Spark reading a CSV file - ClassNotFoundException

Date: 2018-09-03 13:25:59

Tags: python apache-spark pyspark

I wrote a Spark program that reads a CSV file and writes the result to the console. When I run it, it fails with a ClassNotFoundException. I am using Spark 2.2.0.

Sample file (Employee.csv):

EmployeeID,FirstName,LastName,DepartmentId,Salary
1,Gowdhaman,Dhandapani,IT,10000
2,Shaara,Gowdhaman,IT,150000
3,Karthiga,Gowdhaman,IT,120000
4,Aravind,Gunasekaran,Mech,100000
5,Padma,Dhandapani,Home,10000

Program:

from pyspark.sql import SparkSession

def read_csv(spark, filename):
    df = spark.read.load(filename, format='.csv', sep=',', header = 'true')
    return df


def main():
    spark = SparkSession \
        .builder \
        .appName('Python Spark SQL Basic example') \
        .getOrCreate()

    emp = read_csv(spark, 'Employee.csv')
    emp.show()


if __name__ == '__main__':
    main()

1 Answer:

Answer 0 (score: 2):

You don't need the dot in the format name. With format='.csv', Spark looks for a data source called '.csv', cannot find it, and raises the ClassNotFoundException. Use:

format='csv'
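
For example, the read_csv helper from the question would become something like the sketch below; only the format string changes, and the other options are kept as they appear in the question:

def read_csv(spark, filename):
    # 'csv' (without the leading dot) names Spark's built-in CSV data source
    df = spark.read.load(filename, format='csv', sep=',', header='true')
    return df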

This should also work:

spark.read\
    .option("header", "true")\
    .csv("some_input_file.csv")