I wrote a Spark program that reads a CSV file and prints the result to the console. It fails with an error when I run it. I am using Spark 2.2.0.
Sample file:
EmployeeID,FirstName,LastName,DepartmentId,Salaray
1,Gowdhaman,Dhandapani,IT,10000
2,Shaara,Gowdhaman,IT,150000
3,Karthiga,Gowdhaman,IT,120000
4,Aravind,Gunasekaran,Mech,100000
5,Padma,Dhandapani,Home,10000
Program:
from pyspark.sql import SparkSession

def read_csv(spark, filename):
    df = spark.read.load(filename, format='.csv', sep=',', header = 'true')
    return df

def main():
    spark = SparkSession \
        .builder \
        .appName('Python Spark SQL Basic example') \
        .getOrCreate()
    emp = read_csv(spark, 'Employee.csv')
    emp.show()

if __name__ == '__main__':
    main()
Answer 0 (score: 2)
The format name should not have a leading dot:
format='csv'
This should also work:
spark.read \
    .option("header", "true") \
    .csv("some_input_file.csv")
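For reference, here is a minimal sketch of the question's read_csv with the format name corrected. The normalize_format helper and the fmt parameter are hypothetical additions for illustration, not part of the original program or of the PySpark API. They only guard against a stray leading dot before the name is passed to spark.read.load. The spark argument is assumed to be an existing SparkSession.

```python
def normalize_format(fmt):
    """Return a data source name without a leading dot, e.g. '.csv' -> 'csv'.

    Hypothetical helper: Spark expects a source name such as 'csv',
    not a file extension such as '.csv'.
    """
    return fmt.strip().lstrip('.').lower()


def read_csv(spark, filename, fmt='csv'):
    """Load a comma-separated file with a header row into a DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.read.load(
        filename,
        format=normalize_format(fmt),  # 'csv', never '.csv'
        sep=',',
        header=True,  # treat the first line as column names
    )
```

With a session created as in the question, this could be called as read_csv(spark, 'Employee.csv').show().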