I am trying to read the header files and the corresponding CSV files from a directory, but writing the CSVs into a PySpark DataFrame throws an error:
ParseException: u"\nextraneous input '/' expecting {'SELECT', 'FROM', 'ADD', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)\n\n== SQL ==\n/persistent/4G/filtered_week1/cell_res\n^^^\n"
My code is attached below. Please let me know what I am doing wrong here:
import glob

from pyspark.sql import SparkSession

# bump the resource settings on the Spark conf
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '8g'),
                                        ('spark.executor.cores', '2'),
                                        ('spark.cores.max', '10'),
                                        ('spark.driver.memory', '8g')])
spark = SparkSession.builder \
    .config(conf=conf) \
    .appName("Mergecsv mbnl") \
    .getOrCreate()

indir = "/persistent/4G/filtered_week1/"
#outfile = "/persistent/4G/week1_15Feb/Untitled Folder"
csv_seperator = ','

header = glob.glob(indir + "/*.header")   # one .header file per csv
all_files = glob.glob(indir + "/*.csv")

for filename in all_files:
    for head in header:
        x = head.split('.sql')[0]         # drop the extension to match the csv name
        if x in filename:
            df1 = spark.read.format("csv").option("header", "false").schema(x).load(filename)
            df1.take(1)
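
For context, inside the if-branch that matches a header file to its csv, this is roughly the result I am after, assuming each .header file contains a single comma-separated line of column names (that is only my understanding of the file layout, and I am not sure whether schema() should be given the column list or the header file's path):

# sketch only: build a DDL schema string from the header file's contents
# (assumes the file holds something like "id,lat,lon") instead of passing its path
with open(head) as f:
    column_names = f.read().strip().split(csv_seperator)
ddl_schema = ", ".join("`{0}` STRING".format(c) for c in column_names)

df1 = (spark.read
            .format("csv")
            .option("header", "false")
            .schema(ddl_schema)   # a DDL string or StructType, not a file path
            .load(filename))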