Question

I have a text file that contains several header lines starting with #. How can I skip these lines using pyspark?

Is there something like lines.startswith in pyspark?

Answer 1

As described in the documentation, the CSV reader has a comment parameter. Setting it to # skips every line that starts with that character.

Example:

# Assumes `sql` is an existing SparkSession (or SQLContext) and `path` points to the file.
df = sql.read.csv(path, comment="#", inferSchema=True, header=True)
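
The question also asks about a startswith-style filter. If you read the file as plain lines rather than as CSV, something along these lines might work. This is only a sketch: the helper names is_comment_line and read_without_comments are made up for illustration, and sc is assumed to be an existing SparkContext.

```python
def is_comment_line(line, marker="#"):
    """Return True when the line begins with the comment marker."""
    return line.startswith(marker)


def read_without_comments(sc, path, marker="#"):
    """Load a text file as an RDD of lines, dropping lines that start with marker.

    `sc` is assumed to be an existing SparkContext (hypothetical argument name).
    """
    # textFile yields one string per line; filter keeps lines where the predicate is True.
    return sc.textFile(path).filter(lambda line: not is_comment_line(line, marker))
```

Unlike the comment option, this gives you raw strings, so you would still need to split and parse each remaining line yourself.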