How do I create a DataFrame from space-separated columns? The data looks like this:
yyyy mm tmax tmin af rain sun
1853 1 --- --- --- 57.3 ---
1853 2 --- --- --- 32.3 ---
1853 3 --- --- --- 65.5 ---
1853 4 --- --- --- 46.2 ---
1853 5 --- --- --- 13.2 ---
1853 6 --- --- --- 53.3 ---
1853 7 --- --- --- 78.0 ---
1853 8 --- --- --- 56.6 ---
1853 9 --- --- --- 24.5 ---
1853 10 --- --- --- 94.8 ---
1853 11 --- --- --- 75.5 ---
Answer 0 (score: 3)
Since you tagged the question with pyspark (rather than pandas), you can try the following:
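from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Space Import Test').getOrCreate()
# header=True takes column names from the first line; sep=' ' splits on single spaces;
# ignoreLeadingWhiteSpace=True trims leading whitespace from each value
df = spark.read.csv('/path/to/your/file', inferSchema=True, header=True, sep=' ', ignoreLeadingWhiteSpace=True)
df.show(10)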
Answer 1 (score: 0)
You can use pandas and set the delim_whitespace parameter to True:
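delim_whitespace : boolean, default False
Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the separator. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter. Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html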
In your case:
import pandas
pandas.read_csv("data.txt", delim_whitespace=True)
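A minimal sketch of the same idea, using a few rows from the question inlined as a string so it runs without a file. It uses sep=r"\s+", which the quoted documentation says is equivalent to delim_whitespace=True; pandas 2.2 deprecated delim_whitespace in favour of that form. The na_values="---" argument is an addition not in the original answer: without it the "---" placeholders keep tmax, tmin, af and sun as text (object) columns instead of numeric ones.

```python
import io

import pandas as pd

# A few rows copied from the question, inlined so the example runs without a file.
SAMPLE = """yyyy mm tmax tmin af rain sun
1853 1 --- --- --- 57.3 ---
1853 2 --- --- --- 32.3 ---
1853 3 --- --- --- 65.5 ---
"""

# sep=r"\s+" splits on any run of spaces or tabs (same effect as delim_whitespace=True).
# na_values="---" turns the placeholder dashes into NaN so those columns stay numeric.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", na_values="---")

print(df.dtypes)
print(df)
```

With a real file, replace io.StringIO(SAMPLE) with the path, e.g. "data.txt".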
Answer 2 (score: 0)
import pandas as pd
data = pd.read_csv('text.txt', sep=" ")  # sep is a single space because the columns in the .txt file are separated by spaces
data = data.dropna(axis=1, how='all')  # a space before the first column creates an empty, all-NaN column; drop it
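To illustrate the leading-space case this answer describes, here is a small sketch with made-up input in which every line starts with a space (an assumption; the sample in the question does not show one). With sep=" " the leading space yields an empty first field, which pandas reads as an all-NaN column, and dropna(axis=1, how='all') removes it. For comparison, a whitespace-regex separator skips leading whitespace, so no cleanup step is needed.

```python
import io

import pandas as pd

# Hypothetical input: same layout as the question, but each line starts with a space.
SAMPLE = (
    " yyyy mm rain\n"
    " 1853 1 57.3\n"
    " 1853 2 32.3\n"
)

# Single-space separator: the leading space produces an empty first field,
# read as an all-NaN column (pandas names it "Unnamed: 0").
raw = pd.read_csv(io.StringIO(SAMPLE), sep=" ")
cleaned = raw.dropna(axis=1, how="all")

# Whitespace-regex separator: leading whitespace is ignored, so no extra column appears.
direct = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

print(list(raw.columns))
print(list(cleaned.columns))
print(list(direct.columns))
```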