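I want to convert the data in a .data file into a .csv file, with the values placed in columns under their column names. However, the .data file has a specific format, and I don't know how to get the text into columns. Here is what the .data file looks like: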
column1
column2
column3
column4
column5
column6
column7
column8
column9
column10
column11
column12
column13
........
column36
1243;6543;5754;5678;4567;4567;4567;2573;7532;6332;6432;6542;5542;7883;7643;4684;4568;4573
3567;5533;6532;6432;7643;8635;7654;6543;8753;7643;7543;7543;7543;6543;6444;7543;6444;6444
1243;6543;5754;5678;4567;4567;4567;2573;7532;6332;6432;6542;5542;7883;7643;4684;4568;4573
3567;5533;6532;6432;7643;8635;7654;6543;8753;7643;7543;7543;7543;6543;6444;7543;6444;6444
1243;6543;5754;5678;4567;4567;4567;2573;7532;6332;6432;6542;5542;7883;7643;4684;4568;4573
3567;5533;6532;6432;7643;8635;7654;6543;8753;7643;7543;7543;7543;6543;6444;7543;6444;6444
1243;6543;5754;5678;4567;4567;4567;2573;7532;6332;6432;6542;5542;7883;7643;4684;4568;4573
3567;5533;6532;6432;7643;8635;7654;6543;8753;7643;7543;7543;7543;6543;6444;7543;6444;6444
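As shown above, the file first lists the names of the 36 columns, one per line. Below these are many data points, each made up of 36 values separated by semicolons. Each data point spans 2 lines, and data points are separated by an empty line. The .csv file must look like this: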
column1,column2,column3,column4,column5,column6,column7,column8,column9,column10,column11,column12,column13,column14,column15,column16,column17,column18,column19,column20,column21,column22,column23,column24,column25,column26,column27,column28,column29,column30,column31,column32,column33,column34,column35,column36
1243,6543,5754,5678,4567,4567,4567,2573,7532,6332,6432,6542,5542,7883,7643,4684,4568,4573,3567,5533,6532,6432,7643,8635,7654,6543,8753,7643,7543,7543,7543,6543,6444,7543,6444,6444
1243,6543,5754,5678,4567,4567,4567,2573,7532,6332,6432,6542,5542,7883,7643,4684,4568,4573,3567,5533,6532,6432,7643,8635,7654,6543,8753,7643,7543,7543,7543,6543,6444,7543,6444,6444
1243,6543,5754,5678,4567,4567,4567,2573,7532,6332,6432,6542,5542,7883,7643,4684,4568,4573,3567,5533,6532,6432,7643,8635,7654,6543,8753,7643,7543,7543,7543,6543,6444,7543,6444,6444
1243,6543,5754,5678,4567,4567,4567,2573,7532,6332,6432,6542,5542,7883,7643,4684,4568,4573,3567,5533,6532,6432,7643,8635,7654,6543,8753,7643,7543,7543,7543,6543,6444,7543,6444,6444
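As shown above, the first line of the .csv must contain the 36 column names, separated by commas. The following lines must contain the data points, one per line, with the 36 values separated by commas.
Can the 'pandas' library be used for this? In any case, here is the code I started with: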
with open("file.data") as fIn, open("file.csv", "w") as fOut:
    for r, line in enumerate(fIn):
        if not line:
            break
Answer 0 (score: 3)
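Sure, you can do this with pandas. You just need to read the first N lines (36 in your case) and use them as the header, then read the rest of the file as a normal CSV (which pandas is good at). After that you can save the pandas.DataFrame object to CSV.
Since each of your data points is split across two adjacent lines, we should split the DataFrame we read into two (even rows and odd rows) and stack the two halves side by side (horizontally).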
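The split-and-stack step described above can be seen in isolation on a toy frame. This is only a minimal sketch: the 2-column frame and the names `a` to `d` are made up for illustration and stand in for the 18-value lines and 36 column names of the real file.

```python
import pandas as pd

# Toy stand-in for the parsed data: each record spans two consecutive rows.
temp_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])

# Rows 0, 2, ... hold the first half of each record; rows 1, 3, ... the second half.
first_half = temp_df.iloc[::2].reset_index(drop=True)
second_half = temp_df.iloc[1::2].reset_index(drop=True)

# Put the halves side by side so that each record becomes a single row.
df = pd.concat([first_half, second_half], axis=1)
df.columns = ["a", "b", "c", "d"]  # hypothetical column names

print(df.values.tolist())  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Resetting the index on both halves matters here: `pd.concat` with `axis=1` aligns rows by index label, so without it the even and odd rows would not line up.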
Consider the following code:
import pandas as pd

COLUMNS_COUNT = 36

# read the first `COLUMNS_COUNT` lines to serve as the header
with open('data.data', 'r') as f:
    columns = [next(f).strip() for _ in range(COLUMNS_COUNT)]

# read the rest of the file into a temporary DataFrame
temp_df = pd.read_csv('data.data', skiprows=COLUMNS_COUNT, header=None, delimiter=';', skip_blank_lines=True)

# split the temporary DataFrame into even and odd rows
even_df = temp_df.iloc[::2].reset_index(drop=True)
odd_df = temp_df.iloc[1::2].reset_index(drop=True)

# stack the even and odd DataFrames horizontally
df = pd.concat([even_df, odd_df], axis=1)

# assign the column names
df.columns = columns

# save the resulting DataFrame to csv
df.to_csv('out.csv', index=False)
UPD: the code was updated to correctly handle data points that are split across two lines.
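If pandas is not a hard requirement, the same conversion can probably be done with the standard library alone, closer to the starting code in the question. The sketch below is not part of the original answer. The helper names `parse_data_lines` and `convert` are hypothetical, and it assumes every record spans exactly `lines_per_record` non-empty lines and that no line ends with a trailing semicolon.

```python
import csv


def parse_data_lines(lines, n_columns=36, lines_per_record=2):
    """Hypothetical helper: split raw lines into a header and joined records."""
    # Drop blank lines so records separated by empty lines are handled too.
    lines = [line.strip() for line in lines if line.strip()]
    header, data = lines[:n_columns], lines[n_columns:]
    if len(data) % lines_per_record:
        raise ValueError("data lines do not divide evenly into records")

    rows = []
    for i in range(0, len(data), lines_per_record):
        # Join the lines of one record, then split on ';' to get its values.
        values = ";".join(data[i:i + lines_per_record]).split(";")
        if len(values) != n_columns:
            raise ValueError(f"record {len(rows) + 1} has {len(values)} values")
        rows.append(values)
    return header, rows


def convert(src_path, dst_path, n_columns=36, lines_per_record=2):
    """Hypothetical helper: read the .data file and write the .csv file."""
    with open(src_path) as f_in:
        header, rows = parse_data_lines(f_in, n_columns, lines_per_record)
    with open(dst_path, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(header)
        writer.writerows(rows)
```

With the file names used in the answer, a call would look like `convert('data.data', 'out.csv')`. Unlike the pandas version, this one raises an error when a record does not contain exactly 36 values, which may help catch malformed input early.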