I have a huge raw dataset (4k lines per text file) with lots of pipes and spaces in it.
|group call| pvt call |phone call|group busy| pvt busy |phone busy|
time |total |total |total |total |total |total | %
period| sec cnt | sec cnt| sec cnt| sec cnt| sec cnt| sec cnt | usage
00:00 | 4323 548| 0 0| 0 0| 0 0| 0 0| 0 0| 18%
00:15 | 4125 479| 0 0| 0 0| 0 0| 0 0| 0 0| 17%
00:30 | 3071 395| 0 0| 0 0| 0 0| 0 0| 0 0| 13%
00:45 | 3514 447| 0 0| 0 0| 0 0| 0 0| 0 0| 14%
01:00 | 3081 383| 0 0| 0 0| 0 0| 0 0| 0 0| 13%
I want to convert it to a CSV file. The parser I built with Python and pandas only reads CSV values. What should I do? The CSV file should look like this:
time_pd,group_call_t_s,group_call_t_c,pvt_call_t_sec,pvt_call_t_c,phone_call_t_sec,phone_call_t_c,group_busy_t_sec,group_busy_t_c,pvt_busy_t_sec,pvt_busy_t_c,phone_busy_t_sec,phone_busy_t_c,per_usage
00:00,4323,548,0,0,0,0,0,0,0,0,0,0,18%
00:15,4125,479,0,0,0,0,0,0,0,0,0,0,17%
00:30,3071,395,0,0,0,0,0,0,0,0,0,0,13%
00:45,3514,447,0,0,0,0,0,0,0,0,0,0,14%
01:00,3081,383,0,0,0,0,0,0,0,0,0,0,13%
01:15,4017,470,0,0,0,0,0,0,0,0,0,0,18%
01:30,4767,555,0,0,0,0,0,0,0,0,0,0,18%
Answer 0 (score: 0)
Python
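If all the files share the same header structure, you can read just the data portion, assign the column names, and then save the result as CSV: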
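import pandas as pd

# A field separator is either a pipe with optional surrounding whitespace or a run of whitespace.
# The pattern must not be able to match the empty string, otherwise recent Python versions split between every character.
# skiprows=3 skips the three header lines; a regex separator needs the Python parsing engine.
data = pd.read_csv("file1.txt", sep=r"\s*\|\s*|\s+", header=None, skiprows=3, engine="python")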
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
#0 00:00 4323 548 0 0 0 0 0 0 0 0 0 0 18%
#1 00:15 4125 479 0 0 0 0 0 0 0 0 0 0 17%
#2 00:30 3071 395 0 0 0 0 0 0 0 0 0 0 13%
#3 00:45 3514 447 0 0 0 0 0 0 0 0 0 0 14%
#4 01:00 3081 383 0 0 0 0 0 0 0 0 0 0 13%
data.columns = "time_pd","group_call_t_s","group_call_t_c",...
# index=False keeps the row index out of the output file
data.to_csv("file1.csv", index=False)
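
For reference, here is a minimal end-to-end sketch of the same approach. It assumes every report has exactly three header lines followed by 14 fields per row. The column names are copied from the desired CSV header in the question. The helper name `report_to_frame` and the in-memory sample are hypothetical and used only for illustration.

```python
import io

import pandas as pd

# Column names taken from the desired CSV header in the question.
COLUMNS = [
    "time_pd",
    "group_call_t_s", "group_call_t_c",
    "pvt_call_t_sec", "pvt_call_t_c",
    "phone_call_t_sec", "phone_call_t_c",
    "group_busy_t_sec", "group_busy_t_c",
    "pvt_busy_t_sec", "pvt_busy_t_c",
    "phone_busy_t_sec", "phone_busy_t_c",
    "per_usage",
]


def report_to_frame(source, header_rows=3):
    """Parse one pipe/space-delimited report (hypothetical helper).

    `source` may be a file path or a file-like object.
    """
    return pd.read_csv(
        source,
        sep=r"\s*\|\s*|\s+",   # a pipe with optional whitespace, or plain whitespace
        header=None,            # the report's own header lines are skipped
        names=COLUMNS,
        skiprows=header_rows,
        engine="python",        # regex separators need the Python engine
    )


# A small in-memory sample built from the first rows shown in the question.
sample = """\
|group call| pvt call |phone call|group busy| pvt busy |phone busy|
time |total |total |total |total |total |total | %
period| sec cnt | sec cnt| sec cnt| sec cnt| sec cnt| sec cnt | usage
00:00 | 4323 548| 0 0| 0 0| 0 0| 0 0| 0 0| 18%
00:15 | 4125 479| 0 0| 0 0| 0 0| 0 0| 0 0| 17%
"""

frame = report_to_frame(io.StringIO(sample))
csv_text = frame.to_csv(index=False)
print(csv_text)
```

To convert a real file, pass its path instead of the `StringIO` object and give `to_csv` an output path, for example `report_to_frame("file1.txt").to_csv("file1.csv", index=False)`.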