Python Pandas: How to read grouped data separated by headers?

Date: 2016-05-04 17:43:31

Tags: python pandas

I have data in a text file, and I would like to read each block under a header such as AL012015, as shown below:

AL012015,               Kevin,     20,
20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,
20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,
20151108, 1800,  , ZZ, 22.2A,  71.5B,  30, 10,
AL022015,               Mike,     20,
20151108, 1800,  , XX, 22.2A,  71.5B,  30, 10,
20151108, 1800,  , YY, 22.2A,  71.5B,  30, 10,

Note that 01 and 02 are the two digits that follow AL.

1 Answer:

Answer 0 (score: 1)

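I think you can preprocess the file first. Use awk to drop each header line and write its two-digit number as an extra first column on every data line that follows it, like this: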

$ awk -F, '/^AL/ {AL=substr($1,3,2);next}{print AL","$0}' file.txt
01,20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,
01,20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,
01,20151108, 1800,  , ZZ, 22.2A,  71.5B,  30, 10,
02,20151108, 1800,  , XX, 22.2A,  71.5B,  30, 10,
02,20151108, 1800,  , YY, 22.2A,  71.5B,  30, 10,
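If awk is not available, the same preprocessing can probably be done in plain Python. The following is only a sketch of that idea, not part of the original answer: the function name `tag_blocks` and the inline sample lines are made up for illustration, and it assumes every header line starts with "AL" followed by the two-digit number.

```python
def tag_blocks(lines):
    """Prefix each data line with the two digits that follow 'AL' in its header."""
    tagged = []
    group = ""  # stays empty until the first header line is seen
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines (unlike the awk one-liner, which would print them)
        if line.startswith("AL"):
            group = line[2:4]  # characters 3-4, like awk's substr($1,3,2)
            continue           # the header line itself is not emitted
        tagged.append(f"{group},{line}")
    return tagged


# Hypothetical sample in the same layout as the question's file
sample = [
    "AL012015,               Kevin,     20,",
    "20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,",
    "20151108, 1800,  , ZZ, 22.2A,  71.5B,  30, 10,",
    "AL022015,               Mike,     20,",
    "20151108, 1800,  , XX, 22.2A,  71.5B,  30, 10,",
]
for row in tag_blocks(sample):
    print(row)
```

To use it on a real file, you could pass an open file object instead of the list and write the returned lines to file2.txt.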

In this form the data is much easier to handle with a pandas groupby. Assuming the previous output has been saved to file2.txt, you can do:

import pandas as pd
# header=None: file2.txt has no header row; column 0 is the group number added by awk
df = pd.read_csv("file2.txt", sep=",", header=None)
for gr, data in df.groupby(0):
    print(gr, "\n", data)
1 
   0         1     2   3    4       5        6   7   8   9
0  1  20151108  1800       XY   22.2A    71.5B  30  10 NaN
1  1  20151108  1800       XY   22.2A    71.5B  30  10 NaN
2  1  20151108  1800       ZZ   22.2A    71.5B  30  10 NaN
2 
   0         1     2   3    4       5        6   7   8   9
3  2  20151108  1800       XX   22.2A    71.5B  30  10 NaN
4  2  20151108  1800       YY   22.2A    71.5B  30  10 NaN
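Note that in the output above the group keys come back as the integers 1 and 2, and the trailing comma on each line produces an all-NaN last column. If you want to keep the keys as "01" and "02", something along these lines may help. It is a sketch under assumptions rather than a definitive recipe: the inline `text` stands in for file2.txt, and dropping every all-empty column is my own choice, which you may not want if those columns are filled in elsewhere in your real data.

```python
import io

import pandas as pd

# Stand-in for file2.txt: a few of the preprocessed rows
text = """01,20151108, 1800,  , XY, 22.2A,  71.5B,  30, 10,
01,20151108, 1800,  , ZZ, 22.2A,  71.5B,  30, 10,
02,20151108, 1800,  , XX, 22.2A,  71.5B,  30, 10,
"""

df = pd.read_csv(
    io.StringIO(text),
    header=None,
    dtype={0: str},         # keep "01" as text instead of parsing it as the integer 1
    skipinitialspace=True,  # ignore the spaces that follow each comma
)
# Drop columns that are empty in every row (here the blank field and the trailing comma)
df = df.dropna(axis=1, how="all")

# One DataFrame per group, keyed by the two-digit string
blocks = {key: block for key, block in df.groupby(0)}
print(list(blocks))  # ['01', '02']
print(blocks["01"])
```

Keeping the blocks in a dict makes it easy to pick out one group by its code later, for example `blocks["02"]`.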

I hope this helps.

Regards.