I have data in a text file:
I want to read each block under AL012015 (and under the other AL headers). The file looks like this:
AL012015, Kevin, 20,
20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,
20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,
20151108, 1800, , ZZ, 22.2A, 71.5B, 30, 10,
AL022015, Mike, 20,
20151108, 1800, , XX, 22.2A, 71.5B, 30, 10,
20151108, 1800, , YY, 22.2A, 71.5B, 30, 10,
Note that 01 and 02 are the two digits that follow AL.

Answer:
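I think you can apply some preprocessing first. Use awk to produce a new file that carries the block number in an extra leading column, like this: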
$ awk -F, '/^AL/ {AL=substr($1,3,2);next}{print AL","$0}' file.txt
01,20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,
01,20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,
01,20151108, 1800, , ZZ, 22.2A, 71.5B, 30, 10,
02,20151108, 1800, , XX, 22.2A, 71.5B, 30, 10,
02,20151108, 1800, , YY, 22.2A, 71.5B, 30, 10,
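If awk is not available, the same preprocessing can be sketched in plain Python. This is a minimal sketch that assumes, like the awk command, that header lines start with `AL` and that the digits of interest are the 3rd and 4th characters of the first comma-separated field; `tag_rows` is a hypothetical helper name.

```python
def tag_rows(lines):
    """Prefix each data line with the two digits that follow 'AL' in the
    most recent header line, dropping the header lines themselves."""
    al = ""  # awk's AL variable is empty until the first header is seen
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("AL"):
            # awk's substr($1,3,2): characters 3-4 (1-based) of field 1
            al = line.split(",", 1)[0][2:4]
            continue  # like awk's `next`: the header itself is not printed
        out.append(al + "," + line)
    return out


sample = """AL012015, Kevin, 20,
20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,
AL022015, Mike, 20,
20151108, 1800, , XX, 22.2A, 71.5B, 30, 10,
"""

for row in tag_rows(sample.splitlines()):
    print(row)
```

Passing an open file object (e.g. `open("file.txt")`) instead of `sample.splitlines()` works the same way, and writing each returned row on its own line to file2.txt gives input for the pandas step.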
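Then you can load the result with pandas, where this layout is well suited to a groupby operation. Assuming the previous output was saved to file2.txt, you can do: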
import pandas as pd
# Column 0 holds the AL number added by awk ("01" is parsed as 1)
df = pd.read_csv("file2.txt", sep=",", header=None)
for gr, data in df.groupby(0):
    print(gr, "\n", data)
1
0 1 2 3 4 5 6 7 8 9
0 1 20151108 1800 XY 22.2A 71.5B 30 10 NaN
1 1 20151108 1800 XY 22.2A 71.5B 30 10 NaN
2 1 20151108 1800 ZZ 22.2A 71.5B 30 10 NaN
2
0 1 2 3 4 5 6 7 8 9
3 2 20151108 1800 XX 22.2A 71.5B 30 10 NaN
4 2 20151108 1800 YY 22.2A 71.5B 30 10 NaN
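If you would rather skip awk and the intermediate file, the blocks can also be collected directly in Python. This is a minimal sketch using only the standard library `csv` module, assuming header lines start with `AL` and each line ends with a trailing comma as in the question; `read_blocks` is a hypothetical helper name. Unlike the groupby above, it keys each block by the full header ID (e.g. `AL012015`) rather than by the two digits.

```python
import csv
import io


def read_blocks(f):
    """Group data rows under the most recent 'AL...' header line.
    Returns {header_id: [row, ...]}, each row a list of string fields."""
    blocks = {}
    current = None
    # skipinitialspace=True drops the space after each comma
    for fields in csv.reader(f, skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("AL"):
            current = fields[0]  # e.g. "AL012015"
            blocks[current] = []
        elif current is not None:
            # drop the empty field produced by the trailing comma
            if fields[-1] == "":
                fields = fields[:-1]
            blocks[current].append(fields)
    return blocks


sample = io.StringIO(
    "AL012015, Kevin, 20,\n"
    "20151108, 1800, , XY, 22.2A, 71.5B, 30, 10,\n"
    "20151108, 1800, , ZZ, 22.2A, 71.5B, 30, 10,\n"
    "AL022015, Mike, 20,\n"
    "20151108, 1800, , XX, 22.2A, 71.5B, 30, 10,\n"
)

for header, rows in read_blocks(sample).items():
    print(header, rows)
```

For the real file, open it with `open("file.txt", newline="")` and pass the file object to `read_blocks`. Rows that appear before the first header are ignored in this sketch.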
I hope this helps.
Regards.