I have a dataframe:
Histogram DN Npts Total Percent Acc Pct
Band 1 -0.054741 1 1 0.0250 0.0250
Bin=0.00233 -0.052404 0 1 0.0000 0.0250
-0.050067 0 1 0.0000 0.0250
-0.047730 0 1 0.0000 0.0250
-0.045393 0 1 0.0000 0.0250
-0.043056 0 1 0.0000 0.0250
-0.040719 0 1 0.0000 0.0250
Histogram DN Npts Total Percent Acc Pct
Band 2 0.000000 346 346 9.5186 9.5186
Bin=0.00203 0.002038 0 346 0.0000 9.5186
0.004076 0 346 0.0000 9.5186
0.006114 0 346 0.0000 9.5186
0.008152 0 346 0.0000 9.5186
0.010189 0 346 0.0000 9.5186
0.012227 0 346 0.0000 9.5186
I want to split it wherever a "Histogram" header appears (every 8 rows in this case). I can split it like this:
np.array_split(df,8)
But if there is a way to do this based on a keyword, I would prefer that. Then I want to save each split to its own text file. Is there a way to do this?
df.head().to_json()
returns:
{"Histogram ":{"0":"Band 1 ","1":"Bin=0.00233","2":" ","3":" ","4":" "}," DN":{"0":"-0.054741","1":"-0.052404","2":"-0.050067","3":"-0.047730","4":"-0.045393"}," Npts":{"0":" 1","1":" 0","2":" 0","3":" 0","4":" 0"}," Total":{"0":" 1","1":" 1","2":" 1","3":" 1","4":" 1"}," Percent":{"0":" 0.0250","1":" 0.0000","2":" 0.0000","3":" 0.0000","4":" 0.0000"}," Acc Pct":{"0":" 0.0250","1":" 0.0250","2":" 0.0250","3":" 0.0250","4":" 0.0250"}}
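A note on the `np.array_split(df,8)` call: its integer argument is the number of sections, so it returns 8 roughly equal pieces rather than pieces of 8 rows. If fixed-size blocks are what you want, here is a minimal sketch using positional slicing; the 16-row toy frame is hypothetical and stands in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical 16-row frame standing in for two 8-row histogram blocks
df = pd.DataFrame({"DN": np.arange(16)})

# Slice by position in steps of chunk_size rows; the last chunk may be shorter
chunk_size = 8
chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

print([len(c) for c in chunks])  # [8, 8]
```

This still depends on every block having exactly 8 rows, which is why the keyword-based approaches in the answers below are more robust.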
Answer 0 (score: 2)
First, you should normalize the column names, because they contain spaces (this explains the KeyError you saw earlier):
In [11]: df1.columns
Out[11]:
Index([' DN', ' Npts', ' Total', ' Acc Pct', ' Percent', 'Histogram '], dtype='object')
In [12]: df1.columns.map(lambda x: x.strip())
Out[12]: array(['DN', 'Npts', 'Total', 'Acc Pct', 'Percent', 'Histogram'], dtype=object)
In [13]: df1.columns = df1.columns.map(lambda x: x.strip())
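pandas also offers a vectorized string accessor on the column Index, which does the same stripping without a lambda. A small sketch, using an empty frame built from the column names shown in the question's `to_json()` output:

```python
import pandas as pd

# Column names copied from the question's to_json() output,
# including the stray leading/trailing spaces
df1 = pd.DataFrame(columns=["Histogram ", " DN", " Npts", " Total", " Percent", " Acc Pct"])

# Equivalent to df1.columns.map(lambda x: x.strip())
df1.columns = df1.columns.str.strip()

print(list(df1.columns))
# ['Histogram', 'DN', 'Npts', 'Total', 'Percent', 'Acc Pct']
```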
To group by band, I would use cumsum:
In [14]: df1 # similar to your example
Out[14]:
DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000
5 -0.054741 1 1 0.025 0.025 Band 2
6 -0.052404 0 1 0.025 0.000 Bin=0.00233
7 -0.050067 0 1 0.025 0.000
8 -0.047730 0 1 0.025 0.000
9 -0.045393 0 1 0.025 0.000
In [15]: df1["Histogram"].str.startswith("Band").cumsum()
Out[15]:
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
9 2
Name: Histogram, dtype: int64
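The trick is that each "Band ..." row marks the start of a block, so the running count of those `True` values is the block number. Here is a standalone sketch on a hypothetical Histogram column. It assumes some non-label cells may be missing (`None`/NaN) rather than blank strings, which is why it passes `na=False` so that those rows count as `False` instead of propagating NaN into the cumsum.

```python
import pandas as pd

# Hypothetical "Histogram" column: a band label on the first row of each
# block, blanks or missing values elsewhere
hist = pd.Series(["Band 1 ", "Bin=0.00233", " ", None,
                  "Band 2 ", "Bin=0.00203", " ", " "])

# True on each row that starts a new band; missing values become False
is_start = hist.str.startswith("Band", na=False)

# Running count of block starts = block number for every row
block_id = is_start.cumsum()

print(block_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2]
```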
You can use this to group (i.e. to split the way you want):
In [16]: g = df1.groupby(df1["Histogram"].str.startswith("Band").cumsum())
Now you can extract/clean up each group as you like:
In [21]: g.get_group(1)
Out[21]:
DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000
In [22]: [x for _, x in g]
Out[22]:
[ DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000 ,
DN Npts Total Acc Pct Percent Histogram
5 -0.054741 1 1 0.025 0.025 Band 2
6 -0.052404 0 1 0.025 0.000 Bin=0.00233
7 -0.050067 0 1 0.025 0.000
8 -0.047730 0 1 0.025 0.000
9 -0.045393 0 1 0.025 0.000 ]
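The question also asks how to save each split to its own text file. One way is to iterate over the groupby object and call `DataFrame.to_csv` with a tab separator on each group. This is a sketch on a small hypothetical frame shaped like `df1`. The `Results_N.txt` naming scheme is an assumption borrowed from the other answer.

```python
import pandas as pd

# Toy frame shaped like df1 above (column names already stripped)
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, 0.000000, 0.002038, 0.004076],
    "Npts": [1, 0, 0, 346, 0, 0],
    "Histogram": ["Band 1", "Bin=0.00233", "", "Band 2", "Bin=0.00203", ""],
})

# Block number per row, as in the answer above
g = df1.groupby(df1["Histogram"].str.startswith("Band").cumsum())

written = []
for key, block in g:
    # Hypothetical naming scheme: Results_1.txt, Results_2.txt, ...
    fname = "Results_{}.txt".format(key)
    # Tab-separated text file; index=False drops the pandas row index
    block.to_csv(fname, sep="\t", index=False)
    written.append(fname)

print(written)  # ['Results_1.txt', 'Results_2.txt']
```

If you would rather name files after the band, you could use `block["Histogram"].iloc[0]` in the filename instead of the group key.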
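Answer 1 (score: 0)
This works on the raw histogram text file rather than the dataframe. It splits the file at each header line and writes each section to a new txt file: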
count = 1  # used in the naming of the new txt files
txtFile = "his.txt"  # histogram text file
# string used to split the file into sections/blocks
# (must match the header line in the file exactly, including spaces)
splitTxt = " Histogram DN Npts Total Percent Acc Pct"

with open(txtFile, "r") as myResults:
    blocks = myResults.read()

for contents in blocks.split(splitTxt)[1:]:
    lines = contents.split('\n')
    with open('Results_{}.txt'.format(count), 'w') as op:
        op.write(splitTxt)
        # lines[0] is the (empty) rest of the header line; the next
        # 7 lines are the histogram rows. Slicing avoids an IndexError
        # if the last block is shorter.
        for line in lines[:8]:
            op.write('{}\n'.format(line))
    count = count + 1
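The approach above needs the exact header string and a fixed block size. A variant that splits purely on the keyword instead: start a new block at every line whose stripped text begins with "Histogram". This is a minimal sketch. The helper `split_on_keyword` is hypothetical, and the inline sample text stands in for `his.txt`.

```python
def split_on_keyword(lines, keyword="Histogram"):
    """Group lines into blocks; a new block starts at each line whose
    stripped text begins with `keyword`. Lines before the first header
    are skipped."""
    blocks = []
    for line in lines:
        if line.strip().startswith(keyword):
            blocks.append([line])
        elif blocks:
            blocks[-1].append(line)
    return blocks


# Sample text standing in for the file; with a real file use:
#   with open("his.txt") as f:
#       lines = f.read().splitlines()
sample = """ Histogram DN Npts Total Percent Acc Pct
Band 1 -0.054741 1 1 0.0250 0.0250
Bin=0.00233 -0.052404 0 1 0.0000 0.0250
 Histogram DN Npts Total Percent Acc Pct
Band 2 0.000000 346 346 9.5186 9.5186
Bin=0.00203 0.002038 0 346 0.0000 9.5186
"""

blocks = split_on_keyword(sample.splitlines())

# Write each block, header included, to its own file
for n, block in enumerate(blocks, start=1):
    with open("Results_{}.txt".format(n), "w") as op:
        op.write("\n".join(block) + "\n")
```

Because block boundaries come from the keyword, blocks of different lengths are handled without changing any constants.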