Question

I have a DataFrame:

  Histogram           DN     Npts    Total   Percent   Acc Pct
  Band 1       -0.054741        1        1    0.0250    0.0250
  Bin=0.00233  -0.052404        0        1    0.0000    0.0250
               -0.050067        0        1    0.0000    0.0250
               -0.047730        0        1    0.0000    0.0250
               -0.045393        0        1    0.0000    0.0250
               -0.043056        0        1    0.0000    0.0250
               -0.040719        0        1    0.0000    0.0250
  Histogram           DN     Npts    Total   Percent   Acc Pct
  Band 2        0.000000      346      346    9.5186    9.5186
  Bin=0.00203   0.002038        0      346    0.0000    9.5186
                0.004076        0      346    0.0000    9.5186
                0.006114        0      346    0.0000    9.5186
                0.008152        0      346    0.0000    9.5186
                0.010189        0      346    0.0000    9.5186
                0.012227        0      346    0.0000    9.5186

I want to split it wherever a "Histogram" header row appears (in this case, every 8 rows). I can split it like this:

np.array_split(df,8)

But I would prefer a way to do this based on a keyword. I then want to save each piece to its own text file. Is there a way to do this?

df.head().to_json() returns:

{"Histogram  ":{"0":"Band 1     ","1":"Bin=0.00233","2":"           ","3":"           ","4":"           "},"       DN":{"0":"-0.054741","1":"-0.052404","2":"-0.050067","3":"-0.047730","4":"-0.045393"},"   Npts":{"0":"      1","1":"      0","2":"      0","3":"      0","4":"      0"},"  Total":{"0":"      1","1":"      1","2":"      1","3":"      1","4":"      1"}," Percent":{"0":"  0.0250","1":"  0.0000","2":"  0.0000","3":"  0.0000","4":"  0.0000"}," Acc Pct":{"0":"  0.0250","1":"  0.0250","2":"  0.0250","3":"  0.0250","4":"  0.0250"}}

Answer 1

First, you should normalize the column names, since they contain whitespace (which explains the KeyError you saw earlier):

In [11]: df1.columns
Out[11]:
Index(['       DN', '   Npts', '  Total', ' Acc Pct', ' Percent', 'Histogram  '], dtype='object')

In [12]: df1.columns.map(lambda x: x.strip())
Out[12]: array(['DN', 'Npts', 'Total', 'Acc Pct', 'Percent', 'Histogram'], dtype=object)

In [13]: df1.columns = df1.columns.map(lambda x: x.strip())
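The same cleanup can also be written with the vectorized `.str` accessor, and the padded cell values can be stripped in the same pass. This is only a minimal sketch: the small `df1` below is made-up stand-in data shaped like the `to_json()` dump in the question, not the asker's real frame.

```python
import pandas as pd

# Stand-in data with space-padded headers and cells (illustrative only).
df1 = pd.DataFrame({
    "Histogram  ": ["Band 1     ", "Bin=0.00233", "           "],
    "       DN": ["-0.054741", "-0.052404", "-0.050067"],
    "   Npts": ["      1", "      0", "      0"],
})

# Strip whitespace from the column labels.
df1.columns = df1.columns.str.strip()

# Strip the padded label column, then convert the numeric columns
# (stripping first so the parser only sees clean digits).
df1["Histogram"] = df1["Histogram"].str.strip()
df1["DN"] = pd.to_numeric(df1["DN"].str.strip())
df1["Npts"] = pd.to_numeric(df1["Npts"].str.strip())

print(df1.columns.tolist())
print(df1.dtypes)
```

After this, `df1["Histogram"]` and the other columns can be accessed by their plain names.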

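To group by band, I would use cumsum. Each row that starts with "Band" counts as 1, so the running total increases by one at the start of every band: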

In [14]: df1  # similar to your example
Out[14]:
         DN  Npts  Total  Acc Pct  Percent    Histogram
0 -0.054741     1      1    0.025    0.025  Band 1
1 -0.052404     0      1    0.025    0.000  Bin=0.00233
2 -0.050067     0      1    0.025    0.000
3 -0.047730     0      1    0.025    0.000
4 -0.045393     0      1    0.025    0.000
5 -0.054741     1      1    0.025    0.025  Band 2
6 -0.052404     0      1    0.025    0.000  Bin=0.00233
7 -0.050067     0      1    0.025    0.000
8 -0.047730     0      1    0.025    0.000
9 -0.045393     0      1    0.025    0.000

In [15]: df1["Histogram"].str.startswith("Band").cumsum()
Out[15]:
0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    2
9    2
Name: Histogram, dtype: int64

You can use this as the groupby key (which splits the frame the way you want):

In [16]: g = df1.groupby(df1["Histogram"].str.startswith("Band").cumsum())

Now you can extract or clean up the groups however you like:

In [21]: g.get_group(1)
Out[21]:
         DN  Npts  Total  Acc Pct  Percent    Histogram
0 -0.054741     1      1    0.025    0.025  Band 1
1 -0.052404     0      1    0.025    0.000  Bin=0.00233
2 -0.050067     0      1    0.025    0.000
3 -0.047730     0      1    0.025    0.000
4 -0.045393     0      1    0.025    0.000

In [22]: [x for _, x in g]
Out[22]:
[         DN  Npts  Total  Acc Pct  Percent    Histogram
 0 -0.054741     1      1    0.025    0.025  Band 1
 1 -0.052404     0      1    0.025    0.000  Bin=0.00233
 2 -0.050067     0      1    0.025    0.000
 3 -0.047730     0      1    0.025    0.000
 4 -0.045393     0      1    0.025    0.000             ,
          DN  Npts  Total  Acc Pct  Percent    Histogram
 5 -0.054741     1      1    0.025    0.025  Band 2
 6 -0.052404     0      1    0.025    0.000  Bin=0.00233
 7 -0.050067     0      1    0.025    0.000
 8 -0.047730     0      1    0.025    0.000
 9 -0.045393     0      1    0.025    0.000             ]
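The question also asks for each piece to be saved to its own text file, which the groupby object makes straightforward. The following is a rough sketch under a few assumptions: the frame is a small stand-in for `df1`, and the output directory, the `band_N.txt` naming scheme, and the tab separator are all hypothetical choices, not anything from the question.

```python
import os
import tempfile

import pandas as pd

# Stand-in frame shaped like df1 above (values are illustrative only).
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, 0.000000, 0.002038, 0.004076],
    "Npts": [1, 0, 0, 346, 0, 0],
    "Histogram": ["Band 1", "Bin=0.00233", "", "Band 2", "Bin=0.00203", ""],
})

# Same grouping key as above: the counter increases at every "Band" row.
key = df1["Histogram"].str.startswith("Band").cumsum()

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = []
for band_no, group in df1.groupby(key):
    path = os.path.join(out_dir, "band_{}.txt".format(band_no))
    # Write tab-separated text without the row index.
    group.to_csv(path, sep="\t", index=False)
    written.append(path)

print(written)
```

Swap `out_dir` for a real directory and adjust `sep` if a different text layout is needed.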

Answer 2

This works on the raw text file the DataFrame came from and creates a new text file for each histogram:

txtFile = "his.txt"
# histogram text file

splitTxt = " Histogram           DN     Npts    Total   Percent   Acc Pct"
# header line used to split the file contents into sections/blocks

with open(txtFile, "r") as myResults:
    blocks = myResults.read()

# the first chunk is whatever precedes the first header, so skip it;
# count is used in the naming of the new txt files
for count, contents in enumerate(blocks.split(splitTxt)[1:], start=1):
    lines = contents.split('\n')
    with open('Results_{}.txt'.format(count), 'w') as op:
        op.write(splitTxt)
        # lines[0] is the empty remainder of the header line, so this writes
        # the header's newline followed by at most 7 data rows
        for line in lines[:8]:
            op.write('{}\n'.format(line))
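If the blocks are not always the same length, a keyword-based variant avoids hard-coding the row count. The sketch below is one possible approach, not part of the original answer: `split_on_keyword` is a hypothetical helper, the sample text is a shortened stand-in for his.txt, and the output goes to a temporary directory.

```python
import os
import tempfile


def split_on_keyword(text, keyword="Histogram"):
    """Return a list of blocks; each block starts at a line beginning with keyword."""
    blocks = []
    for line in text.splitlines():
        if line.lstrip().startswith(keyword):
            blocks.append([line])       # a header line opens a new block
        elif blocks:
            blocks[-1].append(line)     # a data line joins the current block
        # lines before the first header are ignored
    return blocks


# Shortened stand-in for the contents of his.txt (illustrative only).
sample = (
    "  Histogram           DN     Npts\n"
    "  Band 1       -0.054741        1\n"
    "  Bin=0.00233  -0.052404        0\n"
    "  Histogram           DN     Npts\n"
    "  Band 2        0.000000      346\n"
    "  Bin=0.00203   0.002038        0\n"
    "                0.004076        0\n"
)

blocks = split_on_keyword(sample)

out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = []
for count, block in enumerate(blocks, start=1):
    path = os.path.join(out_dir, "Results_{}.txt".format(count))
    with open(path, "w") as op:
        op.write("\n".join(block) + "\n")
    paths.append(path)

print([len(b) for b in blocks])
```

Because a new block starts only when a line begins with the keyword, blocks of 3, 7, or any other number of rows are handled the same way.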
