Question

I am trying to load data of the form shown below into a data frame.

popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 96.84 
Error Rate: 3.16 
Not Classified: 0.00
Total time: 5.367

popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 96.84 
Error Rate: 3.16 
Not Classified: 0.00
Total time: 4.472

popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 92.11 
Error Rate: 7.89 
Not Classified: 0.00
Total time: 4.46

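The data represents multiple executions of an algorithm. Is there a way to load the data as a single row, using the averages of the last 4 values?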

Answer 1

Here is one way to get the data into a DataFrame using itertools.groupby() and pandas:

from itertools import groupby
import pandas as pd

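with open('test.txt', 'r') as f:
    # Group consecutive lines into blank and non-blank runs; keep only the non-blank runs
    chunks = [list(group) for is_blank, group in groupby(f, lambda line: not line.strip()) if not is_blank]

# Turn each run of "key: value" lines into a dict
chunks = [dict(line.strip().split(': ', 1) for line in chunk) for chunk in chunks]

# One row per execution; every value is numeric, so cast the whole frame to float
df = pd.DataFrame(chunks).astype(float)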

Returns:

  Accuracy Error Rate Not Classified Total time numIters numSurvivors popSize  \
0    96.84       3.16           0.00      5.367      100            0    1000   
1    96.84       3.16           0.00      4.472      100            0    1000   
2    92.11       7.89           0.00       4.46      100            0    1000   

  probCrossover probMutation tournamentSize  
0           0.9          0.1             10  
1           0.9          0.1             10  
2           0.9          0.1             10

You can then easily compute the averages like this:

df[['Accuracy','Error Rate','Not Classified','Total time']].mean()

Returns:

Accuracy          95.263333
Error Rate         4.736667
Not Classified     0.000000
Total time         4.766333
dtype: float64
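
To get the single row the question asks for, one option is to group on the parameter columns and average the four result columns. The following is only a sketch under some assumptions: the sample text is embedded as a string instead of being read from test.txt, and the names SAMPLE, METRICS, parse_runs and summarize are made up for this illustration. It also assumes the last four keys of each block are the values to average and that every other key is a parameter.

```python
import re
import pandas as pd

# The three runs from the question, embedded as a string for this sketch.
SAMPLE = """\
popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 96.84
Error Rate: 3.16
Not Classified: 0.00
Total time: 5.367

popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 96.84
Error Rate: 3.16
Not Classified: 0.00
Total time: 4.472

popSize: 1000
numSurvivors: 0
tournamentSize: 10
probMutation: 0.1
probCrossover: 0.9
numIters: 100
Accuracy: 92.11
Error Rate: 7.89
Not Classified: 0.00
Total time: 4.46
"""

# Assumption: these four columns are the per-run results to average.
METRICS = ["Accuracy", "Error Rate", "Not Classified", "Total time"]


def parse_runs(text):
    """Split the text on blank lines and build one row of floats per run."""
    rows = []
    for block in re.split(r"\n\s*\n", text.strip()):
        row = {}
        for line in block.splitlines():
            key, value = line.split(":", 1)
            row[key.strip()] = float(value)
        rows.append(row)
    return pd.DataFrame(rows)


def summarize(df):
    """Return one row per parameter combination with the metrics averaged."""
    params = [col for col in df.columns if col not in METRICS]
    return df.groupby(params, as_index=False)[METRICS].mean()


if __name__ == "__main__":
    print(summarize(parse_runs(SAMPLE)))
```

Because all three runs share the same parameters, the result has exactly one row. If the file later holds runs with different parameter settings, the same call would give one averaged row per setting.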
