Skip all lines containing strings and keep only lines with floats

Time: 2018-05-16 15:05:30

Tags: python csv parsing genfromtxt

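I have a log file from a mathematical simulation. I am trying to parse it with Python, but I'm not very happy with the result. Is there an "elegant" way to loop over each line and filter the lines so that only those with physical values are kept and the rest are discarded?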

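The goal is to run various analyses with numpy. Since the lines I need contain only numeric values, is there a way to tell Python to keep only the rows with numeric values and discard every line that contains a string? Thanks for your help. A sample of the log file is attached.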

 5 Host 1 -- hnode146 -- Ranks 20-39
 6 Host 2 -- hnode147 -- Ranks 40-59
 7 Host 3 -- hnode148 -- Ranks 60-79
 8 Process rank 0 hnode145 36210
 9 Total number of processes : 80
10
11 STAR-CCM+ 12.02.011 (linux-x86_64-2.5/gnu4.8-r8)
12 License build date: 10 February 2015
13 This version of the code requires license version 2017.02 or greater.
14 Checking license file:
15 Checking license file:
16 Unable to list features for license file
17 1 copy of ccmppower checked out from
18 Feature ccmppower expires in
19 Thu Apr 19 17:22:54 2018
20
21 Server::start -host h
22 Loading object database:
23 Loading module: StarMeshing
24 Loading module: MeshingSurfaceRepair
25 Loading module: CadModeler
26 Started Parasolid modeler version 29.01.131
27 Loading module: StarResurfacer
28 Loading module: StarTrimmer
29 Loading module: SegregatedFlowModel
30 Loading module: KwTurbModel
31 Loading module: StarDualMesher
32 Loading module: StarBodyFittedMesher
33 Simulation database saved by:
34   STAR-CCM+ 12.02.011 (linux-x86_64-2.5/gnu4.8-r8) Fri Mar 10 20:03:37 UTC 2017 Serial
35 Loading into:
36   STAR-CCM+ 12.02.011 (linux-x86_64-2.5/gnu4.8-r8) Fri Mar 10 20:03:37 UTC 2017 Np=80
37 Object database load completed.

39 A Zeit und Datum : 2018.04.19 at 17:23:11
40
41 Startzeit: 1524151391534
42
43 Loading/configuring connectivity (old|new partitions: 1|80)
44   Domain (index 1): 1889922 cells, 5614862 faces, 1990686 verts.
45 Configuring finished
46 Reading material property database "/sw/apps/cd-adapco/12.02.011-R8/STAR-CCM+12.02.011-R8/star/props.mdb"...
47 Re-partitioning
48      Iteration     Continuity     X-momentum     Y-momentum     Z-momentum            Tke            Sdr Shear+Pressure (N)   Pressure (N)      Shear (N)
49           2001   1.076589e-01   9.570364e-01   2.588931e-01   1.984590e-01   4.028215e-03   3.964344e+01      -6.468809e+00  -1.253867e+00  -5.214942e+00
50           2002   5.987195e-02   4.004615e-01   2.597862e-01   1.808196e-01   2.819456e-03   2.537490e+01      -5.154729e+00  -1.228644e+00  -3.926085e+00
51           2003   4.824863e-02   2.048600e-01   1.359121e-01   1.103614e-01   1.384044e-03   1.623916e+01      -4.277053e+00  -1.216038e+00  -3.061015e+00
52           2004   3.684017e-02   1.322581e-01   1.350187e-01   8.827220e-02   9.023783e-04   1.039251e+01      -3.914011e+00  -1.213340e+00  -2.700671e+00
53           2005   3.224797e-02   1.093365e-01   1.059148e-01   7.461911e-02   6.307195e-04   6.650742e+00      -3.745949e+00  -1.217353e+00  -2.528596e+00
54           2006   2.788050e-02   9.180507e-02   8.311817e-02   6.417279e-02   4.603072e-04   4.256107e+00      -3.658613e+00  -1.224046e+00  -2.434567e+00
55           2007   2.332397e-02   7.688239e-02   6.222694e-02   4.860232e-02   3.534658e-04   2.723686e+00      -3.608431e+00  -1.231574e+00  -2.376857e+00
56           2008   1.916130e-02   6.201947e-02   4.645780e-02   3.654489e-02   2.833177e-04   1.743055e+00      -3.575486e+00  -1.237352e+00  -2.338134e+00
57           2009   1.600865e-02   4.780234e-02   3.909247e-02   2.959689e-02   2.370245e-04   1.115506e+00      -3.548365e+00  -1.240938e+00  -2.307427e+00
58           2010   1.389765e-02   3.570659e-02   3.492423e-02   2.537285e-02   2.055279e-04   7.138997e-01      -3.527530e+00  -1.242749e+00  -2.284781e+00
59      Iteration     Continuity     X-momentum     Y-momentum     Z-momentum            Tke            Sdr Shear+Pressure (N)   Pressure (N)      Shear (N)
60           2011   1.253570e-02   2.591702e-02   3.089287e-02   2.209728e-02   1.814997e-04   4.568718e-01      -3.511034e+00  -1.242906e+00  -2.268128e+00
61           2012   1.141436e-02   1.992464e-02   2.745902e-02   1.922942e-02   1.636478e-04   2.923702e-01      -3.498876e+00  -1.243006e+00  -2.255870e+00
62           2013   1.024511e-02   1.621655e-02   2.544053e-02   1.687660e-02   1.492828e-04   1.870937e-01      -3.489288e+00  -1.242425e+00  -2.246863e+00
63           2014   9.067693e-03   1.359007e-02   2.320886e-02   1.481687e-02   1.371763e-04   1.197299e-01      -3.482323e+00  -1.242027e+00  -2.240295e+00
64           2015   7.906450e-03   1.159567e-02   2.073906e-02   1.306014e-02   1.265825e-04   7.662597e-02      -3.479134e+00  -1.243537e+00  -2.235597e+00
65           2016   6.889290e-03   1.010569e-02   1.787383e-02   1.258395e-02   1.171344e-04   4.903984e-02      -3.479042e+00  -1.246677e+00  -2.232364e+00
66           2017   5.982303e-03   8.872579e-03   1.576665e-02   1.141871e-02   1.086443e-04   3.138620e-02      -3.480301e+00  -1.249988e+00  -2.230313e+00
67           2018   5.191895e-03   7.958489e-03   1.446382e-02   9.796685e-03   1.009937e-04   2.009149e-02      -3.482459e+00  -1.253255e+00  -2.229204e+00
68           2019   4.614927e-03   7.193031e-03   1.279295e-02   8.818100e-03   9.411761e-05   1.286594e-02      -3.484886e+00  -1.256002e+00  -2.228885e+00
69           2020   4.159939e-03   6.571088e-03   1.146195e-02   7.756150e-03   8.794392e-05   8.241197e-03      -3.487597e+00  -1.258382e+00  -2.229214e+00
70      Iteration     Continuity     X-momentum     Y-momentum     Z-momentum            Tke            Sdr Shear+Pressure (N)   Pressure (N)      Shear (N)
71           2021   3.779168e-03   5.961164e-03   1.034847e-02   6.969454e-03   8.240903e-05   5.278791e-03      -3.490138e+00  -1.260061e+00  -2.230078e+00
72           2022   3.414811e-03   5.350398e-03   9.329119e-03   6.398522e-03   7.743586e-05   3.381806e-03      -3.491624e+00  -1.260241e+00  -2.231384e+00

3 Answers:

Answer 0: (score: 1)

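Read each line, split it on whitespace, and try to convert each item to a float. If the conversion fails, don't keep the line. There is surely a way to do this with a regular expression, but this is off the top of my head.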

lines_to_keep = []
with open('a.log') as f:  # 'a.log' is a placeholder file name
    for line in f:
        fields = line.split()
        try:
            # Raises ValueError if any field can't be converted to float
            [float(x) for x in fields]
        except ValueError:
            continue
        # Blank lines split into an empty list and would pass the check above
        if fields:
            lines_to_keep.append(line)
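
Since the stated goal is numerical analysis, the same idea can return the parsed values rather than the raw lines. The following is a minimal sketch, not part of the original answer: the function name `parse_numeric_rows`, the optional `expected_fields` parameter, and the abbreviated sample lines are all assumptions made for illustration.

```python
def parse_numeric_rows(lines, expected_fields=None):
    """Return a list of float rows, skipping blank and non-numeric lines.

    `expected_fields` (hypothetical option) drops rows whose field count
    differs, e.g. a stray line that holds a single number.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        if expected_fields is not None and len(fields) != expected_fields:
            continue  # wrong number of columns
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            continue  # at least one field is not a number
    return rows


# Abbreviated sample modelled on the log above (only three columns shown).
sample = [
    "Re-partitioning\n",
    "     Iteration     Continuity     X-momentum\n",
    "          2001   1.076589e-01   9.570364e-01\n",
    "\n",
    "          2002   5.987195e-02   4.004615e-01\n",
]
rows = parse_numeric_rows(sample)
```

If every row has the same length, `rows` can be passed to `numpy.array(rows)` to get a 2-D array for further analysis.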

Answer 1: (score: 0)

If you prefer regular expressions: this matches runs of digits separated by numeric symbols such as '+', '-', '.', and 'e'.

import re

r = re.compile(r'([0-9 ]+[e.\-+]*)+\n')
with open('a.log') as f:
    lines = [line for line in f if r.fullmatch(line)]

# all the useful lines are ...
# 49           2001   1.076589e-01   9.570364e-01   2.588931e-01   1.984590e-01   4.028215e-03   3.964344e+01      -6.468809e+00  -1.253867e+00  -5.214942e+00
# 50           2002   5.987195e-02   4.004615e-01   2.597862e-01   1.808196e-01   2.819456e-03   2.537490e+01      -5.154729e+00  -1.228644e+00  -3.926085e+00
# 51           2003   4.824863e-02   2.048600e-01   1.359121e-01   1.103614e-01   1.384044e-03   1.623916e+01      -4.277053e+00  -1.216038e+00  -3.061015e+00
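
The pattern above is permissive: it also accepts lines made only of spaces and requires a trailing newline. A stricter variant, sketched here as an assumption rather than as the answerer's method, describes one float token and requires the whole line to consist of such tokens separated by whitespace. The names `FLOAT`, `NUMERIC_LINE`, and `is_numeric_line` are hypothetical.

```python
import re

# One float token: optional sign, digits, optional fraction, optional exponent.
# Forms such as ".5", "nan", or "inf" are deliberately not accepted here.
FLOAT = r'[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?'

# A whole line: one or more float tokens separated by whitespace.
NUMERIC_LINE = re.compile(rf'\s*{FLOAT}(?:\s+{FLOAT})*\s*')


def is_numeric_line(line):
    """Return True if the line contains only float tokens."""
    return NUMERIC_LINE.fullmatch(line) is not None


data_line = "          2001   1.076589e-01  -6.468809e+00\n"
header_line = "     Iteration     Continuity     X-momentum\n"
```

Because the trailing `\s*` absorbs the newline, the check behaves the same whether or not the last line of the file ends with one.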

Answer 2: (score: 0)

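import re

list_to_keep = []
pattern = re.compile(r'[0-9 ]+[e.\-+][0-9]*', re.IGNORECASE)
# Read the file as text and test each raw line; str(row) on a csv row
# gives "['...']", which never matches a pattern anchored at the start.
with open('a.log') as logfile:  # 'a.log' is a placeholder file name
    for line in logfile:
        if pattern.match(line):
            list_to_keep.append(line)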

You can use a regular expression to find the lines and keep them in a list.
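
One caveat with this approach: `pattern.match` only checks the beginning of a line, so a text line that happens to start with a number is also accepted. The short sketch below illustrates this with the pattern from this answer; `date_line` is a hypothetical line built from a timestamp fragment in the log, not a line that appears there in this form.

```python
import re

pattern = re.compile(r'[0-9 ]+[e.\-+][0-9]*', re.IGNORECASE)

data_line = "          2001   1.076589e-01  -6.468809e+00\n"
date_line = "2018.04.19 at 17:23:11\n"  # hypothetical text line starting with digits

# match() succeeds as soon as a prefix of the line fits the pattern.
data_hit = pattern.match(data_line) is not None
date_hit = pattern.match(date_line) is not None

# The part of the text line that was actually matched.
matched_prefix = pattern.match(date_line).group()
```

Both checks succeed here, so the text line would be kept as well. Validating every whitespace-separated field, rather than only the start of the line, avoids this kind of false positive.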