Extracting a number from a mix of numbers and strings

Time: 2019-06-25 02:33:14

Tags: python

I need to extract a specific number from a text file that contains a mix of numbers and text strings. The text file looks like this:

# General info User: b_stone, time: Sat Oct 21 16:10:03 2017 # Temperature C # Counters sec=60, Monitor=2.28666e+07, bstop=5.63852e+06, I0=7.33642e+06, I1=0, ch5=0, ch6=0, ch7=0, TEMP=-1, ICRxT=1, OCRxT=1, ROI1=1, ROI2=1, ROI3=1, ROI4=1, ROI5=1, ROI6=1, ccd1=0 # Motors goniy=40.7889, samplez=150, phi=-15.2418, th=0, detx=20.75, dety=4.7, detz=200, platz=0, m8=0, m9=0, m10=0, m11=0, dethorz=-5.10603, detvert=-31.6, detzold=879.936, m15=0, t1v=0, b1v=0, l1h=0, r1h=0, t2v=1.15812, b2v=-1.00813, l2h=1.24, r2h=-0.94, gshorz=4.50913, gsvert=0, stagey=-967.21, stagex=-102.567, bsvert=0, bshorz=0.24, samplex=-8, sampley=2.22, bpm=10090, monbend=330, monslit=9.29845, monang=-12, m0pitch=4.24658, m0vert=1.3622, m0bend=19.4294, m39=189.405, tablev1=39.7621, tablev2=59.9162, thor=1.09867, tableya=1.2001, bsv=24.3409, bsh=2.005, stagex1=1.95228, stagey1=3.07772, h1gap=0, h1tran=0, v1gap=0, v1tran=0, h2gap=0.3, h2tran=-1.09, v2gap=0.15, v2tran=1.0831, table=49.8391, tablep=0.01167, tempset=25

What I want to do is find "Monitor=2.28666e+07" (see the second line of the big text string) and pull out "2.28666".

I know how to do this in MATLAB, but I don't know how to translate the code into Python.

Here is the MATLAB code I used:

%Extract the monitor value from .txt file generated
k=sprintf('txt_files/MnO2_20170515_2_05151339_%04d.txt',i);    
filetext=fileread(k);
numbers = str2double(regexp(filetext, '(?<=Monitor=[^0-9]*)[0-9]*\.?[0-9]+', 'match'));
MV = numbers; %Monitor value, I scale up the MV by 10^6
%save monitor values into vector m 
m(i)=MV;

I want to do the same thing in Python, and I expect the output to be 2.28666.
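A line-by-line Python counterpart of the MATLAB snippet might look like the sketch below. Python's re module rejects the variable-width lookbehind (?<=Monitor=[^0-9]*) that MATLAB accepts, so the sketch uses a capture group instead and keeps only the first match. The file-name pattern is copied from the MATLAB code; the helper names extract_monitor and collect_monitor_values are hypothetical.

```python
import re
from pathlib import Path

# Capture the mantissa after "Monitor=", e.g. "2.28666" from "Monitor=2.28666e+07".
# A capture group replaces MATLAB's variable-width lookbehind, which Python's re rejects.
MONITOR_RE = re.compile(r"Monitor=[^0-9]*([0-9]*\.?[0-9]+)")


def extract_monitor(text: str) -> float:
    """Return the first Monitor mantissa found in text (hypothetical helper)."""
    match = MONITOR_RE.search(text)
    if match is None:
        raise ValueError("no 'Monitor=' value found")
    return float(match.group(1))


def collect_monitor_values(indices) -> list:
    """Mirror the MATLAB loop: read each numbered file and store its Monitor value."""
    m = []
    for i in indices:
        # Same path pattern as sprintf('...%04d.txt', i) in the MATLAB code.
        path = Path(f"txt_files/MnO2_20170515_2_05151339_{i:04d}.txt")
        m.append(extract_monitor(path.read_text()))
    return m


if __name__ == "__main__":
    sample = "sec=60, Monitor=2.28666e+07, bstop=5.63852e+06"
    print(extract_monitor(sample))
```

Calling collect_monitor_values(range(1, 11)) would, under the assumption that those numbered files exist, return a list much like the MATLAB vector m.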

2 answers:

Answer 0 (score: 0)

Python ships re as its standard regular-expression library; you can try this code to solve the problem.

>>> import re
>>> s = 'General info User: b_stone, time: Sat Oct 21 16:10:03 2017 # Temperature C # Counters sec=60, Monitor=2.28666e+07, bstop=5.63852e+06, I0=7.33642e+06, I1=0, ch5=0, ch6=0, ch7=0, TEMP=-1, ICRxT=1, OCRxT=1, ROI1=1, ROI2=1, ROI3=1, ROI4=1, ROI5=1, ROI6=1, ccd1=0 # Motors goniy=40.7889, samplez=150, phi=-15.2418, th=0, detx=20.75, dety=4.7, detz=200, platz=0, m8=0, m9=0, m10=0, m11=0, dethorz=-5.10603, detvert=-31.6, detzold=879.936, m15=0, t1v=0, b1v=0, l1h=0, r1h=0, t2v=1.15812, b2v=-1.00813, l2h=1.24, r2h=-0.94, gshorz=4.50913, gsvert=0, stagey=-967.21, stagex=-102.567, bsvert=0, bshorz=0.24, samplex=-8, sampley=2.22, bpm=10090, monbend=330, monslit=9.29845, monang=-12, m0pitch=4.24658, m0vert=1.3622, m0bend=19.4294, m39=189.405, tablev1=39.7621, tablev2=59.9162, thor=1.09867, tableya=1.2001, bsv=24.3409, bsh=2.005, stagex1=1.95228, stagey1=3.07772, h1gap=0, h1tran=0, v1gap=0, v1tran=0, h2gap=0.3, h2tran=-1.09, v2gap=0.15, v2tran=1.0831, table=49.8391, tablep=0.01167, tempset=25'
>>> a = re.findall(r'Monitor=(\d+\.\d+)', s)  # \. matches a literal dot; a bare . would match any character
>>> a
['2.28666']
>>> a[0]
'2.28666'
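If more than one field is needed, a possible alternative (not part of the original answer) is to parse every name=value pair into a dictionary once and then look fields up by name. The sketch assumes values never contain commas or whitespace, which holds for the sample line above; parse_fields is a hypothetical helper name.

```python
import re

# Matches "name=value" pairs such as "Monitor=2.28666e+07" or "phi=-15.2418".
# Assumes values contain no commas or whitespace, as in the sample header.
PAIR_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_fields(text: str) -> dict:
    """Map each field name to its raw string value (hypothetical helper)."""
    return dict(PAIR_RE.findall(text))


s = "sec=60, Monitor=2.28666e+07, bstop=5.63852e+06, phi=-15.2418"
fields = parse_fields(s)
raw = fields["Monitor"]                        # '2.28666e+07'
full_value = float(raw)                        # 22866600.0
mantissa = float(raw.lower().split("e")[0])    # 2.28666
print(raw, full_value, mantissa)
```

Parsing the header once keeps the regex simple and makes fields such as bstop or I0 available without writing a new pattern for each.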

Answer 1 (score: 0)

import re

with open('KMnOx_1Macid_1020_0630pm_10201916_0001.txt', "r") as F:
    readDat = F.read()

# Alternative: read the whole number, including the exponent (e.g. "2.28666e+07"):
# parsedData = re.findall(r"Monitor=([+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))", readDat)

# Read only the mantissa, without the "e+07" part (e.g. "2.28666"):
parsedData = re.findall(r"Monitor=(\d+\.\d+)", readDat)

monitorCount = float(parsedData[0])

print(monitorCount)

Based on LêTưThành's suggestion, I tried this code and it worked :)
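The two findall patterns in the last answer can also be merged. The sketch below is an assumption, not the answerer's code: it captures the mantissa and the exponent in separate groups, so a single search yields both 2.28666 and the full value 2.28666e+07. The helper name monitor_parts is hypothetical.

```python
import re

# Group 1: signed mantissa, e.g. "2.28666"; group 2: optional exponent, e.g. "+07".
MONITOR_RE = re.compile(r"Monitor=([+-]?\d+(?:\.\d*)?)(?:[eE]([+-]?\d+))?")


def monitor_parts(text: str):
    """Return (mantissa, full_value) for the first Monitor field (hypothetical helper)."""
    match = MONITOR_RE.search(text)
    if match is None:
        raise ValueError("no 'Monitor=' value found")
    mantissa_str, exponent_str = match.groups()
    mantissa = float(mantissa_str)
    # Rebuild the literal (e.g. "2.28666e+07") so float() parses it exactly,
    # instead of multiplying by a power of ten and picking up rounding error.
    full_value = float(f"{mantissa_str}e{exponent_str}") if exponent_str else mantissa
    return mantissa, full_value


mantissa, full_value = monitor_parts("sec=60, Monitor=2.28666e+07, bstop=5.63852e+06")
print(mantissa, full_value)
```

Rebuilding the string before calling float() avoids the small rounding error that float("2.28666") * 10**7 can introduce.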