使用Python 3.8在文本中查找正则表达式

时间:2019-12-10 15:53:09

标签: python regex findall

我正在尝试使用正则表达式从某些文本(现在只是一点点)中提取钻孔编号的列表但不能完全使 regex 正常工作。

格式通常为两个大写字母,后跟多个不同的长度,例如BH1, WS01, BH001

,有时以大写字母结尾,以表示它已被重新钻孔,例如BH2301A是BH2301的重播。

我能得到的最接近的返回一个元组,但是然后我只需要返回第一个值。

BH = re.compile(r'\w\w\d+')
BH.findall('4.9.26.9.2  A summary of rotary coring data from White Chalk (South) is provided in Table 37. Downhole geophysics (optical and acoustic televiewers) was undertaken on five boreholes in the LTC Pumping Tests investigation BH2313A, OH03001, OH03002, OH03003, OH04007 and OH04008. Televiewer data provides a more accurate assessment of fracture location and orientation than the associated core logging. This is because the borehole wall is less susceptible to drilling induced disturbance than the recovered core. An automated discontinuity picking algorithm within WellCAD© software was used by the contractor to identify features in the televiewer datasets. ')
['BH2313', 'OH03001', 'OH03002', 'OH03003', 'OH04007', 'OH04008']

BH = re.compile(r'(\w\w\d+(\w)?)')
BH.findall('4.9.26.9.2  A summary of rotary coring data from White Chalk (South) is provided in Table 37. Downhole geophysics (optical and acoustic televiewers) was undertaken on five boreholes in the LTC Pumping Tests investigation BH2313A, OH03001, OH03002, OH03003, OH04007 and OH04008. Televiewer data provides a more accurate assessment of fracture location and orientation than the associated core logging. This is because the borehole wall is less susceptible to drilling induced disturbance than the recovered core. An automated discontinuity picking algorithm within WellCAD© software was used by the contractor to identify features in the televiewer datasets. ')
[('BH2313A', 'A'), ('OH03001', ''), ('OH03002', ''), ('OH03003', ''), ('OH04007', ''), ('OH04008', '')]

使用regex,元组输出是最好的吗?我是否需要循环将它们切成薄片以返回第一组?

1 个答案:

答案 0 :(得分:0)

只需从得到的元组中提取第一个值:

iter(dict)