re.sub in Python 3

Date: 2019-04-16 02:05:36

Tags: python

I have the following kinds of text:

1. DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe: 
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
4. DIMENSIONS:  | ORIGIN:
5. Review attribution | DIMENSIONS:  | ORIGIN:
6. Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey

Desired output:

1. Position corrected and IL (0) was changed based on RPS: 3482 -230
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
4. 
5. Review attribution
6. ORIGIN: 2010 PureData Survey

Basically, I want to get rid of any empty keys, such as "DIMENSIONS", "ORIGIN", "Pipe", etc.

I think we have to handle each key separately... and I'd like to do it that way, because I also have more keys to deal with.
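One way to handle the keys one by one while keeping the list easy to extend is sketched below. It assumes fields are always separated by "|" as in the samples above, and that a field is "empty" when it is just a key, a colon, and whitespace. KEYS, make_cleaner, and clean are hypothetical names chosen for illustration.

```python
import re

# Hypothetical key list; add more keys here as needed.
KEYS = ["DIMENSIONS", "ORIGIN", "Pipe"]


def make_cleaner(keys):
    """Return a function that removes empty 'KEY:' fields for the given keys."""
    # Matches a whole field made of optional whitespace, one of the keys,
    # a colon, and optional whitespace, e.g. "DIMENSIONS:  " or " Pipe: ".
    empty_field = re.compile(
        r"\s*(?:" + "|".join(re.escape(k) for k in keys) + r"):\s*"
    )

    def clean(line):
        # Split into fields, keep the ones that are not empty keys, rejoin.
        fields = line.split("|")
        kept = [f for f in fields if not empty_field.fullmatch(f)]
        return "|".join(kept).strip()

    return clean


clean = make_cleaner(KEYS)
print(clean("Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey"))
# ORIGIN: 2010 PureData Survey
```

Because only the listed keys are matched, a field such as "Notes:" would be kept until "Notes" is added to KEYS.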

According to https://regex101.com/r/OX1W3b/6, this pattern

(.*)DIMENSIONS:  \|(.*)

works, but I'm not sure how to use it in Python:

import re
str='DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'
x=re.sub(".*DIMENSIONS.*","(.*)DIMENSIONS:  \|(.*)",str)
print(x)

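Since the second argument to re.sub is just a replacement string, not a regular expression, the output is simply that second value repeated back.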
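The replacement string is not a regex, but it can refer back to the pattern's capture groups with \1, \2, and so on. A minimal sketch, assuming the goal is to delete the empty "DIMENSIONS:  |" field from the sample line and keep everything around it:

```python
import re

text = "DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230"

# In the replacement string, \1 and \2 refer to capture groups 1 and 2.
# Group 1 holds the text before the empty field and group 2 the text after it,
# so r"\1\2" rebuilds the line without "DIMENSIONS:  |".
# Raw strings (r"...") keep the backslashes intact for the regex engine.
result = re.sub(r"(.*)DIMENSIONS:  \|(.*)", r"\1\2", text).strip()
print(result)
# ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230

# The groups only copy the surrounding text back, so deleting the match
# directly gives the same result for this line; \s* also tolerates any
# number of spaces after the colon.
simpler = re.sub(r"DIMENSIONS:\s*\|", "", text).strip()
print(simpler == result)
# True
```

Note that "ORIGIN:" is still empty afterwards, so this kind of pattern has to be repeated for every key.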

In Google Sheets, I would use =REGEXEXTRACT(A1,"(.*)DIMENSIONS: \|(.*)")

Is there something similar in Python? re.sub needs a value to replace with, but I want to take that value from the regex capture groups.
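A close Python counterpart to pulling out capture groups is re.search, which returns a match object (or None) whose groups() method gives the captured text. A sketch using the same pattern and sample line as the question; the variable names are illustrative only:

```python
import re

text = "DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230"
pattern = re.compile(r"(.*)DIMENSIONS:  \|(.*)")

match = pattern.search(text)
if match:  # search() returns None when the pattern does not match
    before, after = match.groups()  # the two captured groups, as strings
    cleaned = (before + after).strip()  # joining them drops the empty field
else:
    cleaned = text  # no empty DIMENSIONS field: keep the line as-is
print(cleaned)
# ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230
```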

Note that this is similar to my question on GIS SE; I'm asking here because it is more of a Python question than a GIS one.

1 answer:

Answer 0 (score: 2)

I'd split each line on | into separate fields, drop the fields that have no value (the ones that end with ":"), and then rejoin the rest on |:

s = '''DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe:
DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
DIMENSIONS:  | ORIGIN:
Review attribution | DIMENSIONS:  | ORIGIN:
Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey'''.splitlines()

result = []
for line in s:
    fields = line.split('|')                   # split the line into its fields
    lst = []
    for field in fields:
        if not field.strip().endswith(':'):    # keep fields that have a value
            lst.append(field)
    result.append('|'.join(lst).strip())       # rejoin and trim outer spaces

Or, as a one-liner:

result = ['|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip() for line in s]

Note that this gives you a list of lines. If needed, you can join them back together with '\n'.join(result).

This is the part that parses each line:

'|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip()

For example, if line is DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:, the expression returns DIMENSIONS: 3 x 375 RCP | Pipe: 35mm.
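To check this approach against the desired output in the question, here is a self-contained sketch that wraps the per-line expression in a function (drop_empty_fields is a hypothetical name) and runs it on all six sample lines. Like the answer, it treats any field ending in ":" as empty.

```python
# The question's six sample lines, one per list entry.
SAMPLES = [
    "DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe: ",
    "DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP",
    "DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:",
    "DIMENSIONS:  | ORIGIN:",
    "Review attribution | DIMENSIONS:  | ORIGIN:",
    "Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey",
]

# The desired output from the question, in the same order.
EXPECTED = [
    "Position corrected and IL (0) was changed based on RPS: 3482 -230",
    "DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP",
    "DIMENSIONS: 3 x 375 RCP | Pipe: 35mm",
    "",
    "Review attribution",
    "ORIGIN: 2010 PureData Survey",
]


def drop_empty_fields(line: str) -> str:
    # Keep fields that do not end with ":" once surrounding spaces are stripped.
    fields = [f for f in line.split("|") if not f.strip().endswith(":")]
    return "|".join(fields).strip()


results = [drop_empty_fields(line) for line in SAMPLES]
print("\n".join(results))
print(results == EXPECTED)
# True
```

One caveat of the trailing-colon test: a field whose value itself happens to end with a colon would also be dropped.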
