Regex to pull out a variable string

Time: 2015-03-14 01:26:11

Tags: regex string python-2.7 substring

I have this list of strings in Python 2.7:

    list_a = ['temp_52_head sensor, uploaded by TS',
              'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
              'FSL_pressure, uploaded by RS, no reported vacuum',
              'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
              'uploaded by KKP, Space 55',
              'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
              'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north']

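Each item in the list contains the string

uploaded by SOMEONE

I need to extract SOMEONE.

However, as you can see, SOMEONE

  1. changes from one item in the list to the next.
  2. can be 2 or 3 characters long (letters only, no digits).
  3. appears at a different position in each string.
  4. "uploaded" also appears as "Uploaded".
  5. "uploaded" sometimes occurs before any comma.

Here is what I need to extract: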

    someone_names = ['TS','AB','RS','DY','KKP','HB','TFG']
    

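I was thinking of using a regex, but the problems I'm running into come from points 2 and 3 above.

Is there a way to extract these characters from the list?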

4 answers:

Answer 0 (score: 4)

You can apply a regex inside a list comprehension:

    >>> import re
    >>> list_a = [
    ...     'temp_52_head sensor, uploaded by TS',
    ...     'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    ...     'FSL_pressure, uploaded by RS, no reported vacuum',
    ...     'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    ...     'uploaded by KKP, Space 55',
    ...     'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    ...     'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north'
    ... ]
    >>> # (?i) = ignore case; group 1 captures the 2-3 letters after "by" or "to"
    >>> regex = re.compile(r'(?i)\buploaded\s*(?:by|to)\s*([a-z]{2,3})')
    >>> # "for m in [regex.search(x)]" binds each item's search result to m; "if m" drops non-matches
    >>> names = [m.group(1) for x in list_a for m in [regex.search(x)] if m]
    >>> names
    ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
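
The `for m in [regex.search(x)]` clause is a way to bind the search result to a name once per item inside a comprehension. A sketch of the equivalent explicit loop, run on two of the question's items (the loop form is an illustrative rewrite, not part of the answer):

```python
import re

list_a = [
    'temp_52_head sensor, uploaded by TS',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]
# (?i) makes the whole pattern case-insensitive; group 1 is the 2-3 letters after "by"/"to".
regex = re.compile(r'(?i)\buploaded\s*(?:by|to)\s*([a-z]{2,3})')

names = []
for x in list_a:
    m = regex.search(x)  # search each item once
    if m:                # skip items with no match, like "if m" in the comprehension
        names.append(m.group(1))

print(names)  # ['TS', 'TFG']
```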

Answer 1 (score: 1)

Less regex-heavy, but a more explicit approach could look like this:

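    import re
    # x is the index of the list item; re.search returns None if there is no match
    m = re.search(re.escape("uploaded by ") + "(.*?)" + re.escape(","), list_a[x], re.IGNORECASE)
    name = m.group(1) if m else None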
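
A minimal sketch that applies this pattern to every item of `list_a` (the loop and the `names` list are illustrative additions, not part of the answer). Because the pattern needs the literal words "uploaded by" followed later by a comma, it returns `None` for the first item (no comma after `TS`) and for the last item (`Uploaded to TFG`):

```python
import re

list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]

# Literal "uploaded by ", then the shortest run of characters up to the next comma.
pattern = re.compile(re.escape("uploaded by ") + "(.*?)" + re.escape(","), re.IGNORECASE)

names = []
for item in list_a:
    match = pattern.search(item)  # None when the item has no "uploaded by ...," part
    names.append(match.group(1) if match else None)

print(names)  # [None, 'AB', 'RS', 'DY', 'KKP', 'HB', None]
```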

Answer 2 (score: 0)

It looks like a regex like this meets your requirements, unless I'm missing something:

    /[Uu]ploaded by ([A-Z]{2}|[A-Z]{3}),/
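
A quick Python check of why the character class matters (the sample string with a stray `|` is made up for illustration). Inside `[...]`, `|` is an ordinary character, so `[U|u]` also accepts a literal `|`, while `[Uu]` does not. The pattern also requires a comma after the initials, so it finds nothing in the question's first item:

```python
import re

buggy = re.compile(r'[U|u]ploaded by ([A-Z]{2}|[A-Z]{3}),')  # "|" is literal inside [...]
fixed = re.compile(r'[Uu]ploaded by ([A-Z]{2}|[A-Z]{3}),')

sample = 'x, |ploaded by AB, y'  # hypothetical string, for illustration only

print(buggy.search(sample) is not None)  # True: [U|u] matched the "|"
print(fixed.search(sample) is not None)  # False
print(fixed.search('art9943_mercury, Uploaded by DY, accelerated, hurst potential too low').group(1))  # DY
print(fixed.search('temp_52_head sensor, uploaded by TS'))  # None: no comma after TS
```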

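Alternatively, it appears (from your sample) that you could also split each string on commas, take the element that contains the string "ploaded by" (which sidesteps the upper/lower-case "u" issue), split that element on spaces, and take the last element of the resulting array.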
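
A minimal sketch of this split-based approach, assuming the initials are always the last space-separated word of the comma-separated field that contains "ploaded by" (the function name `extract_uploader` is hypothetical). Since it only looks for "ploaded by", the `Uploaded to TFG` item comes back as `None`:

```python
list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]


def extract_uploader(item):
    """Hypothetical helper: return the uploader's initials, or None."""
    for field in item.split(','):
        if 'ploaded by' in field:     # matches both "uploaded" and "Uploaded"
            return field.split()[-1]  # last space-separated word, e.g. "TS"
    return None                       # no field contains "ploaded by"


print([extract_uploader(item) for item in list_a])
# ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', None]
```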

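Answer 3 (score: 0)

This regex will catch all of them, and it will still work if the number of letters in the uploader's initials changes. It matches whether the two or three letters are followed by a comma or a single quote. It also captures all of the data you're looking for: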

    import re

    # Case-insensitive; group 4 captures the letters after "by" or "to"
    m = re.compile('uploaded ((by)|(to)) ([a-z]+)', flags=re.IGNORECASE)

Then call the pattern object's `m.search()` method on each string to pull out the match. In each match, the 4th group is the data you're looking for.
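
A runnable sketch of that usage over the question's list (the loop and the `names` list are illustrative additions). Group 1 is `by` or `to`, groups 2 and 3 are the inner `(by)` and `(to)`, so the initials land in group 4:

```python
import re

list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]

m = re.compile('uploaded ((by)|(to)) ([a-z]+)', flags=re.IGNORECASE)

names = []
for item in list_a:
    match = m.search(item)  # first match in this string, or None
    if match:
        names.append(match.group(4))

print(names)  # ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
```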