Python RegEx: getting the words after a specific string

Time: 2017-12-29 20:46:31

Tags: python regex

I have a string:

  

string = '''"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy"'''

Which regular expression can I use to retrieve the values after "name": so that I get Due Diligence, Finance, and Financial?

I tried

match = re.compile(r'"name"\:(.\w+)')
match.findall(string)

but it returns

['"Finance', '"Financial', '"Due', '"Financial', '"Strategy']

Due Diligence is split, and I want the two words kept together.

2 answers:

Answer 0 (score: 1)

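Your regex does not pick up the space because \w only matches word characters (letters, digits, and underscore), not whitespace.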
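To see this in isolation, here is a minimal sketch on a shortened sample string (the sample is made up for illustration and is not the full string from the question). It runs the pattern from the question and shows the match stopping at the first space.

```python
import re

# Shortened, made-up sample; not the full string from the question.
sample = '"name":"Due Diligence","name":"Strategy"'

# "." consumes the opening quote, then \w+ matches word characters only,
# so the capture ends as soon as the space after "Due" is reached.
words = re.findall(r'"name":(.\w+)', sample)
print(words)  # ['"Due', '"Strategy']
```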

"name"\:(.\w+\s*\w*) allows for optional whitespace followed by one more word (it will not work for three words, but it works in your case).

"name"\:(.\w+\s*\w*"?) also captures the closing quote " at the end of each match, except for Financial, which has no closing quote to capture.

Edit: fixed the second regex for "Financial".
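As a rough check of the first pattern, the sketch below runs it on a shortened, made-up sample. The three-word value "Top Secret Plan" is hypothetical and is only there to show the three-word limitation mentioned above.

```python
import re

# Shortened, made-up sample; "Top Secret Plan" is a hypothetical value.
sample = ('"name":"Finance","name":"Financial ["a","b"],'
          '"name":"Due Diligence","name":"Top Secret Plan"')

# One word, optional whitespace, then at most one more word.
pattern = re.compile(r'"name"\:(.\w+\s*\w*)')
found = pattern.findall(sample)
print(found)
# ['"Finance', '"Financial ', '"Due Diligence', '"Top Secret']
```

Note the trailing space kept in '"Financial ' and the third word dropped from the last value.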

Answer 1 (score: 0)

I would use a non-greedy .*? expression followed by a trailing quote:

import re

string = """$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy"""

# With the leading double quote
match = re.compile(r'"name"\:(".*?)["\[]')
a = match.findall(string)
print(a)

# Stripping out the leading double quote
match = re.compile(r'"name"\:"(.*?)["\[]')
b = match.findall(string)
print(b)

The final output is:

['"Finance', '"Financial ', '"Due Diligence']
['Finance', 'Financial ', 'Due Diligence']
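
Both patterns need a " or [ after the value, so a value at the very end of the string with no closing quote is not returned, and 'Financial ' keeps its trailing space. If either matters, one possible variation (not what this answer uses) is a negated character class combined with str.strip(). This is only a sketch on a shortened, made-up sample that ends without a closing quote:

```python
import re

# Shortened, made-up sample that ends without a closing quote.
sample = '"name":"Finance","name":"Financial ["x"],"name":"Due Diligence","name":"Strategy'

# [^"\[]* consumes everything up to the next double quote or "[",
# or up to the end of the string if neither follows.
raw = re.findall(r'"name":"([^"\[]*)', sample)

# strip() removes the trailing space left in 'Financial '.
names = [value.strip() for value in raw]
print(names)  # ['Finance', 'Financial', 'Due Diligence', 'Strategy']
```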