Regular expression to match a given requirement

Time: 2018-01-23 11:14:11

Tags: python regex

I have a string:

s = '((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN'')/
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON'') / 
 "SalesVelocity"."OrderHeader"."#Opportunities" ))/
 ((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / 
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON''))'

It references tables as "SchemaName"."TableName"."ColumnName". I need to extract every table together with its schema, i.e. "SalesVelocity"."OrderHeader" and "SalesVelocity"."Opportunity_1".

import re
pat = r'".*?\"\.".*?\"'             #See Note at the bottom of the answer
s = '((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN'')/
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON'') / 
 "SalesVelocity"."OrderHeader"."#Opportunities" ))/
 ((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / 
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON''))'
match1 = re.findall(pat, s)
print(match1)

It outputs:

['"SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN))*(FILTER("SalesVelocity"."OrderHeader"', 
'"OpportunityRevenue" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN)/FILTER("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN))*(FILTER ("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory" = WON) / "SalesVelocity"."OrderHeader"', 
'"#Opportunities" ))/((1.0 * "SalesVelocity"."OrderHeader"', 
'"TotalSalesCycleOppty")  / FILTER("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"']

This is incorrect; take the second value, for example:

('"#Opportunities" USING "SalesVelocity"."Opportunity_1"')

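My expression is meant to start at a ", then .*? takes all characters until it reaches \", then a dot, then " again, then .*? takes all characters until it reaches \".

What am I missing?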

3 answers:

Answer 0 (score: 0)

The following regular expression (modified according to Wiktor Stribizew's suggestion) should work: "[^"]*?\"\."[^"]*\"
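
As a quick check of this pattern, here is a minimal sketch run against a shortened sample in the same style as the question's string. The names `sample`, `pattern`, and `unique_tables` are illustrative and not from the original post, and the de-duplication step is an assumption, since the question does not say whether repeated tables should be kept.

```python
import re

# Shortened, hypothetical sample in the "Schema"."Table"."Column" style of the question.
sample = (
    '(FILTER("SalesVelocity"."OrderHeader"."#Opportunities" '
    'USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = \'WON\') / '
    '"SalesVelocity"."OrderHeader"."#Opportunities")'
)

# The pattern from this answer: [^"] can never cross a double quote,
# so each half of the match stays inside a single quoted name.
pattern = re.compile(r'"[^"]*?\"\."[^"]*\"')

matches = pattern.findall(sample)

# Assumption: only distinct tables are wanted. dict.fromkeys drops
# duplicates while keeping first-seen order.
unique_tables = list(dict.fromkeys(matches))

print(matches)
print(unique_tables)
```

Because `re.findall` returns non-overlapping matches, the scan resumes after each "Schema"."Table" pair, so the trailing column name is never paired with the table name.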

Answer 1 (score: 0)

Is this what you need?

import re

s = '''((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN'))*(FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN')/FILTER("SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN'))*(FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = 'WON') / "SalesVelocity"."OrderHeader"."#Opportunities" ))/((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / FILTER("SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = 'WON'))'''

pat = r'"[^"]*?"\."[^"]*?"'             # [^"] cannot cross a quote, so each half stays inside one quoted name

match1 = re.findall(pat, s)
print(match1)

Output:

['"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"']

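Answer 2 (score: 0)

Understanding backtracking

  • ".*?" matches the shortest subsequence that starts after a " and is followed by the next "
  • ".*?"\. matches the shortest subsequence that starts after a " and is followed by the next ".

So ".*?"\. matches "#Opportunities" USING "SalesVelocity". because, once "\. fails to match at the first closing quote, the engine backtracks into .*? and lets it consume more characters, including quotes, until a quote followed by a dot is found.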
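
The backtracking described above can be reproduced with a minimal sketch. The shortened `sample` string and the variable names are illustrative assumptions, not part of the original post.

```python
import re

# Shortened, hypothetical sample taken from the start of the question's string.
sample = '"SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"'

# The question's pattern: the lazy .*? is still allowed to run across quotes.
lazy = re.compile(r'".*?"\.".*?"')
lazy_matches = lazy.findall(sample)
print(lazy_matches)

# The sub-pattern discussed above, anchored at the quote that opens "#Opportunities".
prefix = re.compile(r'".*?"\.')
start = sample.index('"#Opportunities"')
prefix_match = prefix.match(sample, start).group()
print(prefix_match)
```

The second element of `lazy_matches` is the same unwanted value the question reports: `.*?` only prefers the shortest match, it does not refuse to grow when the rest of the pattern fails.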

A negative lookahead is more expressive, because it specifies exactly the token that is not wanted:

"(?:(?!").)*"\.".*?"

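As a rough check of the lookahead pattern above, the sketch below runs it on a shortened sample. The `sample` string and variable names are illustrative assumptions.

```python
import re

# Shortened, hypothetical sample taken from the start of the question's string.
sample = '"SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"'

# (?:(?!").)* consumes one character at a time, but only while the next
# character is not a double quote, so the first name cannot span quotes.
tempered = re.compile(r'"(?:(?!").)*"\.".*?"')
tempered_matches = tempered.findall(sample)
print(tempered_matches)
```

The trailing ".*?" needs no such guard here: nothing follows it in the pattern, so the lazy quantifier stops at the first closing quote.
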
Another fix is to put an atomic group around .*?":

"(?>.*?")\.".*?"

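A minimal sketch of the atomic-group variant follows. Note that Python's built-in `re` module only accepts the `(?>...)` syntax from Python 3.11 onward, so the example guards on the interpreter version. The `sample` string and variable names are illustrative assumptions.

```python
import re
import sys

# Shortened, hypothetical sample taken from the start of the question's string.
sample = '"SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"'

if sys.version_info >= (3, 11):
    # Once (?>.*?") has matched up to the first closing quote, the engine
    # may not re-enter the group to extend .*? when \. fails afterwards.
    atomic = re.compile(r'"(?>.*?")\.".*?"')
    atomic_matches = atomic.findall(sample)
else:
    # Older versions of re raise an error on (?>...), so skip the demo.
    atomic_matches = None

print(atomic_matches)
```

On older interpreters the negated character class gives the same result without needing any newer syntax.
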
But in your case, using a negated character class, [^"]*, is more efficient because it avoids backtracking.