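I'm working on a regex query to return text matching a specific pattern into groups. This is the regex I'm using: r"([\w+ \-? \w+]* [\w+ ]+ [\(?\w+ \)?]*) (\(?[\d,-]+\)?) (\(?[\d,-]+\)?)"
Here are the sample lines I'm parsing and the output I'd like for each: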
1) String: LOSS BEFORE INCOME TAXES (900,000) (900,000)
Desired output: [('LOSS BEFORE INCOME TAXES', '(900,000)', '(900,000)')]
Final result: correct
2) String: INCOME TAXES (RECOVERED) (90,000) (90,000)
Desired output: [('INCOME TAXES (RECOVERED)', '(90,000)', '(90,000)')]
Final result: correct
3) String: RETAINED EARNINGS - BEGINNING OF YEAR 9,999,999 9,999,999
Desired output: [('RETAINED EARNINGS - BEGINNING OF YEAR', '9,999,999', '9,999,999')]
Final result: correct
4) String: EXPENSES
Desired output: ['EXPENSES']
Final result: correct
5) String: Subcontracts 8,058 2,655
Desired output: [('Subcontracts', '8,000,000')]
Final result: ['Subcontracts 8', '', '058 2', '', '655', '']
6) String: Business taxes 116 -
Desired output: [('Business taxes', '116', '-')]
Final result: ['Business taxes 116 ', '', '']
7) String: 600,000 600,000
Desired output: [(600,000), (600,000)]
Final result: ['642', '', '437 629', '', '070', '']
8) String: Salaries, wages and benefits 400,000 400,000
Desired output: [('Salaries, wages and benefits', '400,000', '400,000')]
Final result: [(' wages and benefits', '463,437', '466,742')]
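I'm not sure what I'm doing wrong or what I'm missing, but I'm running into problems with cases 5, 6, 7 and 8. How can I adjust the query above so that it handles all of these cases? Thanks in advance!
Answer 0 (score: 1)
You can try this, mate: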
^([a-z, \(\)-]*?)?\(?([\d,]+)?\)?\s*?\(?([\d,-]+)?\)?$
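A minimal sketch of trying this pattern in Python. It assumes the pattern is meant to be used case-insensitively (`[a-z]` alone would not match `LOSS`), so `re.IGNORECASE` is added here; the helper name `parse_line` is hypothetical.

```python
import re

# The answer's pattern as a raw string. re.IGNORECASE is an assumption:
# without it, [a-z] cannot match the upper-case descriptions.
PATTERN = re.compile(
    r"^([a-z, \(\)-]*?)?\(?([\d,]+)?\)?\s*?\(?([\d,-]+)?\)?$",
    re.IGNORECASE,
)

def parse_line(line):
    """Return (description, first, second), or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    description, first, second = m.groups()
    # The lazy description group keeps the space before the first number,
    # so strip it; "or ''" covers a group that did not take part.
    return ((description or "").strip(), first, second)

for line in ["Subcontracts 8,058 2,655",
             "LOSS BEFORE INCOME TAXES (900,000) (900,000)",
             "EXPENSES"]:
    print(parse_line(line))
```

With this pattern the parentheses sit outside the capture groups, so `(900,000)` comes back as `900,000`; if the parentheses are needed, they would have to be moved inside the groups.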
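Explanation
^ - anchor at the start of the string.
([a-z, \(\)-]*?)? - matches any character from a to z, or a comma, a space, ( , ) or -, zero or more times (lazy mode).
\(? - matches ( ; the ? makes it optional.
([\d,]+)? - matches one or more digits or commas; the trailing ? makes the group optional.
\)? - matches ) ; the ? makes it optional.
\s*? - matches whitespace zero or more times (lazy mode).
\(?([\d,-]+)?\)? - matches digits, commas or -, optionally wrapped in parentheses.
$ - end of the string.
Answer 1 (score: 1)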
I think this regex will do what you want:
^([A-Z][A-Za-z0-9 (),%;-]+?[^(\d\s])? ?(?:(\(?[\d,]+\)?|-)\s+(\(?[\d,]+\)?|-))?$
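It looks for a group of alphanumeric characters that starts with an uppercase letter, may include any of [(),%;-], and does not end with a (, a digit or whitespace. That is followed by two optional groups, each of which is either digits and commas optionally wrapped in ( ), or a lone -. All groups are optional, to allow matching lines that have no description or no numbers.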
In Python:
import re
data = """LOSS BEFORE INCOME TAXES (900,000) (900,000)
INCOME TAXES (RECOVERED) (90,000) (90,000)
RETAINED EARNINGS - BEGINNING OF YEAR 9,999,999 9,999,999
EXPENSES
Subcontracts 8,058 2,655
Business taxes 116 -
600,000 600,000
GROSS PROFIT (50%; 2016 - 50%) 500,000 500,000
Bad debts - 50
Salaries, wages and benefits 400,000 400,000"""
# Raw string so the backslashes reach the regex engine unchanged;
# re.MULTILINE makes ^ and $ match at the start and end of every line.
regex = re.compile(r'^([A-Z][A-Za-z0-9 (),%;-]+?[^(\d\s])? ?(?:(\(?[\d,]+\)?|-)\s+(\(?[\d,]+\)?|-))?$', re.MULTILINE)
# findall returns one (description, first, second) tuple per matched line
print(regex.findall(data))
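As a rough sketch, the same pattern can also be applied one line at a time with `re.match`, which makes it easier to see which groups took part in a match. The names `ROW` and `split_row` are hypothetical and only for illustration.

```python
import re

# Same pattern as above, split over two raw-string pieces for readability.
ROW = re.compile(
    r"^([A-Z][A-Za-z0-9 (),%;-]+?[^(\d\s])? ?"
    r"(?:(\(?[\d,]+\)?|-)\s+(\(?[\d,]+\)?|-))?$"
)

def split_row(line):
    """Return (description, first, second); parts that are absent are None."""
    m = ROW.match(line)
    return m.groups() if m else None

for line in ["Subcontracts 8,058 2,655",
             "Business taxes 116 -",
             "600,000 600,000",
             "EXPENSES"]:
    print(split_row(line))
```

Unlike `findall`, which reports a group that did not take part as an empty string, `match.groups()` reports it as `None`, so a line with no description can be told apart from one with an empty description.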
Answer 2 (score: -1)
Try using this regex:
r"([\w ,()-]*)[\(?[\d, -]*\)?]*[\(?[\d, -]*\)?]*"