Extracting strings, including empty values, from plain text with Python

Time: 2018-12-18 11:15:35

Tags: python python-3.x

I have a string:

a='''S
LINC             SHORT LEGAL                                   TITLE NUMBER
0037 471 661     1720278;16;21                                 172 211 342

LEGAL DESCRIPTION
PLAN 1720278  
BLOCK 16  
LOT 21  
EXCEPTING THEREOUT ALL MINES AND MINERALS  

ESTATE: FEE SIMPLE  
ATS REFERENCE: 4;24;54;2;SW

MUNICIPALITY: CITY OF EDMONTON

REFERENCE NUMBER: 172 023 641 +71

---------------------------------------------------------------------------- 
----
                     REGISTERED OWNER(S)
REGISTRATION    DATE(DMY)  DOCUMENT TYPE      VALUE           CONSIDERATION
----------------------------------------------------------------------------- 
---

172 211 342    15/08/2017                      $610,000        CASH & MTGE'''

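I need to extract the values under DOCUMENT TYPE, VALUE and CONSIDERATION and output them as an array such as ['','$610,000','CASH & MTGE']. I tried findall(r'(?<!\S)(?:[$]\S+|[^$\d]+)\b', a), but I only get ['$610,000','CASH & MTGE']. Nothing is returned for DOCUMENT TYPE because that column is empty.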

1 answer:

Answer 0 (score: 0)

As I understand it, you want to return an array containing the values $610,000 and CASH & MTGE from the string, right?

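Assuming the required values always stay on the last line, we can use the splitlines function to break the string into a list of lines b, and then use the index len(b)-1 to get the line we need, as shown below: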

>>> a='''S
LINC             SHORT LEGAL                                   TITLE NUMBER
0037 471 661     1720278;16;21                                 172 211 342

LEGAL DESCRIPTION
PLAN 1720278  
BLOCK 16  
LOT 21  
EXCEPTING THEREOUT ALL MINES AND MINERALS  

ESTATE: FEE SIMPLE  
ATS REFERENCE: 4;24;54;2;SW

MUNICIPALITY: CITY OF EDMONTON

REFERENCE NUMBER: 172 023 641 +71

---------------------------------------------------------------------------- 
----
                     REGISTERED OWNER(S)
REGISTRATION    DATE(DMY)  DOCUMENT TYPE      VALUE           CONSIDERATION
----------------------------------------------------------------------------- 
---

172 211 342    15/08/2017                      $610,000        CASH & MTGE'''

>>> b = a.splitlines()        # split the text into a list of lines
>>> req_line = b[len(b)-1]    # the last line; b[-1] is equivalent
>>> print(req_line)
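
Getting the last line is only half of the job, since the question also asks for an empty string where DOCUMENT TYPE is blank. Here is a minimal sketch of one way to do that, not a definitive solution. It assumes every row starts with a registration number like 172 211 342 and a DD/MM/YYYY date, and that the value always begins with $. The names ROW_PATTERN and extract_owner_fields are made up for this illustration.

```python
import re

# Assumed row layout: registration number, date, optional document type,
# a dollar value, then the consideration text.
ROW_PATTERN = re.compile(
    r"^\d{3} \d{3} \d{3}\s+"   # registration number, e.g. 172 211 342
    r"\d{2}/\d{2}/\d{4}\s+"    # date in DD/MM/YYYY form
    r"(.*?)\s*"                # document type, possibly empty
    r"(\$[\d,]+)\s+"           # value, assumed to start with '$'
    r"(.*)$"                   # consideration: the rest of the line
)


def extract_owner_fields(text):
    """Return [document_type, value, consideration] from the last line of text."""
    last_line = text.splitlines()[-1].strip()
    match = ROW_PATTERN.match(last_line)
    if match is None:
        raise ValueError("last line does not match the assumed row layout")
    return list(match.groups())


# Only the header and data row of the sample are repeated here for brevity.
sample = (
    "REGISTRATION    DATE(DMY)  DOCUMENT TYPE      VALUE           CONSIDERATION\n"
    "172 211 342    15/08/2017                      $610,000        CASH & MTGE"
)
print(extract_owner_fields(sample))  # ['', '$610,000', 'CASH & MTGE']
```

The lazy group (.*?) is what lets the document type come back as an empty string: when nothing sits between the date and the $ sign, it matches zero characters instead of causing the whole match to fail. If the value column can hold something other than a dollar amount, the pattern would need to be adjusted.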