I need to extract a number from a string of data stored in a table column.
Sample string:
<strong>Customer Name</strong>: Hit - julaifnaf afbafbaf Caraballo Pichardo vs PICHARDO ALBERTO<br />
<strong>Address</strong>: NA - abdcinfainaf 42982542542 vs xx<br />
<strong>Country of citizenship</strong>: NA<br />
<strong>Country of residency</strong>: NA<br />
<strong>Date of birth</strong>: NA - xx vs Nov-72<br />
<strong>Place of birth</strong>: NA<br />
<strong>Identification Number</strong>: **1**<br />
<strong>emailDetails</strong>: <br/>
<b>Subject: </b>abcdejnfanfa <br/>
<b>Sent To: </b>abced@test.com<br/>
In the sample string above, the number I need to extract is 1.
The length of the string and the position of the field vary from record to record,
but the number to extract always comes after Identification Number</strong>: and before <br />.
What function can I use to extract this data?
Answer 0 (score: 0)
Try this:
select
regexp_replace(column_name,'.*<strong>Identification Number</strong>:[^>\d]*(\d+)[^>\d]*<br\s*/>.*', '\1', 1, 0, 'inm') as id
from html;
PS: This is not a very robust solution, because regular expressions cannot parse arbitrary HTML.
Output:
ID
-----------
1
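One caveat worth illustrating: REGEXP_REPLACE returns the source string unchanged when the pattern does not match, so a row without an Identification Number field would come back as the whole HTML blob rather than as an empty result. A minimal sketch of one way to guard against that, assuming the same pattern and flags as the query above. The two inline rows in the WITH clause are made-up sample data, not the real table:

-- Hypothetical sample rows standing in for the html table used above.
WITH html AS (
  SELECT '<strong>Identification Number</strong>: **1**<br />' AS column_name FROM dual
  UNION ALL
  SELECT '<strong>Place of birth</strong>: NA<br />' FROM dual
)
SELECT CASE
         -- Only run the replacement when the pattern actually matches.
         WHEN REGEXP_LIKE(column_name,
                '.*<strong>Identification Number</strong>:[^>\d]*(\d+)[^>\d]*<br\s*/>.*',
                'inm')
         THEN REGEXP_REPLACE(column_name,
                '.*<strong>Identification Number</strong>:[^>\d]*(\d+)[^>\d]*<br\s*/>.*',
                '\1', 1, 0, 'inm')
         -- No ELSE branch: rows without a match yield NULL.
       END AS id
FROM html;

The second sample row has no Identification Number field, so the CASE expression yields NULL for it instead of echoing the input string.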
Answer 1 (score: 0)
SELECT TO_NUMBER(
REGEXP_SUBSTR(
column_name,
'<strong>Identification Number</strong>:.*?(\d+).*?<br />',
1,
1,
NULL,
1
)
) AS id_number
FROM table_name;
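For readers unfamiliar with the six-argument form of REGEXP_SUBSTR: the arguments after the pattern are the start position, the occurrence, the match parameter, and the index of the capture group to return. As far as I know, that last argument requires Oracle 11g or later. The pattern here expects the literal closing tag <br /> with a single space. The sketch below runs the same expression against an inline value instead of a table. The sample value is an assumption, shortened from the question's string:

SELECT TO_NUMBER(
         REGEXP_SUBSTR(
           -- Inline sample value (an assumption) in place of column_name.
           '<strong>Place of birth</strong>: NA<br />' || CHR(10) ||
           '<strong>Identification Number</strong>: **1**<br />',
           '<strong>Identification Number</strong>:.*?(\d+).*?<br />',
           1,     -- position: start searching at the first character
           1,     -- occurrence: take the first match
           NULL,  -- match_param: default matching behaviour
           1      -- subexpr: return only the first capture group, (\d+)
         )
       ) AS id_number
FROM dual;

When the pattern does not match, REGEXP_SUBSTR returns NULL, and TO_NUMBER(NULL) is also NULL, so rows without the field produce NULL rather than an error.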