Question

I need to extract a number from a data string stored in a table column.

Example string:

Customer Name: Hit - julaifnaf afbafbaf Caraballo Pichardo vs PICHARDO ALBERTO Address: NA - abdcinfainaf 42982542542 vs xx Country of citizenship: NA Country of residency: NA Date of birth: NA - xx vs Nov-72 Place of birth: NA Identification Number: **1** emailDetails: Subject: abcdejnfanfa Sent To: abced@test.com

In the example string above, the number I need to extract is 1. The length of the string and the position of the record vary, but the number to extract always comes directly after Identification Number:.

What function can I use to extract this data?

Answer 1

Try this:

select 
    regexp_replace(column_name,'.*<strong>Identification Number</strong>:[^>\d]*(\d+)[^>\d]*<br\s*/>.*', '\1', 1, 0, 'inm') as id 
from html;

P.S. This is not a very robust solution, because regular expressions cannot reliably parse arbitrary HTML.

Output:

         ID
-----------
          1

Answer 2

SELECT TO_NUMBER(
         REGEXP_SUBSTR(
           column_name,
           '<strong>Identification Number</strong>:.*?(\d+).*?<br />',
           1,
           1,
           NULL,
           1
         )
       ) AS id_number
FROM   table_name;
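
To try the expression without touching a real table, here is a minimal sketch that runs it against a one-row sample built from dual. The sample text and the name sample are hypothetical. The <strong> and <br /> tags in it are an assumption inferred from the pattern itself, since the example string in the question shows no markup.

```sql
-- Hypothetical one-row sample; the <strong>/<br /> markup is assumed, not real data.
WITH sample AS (
  SELECT 'Place of birth: NA <br /><strong>Identification Number</strong>: **1** <br />emailDetails: Subject: abcdejnfanfa' AS column_name
  FROM   dual
)
SELECT TO_NUMBER(
         REGEXP_SUBSTR(
           column_name,
           '<strong>Identification Number</strong>:.*?(\d+).*?<br />',
           1,     -- start searching at the first character
           1,     -- use the first occurrence of the pattern
           NULL,  -- no match modifiers
           1      -- return capture group 1, i.e. the digits
         )
       ) AS id_number
FROM   sample;
```

REGEXP_SUBSTR returns NULL when the pattern is not found, so rows without an Identification Number should come back as NULL rather than raise an error. Because no match modifier is given, the dot does not match newlines. If the stored text can span several lines, passing 'n' in place of NULL would probably be needed.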

How can I extract a specific number from a long data string using Oracle SQL?
