I have a function that checks whether an input ISBN is valid, based on the Wikipedia page (https://en.wikipedia.org/wiki/Check_digit):
create or replace FUNCTION fn_isbn_valid (in_isbn IN VARCHAR2) RETURN VARCHAR2
IS
l_isbn VARCHAR2(14);
l_last_digit NUMBER(1);
l_checksum NUMBER;
BEGIN
l_isbn := LPAD(in_isbn, 14, 0);
l_last_digit := TO_NUMBER(SUBSTR(l_isbn, -1, 1));
l_checksum :=
((10 -
((
3 * (
TO_NUMBER(SUBSTR(L_isbn, 1, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 3, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 5, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 7, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 9, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 11, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 13, 1)))
+
TO_NUMBER(SUBSTR(L_isbn, 2, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 4, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 6, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 8, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 10, 1)) +
TO_NUMBER(SUBSTR(L_isbn, 12, 1))
)mod 10 )) mod 10 ) ;
IF (l_checksum = l_last_digit) THEN
RETURN 'Y';
ELSE
RETURN 'N';
END IF;
EXCEPTION
WHEN VALUE_ERROR THEN
RETURN 'N';
END fn_isbn_valid;
I then need to use this function to update an indicator column in a table:
update my_table
set isbn_valid_ind = 'N'
where fn_isbn_valid(isbn) = 'N';
For a 100k-row table in which 20k rows have an invalid ISBN, the update takes about 10 seconds.
Any hints or suggestions on how I can speed this up? Thanks.
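Answer 0: (score: 2)
I'm adding this as an answer so that I can format it, but really I just want to elaborate on the comment I made.
I have a table "bigemp" with 1.8M rows.
20% of the rows have JOB = 'TEST'.
If I run a plain SQL statement:
SQL> update bigemp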
2 set ename = lower(ename)
3 where job = 'TEST'
4 /
367001 rows updated.
Elapsed: 00:00:22.94
Now I have a function:
create or replace function is_valid( empno in number ) return varchar2
is
begin
if mod( empno, 5 ) = 0
then
return 'N';
else
return 'Y';
end if;
end;
Now I run essentially the same SQL statement, this time using the function:
SQL> update bigemp
2 set ename = upper(ename)
3 where is_valid( empno ) = 'N'
4 /
367001 rows updated.
Elapsed: 00:00:23.99
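So both statements take about 23 seconds, which means the problem is not about "speeding up the function".
The moral: look at where the time is actually being spent.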
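One way to see where the time goes, sketched here for the table and function from the question: time a query that only evaluates the function, then time the UPDATE itself and compare. This is only an illustration using SQL*Plus timing; the rollback is there so the test leaves the data unchanged.

```sql
set timing on

-- Evaluates fn_isbn_valid for every row but changes nothing,
-- so the elapsed time is roughly table scan + function calls.
select count(*)
from my_table
where fn_isbn_valid(isbn) = 'N';

-- The real statement: scan + function calls + the cost of changing rows
-- (undo, redo, index maintenance, triggers).
update my_table
set isbn_valid_ind = 'N'
where fn_isbn_valid(isbn) = 'N';

rollback;
```

If the two timings are close, the function calls dominate; if the UPDATE is much slower than the SELECT, most of the time is spent modifying rows rather than validating them.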
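Answer 1: (score: 1)
Try updating only the rows that are not already flagged (note that isbn_valid_ind != 'N' does not match rows where the column is NULL):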
update my_table
set isbn_valid_ind = 'N'
where isbn_valid_ind != 'N'
and fn_isbn_valid(isbn) = 'N';
Another solution might be to initialize the column with NULL and then update only the rows that have not been tested yet:
update my_table
set isbn_valid_ind = fn_isbn_valid(isbn)
where isbn_valid_ind is null;
But I don't know whether your process allows that...
Answer 2: (score: 1)
Maybe create a function like this (my version may be faster than yours, please test):
CREATE OR REPLACE FUNCTION CheckSum_ISBN(isbn IN NUMBER) RETURN INTEGER DETERMINISTIC IS
res INTEGER;
BEGIN
-- check digit = (10 - (weighted sum mod 10)) mod 10, as in the question
SELECT MOD(10 - MOD(SUM((1+2*MOD(LEVEL,2)) * SUBSTR(LPAD(isbn, 14, 0), LEVEL, 1)), 10), 10)
INTO res
FROM dual
CONNECT BY LEVEL < 14;
RETURN res;
END CheckSum_ISBN;
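As a sanity check of the arithmetic, here is the same weighted sum evaluated in plain SQL for the sample ISBN 9780306406157 (the literal is only an illustration). Using the formula from the question, (10 - (weighted sum mod 10)) mod 10, the weighted sum of the first 13 digits of the zero-padded value is 3*22 + 27 = 93, so the expected check digit is 7.

```sql
-- Levels 1..13 walk the first 13 characters of the value padded to 14;
-- odd positions get weight 3, even positions weight 1.
select mod(10 - mod(sum((1 + 2 * mod(level, 2))
                        * to_number(substr(lpad('9780306406157', 14, '0'), level, 1))), 10), 10)
       as check_digit
from dual
connect by level < 14;
-- CHECK_DIGIT = 7, which matches the last digit of 9780306406157
```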
You can then add a virtual column to the table, for example:
ALTER TABLE my_table ADD (CHECK_SUM INTEGER GENERATED ALWAYS AS ( CheckSum_ISBN(ISBN) ) VIRTUAL);
If needed, you can create an index on this column just as you would on a normal column:
CREATE INDEX ind_isbn_checksum ON my_table (CHECK_SUM);
Validating your numbers should then be fairly fast:
select *
from my_table
where CHECK_SUM <> to_number(substr(isbn,-1));
or, respectively:
update my_table
set isbn_valid_ind = 'N'
where CHECK_SUM <> to_number(substr(isbn,-1));
Of course, you can also do it all in one step, as an alternative to the function above:
CREATE OR REPLACE FUNCTION CheckSum_ISBN(isbn IN NUMBER) RETURN VARCHAR2 DETERMINISTIC IS
res VARCHAR2(1);
BEGIN
SELECT
CASE WHEN
MOD(10 - MOD(SUM((1+2*MOD(LEVEL,2)) * SUBSTR(LPAD(isbn, 14, 0), LEVEL, 1)), 10), 10) = TO_NUMBER(SUBSTR(isbn, -1)) THEN 'Y'
ELSE 'N'
END
INTO res
FROM dual
CONNECT BY LEVEL < 14;
RETURN res;
END CheckSum_ISBN;
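Answer 3: (score: 1)
Your function does not seem to work for the older 10-digit ISBNs, nor does it handle dashes in the input. I would make it deterministic and parallel-enabled, and make it work for both 10- and 13-digit ISBNs. Something like: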
create or replace FUNCTION fn_isbn_valid (in_isbn IN VARCHAR2)
RETURN VARCHAR2
deterministic
parallel_enable
AS
l_isbn VARCHAR2(20);
l_num number;
BEGIN
l_isbn := replace(in_isbn, '-','');
if (length(l_isbn) = 10) then
l_num := (to_number(substr(l_isbn, 1, 1))*10)+
(to_number(substr(l_isbn, 2, 1))*9)+
(to_number(substr(l_isbn, 3, 1))*8)+
(to_number(substr(l_isbn, 4, 1))*7)+
(to_number(substr(l_isbn, 5, 1))*6)+
(to_number(substr(l_isbn, 6, 1))*5)+
(to_number(substr(l_isbn, 7, 1))*4)+
(to_number(substr(l_isbn, 8, 1))*3)+
(to_number(substr(l_isbn, 9, 1))*2)+
(to_number(substr(l_isbn, 10, 1))*1);
if ((l_num mod 11) = 0) then
return 'Y';
else
return 'N';
end if;
elsif (length(l_isbn) = 13) then
l_num := (to_number(substr(l_isbn, 1, 1))*1)+
(to_number(substr(l_isbn, 2, 1))*3)+
(to_number(substr(l_isbn, 3, 1))*1)+
(to_number(substr(l_isbn, 4, 1))*3)+
(to_number(substr(l_isbn, 5, 1))*1)+
(to_number(substr(l_isbn, 6, 1))*3)+
(to_number(substr(l_isbn, 7, 1))*1)+
(to_number(substr(l_isbn, 8, 1))*3)+
(to_number(substr(l_isbn, 9, 1))*1)+
(to_number(substr(l_isbn, 10, 1))*3)+
(to_number(substr(l_isbn, 11, 1))*1)+
(to_number(substr(l_isbn, 12, 1))*3)+
(to_number(substr(l_isbn, 13, 1))*1);
if ((l_num mod 10) = 0) then
return 'Y';
else
return 'N';
end if;
else
return 'N';
end if;
EXCEPTION
WHEN VALUE_ERROR THEN
RETURN 'N';
END fn_isbn_valid;
For example:
SQL> --13 digit
SQL> select fn_isbn_valid('9780306406157') from dual
FN_ISBN_VALID('9780306406157')
--------------------------------------------------------------------------------
Y
1 row selected.
SQL> select fn_isbn_valid('978-0-306-40615-7') from dual
FN_ISBN_VALID('978-0-306-40615-7')
--------------------------------------------------------------------------------
Y
1 row selected.
SQL> -- 10 digit
SQL> select fn_isbn_valid('0-306-40615-2') from dual
FN_ISBN_VALID('0-306-40615-2')
--------------------------------------------------------------------------------
Y
1 row selected.
SQL> select fn_isbn_valid('0306406152') from dual
FN_ISBN_VALID('0306406152')
--------------------------------------------------------------------------------
Y
1 row selected.
You can now also use this function in an update statement with parallel DML.
DML example:
SQL> create table test_isbn
(id number,
isbn varchar2(20),
is_valid char(1)
)
Table created.
SQL> insert into test_isbn(id, isbn)
select level as id, lpad(to_char(trunc(dbms_random.value(100000000,9999999999))), 10, 0) as isbn
from dual
connect by level <= 1000000
1000000 rows created.
SQL> commit
Commit complete.
SQL> --test without parallel dml
SQL> set timing on
SQL> update test_isbn
set is_valid = fn_isbn_valid(isbn)
1000000 rows updated.
**Elapsed: 00:00:11.55**
SQL> commit
Commit complete.
Elapsed: 00:00:00.05
SQL> --test with parallel dml
SQL> alter session enable parallel dml
Session altered.
Elapsed: 00:00:01.21
SQL> update /*+ parallel ti(10) */ test_isbn ti
set is_valid = fn_isbn_valid(isbn)
1000000 rows updated.
**Elapsed: 00:00:06.38**
SQL> commit
Commit complete.
Elapsed: 00:00:00.04
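One gap worth noting in the ISBN-10 branch: a 10-digit ISBN may end in the check character X, which stands for the value 10, and to_number('X') raises VALUE_ERROR, so such ISBNs are reported as 'N'. A minimal sketch of an ISBN-10 check that accepts X is shown below; the function name fn_isbn10_valid is hypothetical and not part of the answer above.

```sql
create or replace function fn_isbn10_valid (in_isbn in varchar2)
  return varchar2
  deterministic
  parallel_enable
as
  l_isbn varchar2(20);
  l_ch   varchar2(1);
  l_sum  number := 0;
begin
  -- Assigned in the body (not the declaration) so an over-long input
  -- is caught by the VALUE_ERROR handler below.
  l_isbn := upper(replace(in_isbn, '-', ''));
  if l_isbn is null or length(l_isbn) != 10 then
    return 'N';
  end if;
  for i in 1 .. 10 loop
    l_ch := substr(l_isbn, i, 1);
    if i = 10 and l_ch = 'X' then
      l_sum := l_sum + 10;                       -- 'X' means 10, weight 1
    else
      l_sum := l_sum + to_number(l_ch) * (11 - i); -- weights 10 down to 1
    end if;
  end loop;
  return case when mod(l_sum, 11) = 0 then 'Y' else 'N' end;
exception
  when value_error then
    return 'N';
end fn_isbn10_valid;
/
```

For example, 0-8044-2957-X has a weighted sum of 209 = 11 * 19, so fn_isbn10_valid('0-8044-2957-X') returns 'Y', and fn_isbn10_valid('0306406152') still returns 'Y' (weighted sum 132 = 11 * 12).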
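Answer 4: (score: 1)
If you really need to speed this up, computing the check digit as part of the SQL itself, rather than in a separate function, should do it. For example: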
UPDATE my_table
SET isbn_valid_ind = 'N'
WHERE CASE WHEN substr(LPAD(isbn, 14, 0), -1) = mod(10 - MOD (3 * (to_number(substr(LPAD(isbn, 14, 0), 1, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 3, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 5, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 7, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 9, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 11, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 13, 1)))
+ to_number(substr(LPAD(isbn, 14, 0), 2, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 4, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 6, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 8, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 10, 1)) +
to_number(substr(LPAD(isbn, 14, 0), 12, 1)), 10), 10)
THEN 'Y'
ELSE 'N'
END = 'N';
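N.B. I have taken the logic from your question and tweaked it slightly (N MOD M is a PL/SQL construct; in SQL you need to use the MOD(N, M) function). If your logic has since changed, hopefully this still gives you an idea of how to fold the logic directly into the update statement.
You could also use the CASE expression to generate a virtual column (which would remove the need to run an update after each load), but that may slow down SELECT statements that query that column. Hopefully the column is not used for anything other than correcting the data, in which case a virtual column may be a viable option for you.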
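A rough sketch of that virtual column is shown below. The column name isbn_valid_calc is hypothetical, and the sketch assumes isbn is a VARCHAR2 column holding digits only. It uses an equivalent form of the test: the check digit matches exactly when the weighted sum of the first 13 digits plus the check digit is a multiple of 10.

```sql
-- Hypothetical column name; assumes isbn is VARCHAR2 and contains only digits.
alter table my_table add (
  isbn_valid_calc generated always as (
    case
      when mod(
             3 * ( to_number(substr(lpad(isbn, 14, '0'),  1, 1))
                 + to_number(substr(lpad(isbn, 14, '0'),  3, 1))
                 + to_number(substr(lpad(isbn, 14, '0'),  5, 1))
                 + to_number(substr(lpad(isbn, 14, '0'),  7, 1))
                 + to_number(substr(lpad(isbn, 14, '0'),  9, 1))
                 + to_number(substr(lpad(isbn, 14, '0'), 11, 1))
                 + to_number(substr(lpad(isbn, 14, '0'), 13, 1)) )
             + to_number(substr(lpad(isbn, 14, '0'),  2, 1))
             + to_number(substr(lpad(isbn, 14, '0'),  4, 1))
             + to_number(substr(lpad(isbn, 14, '0'),  6, 1))
             + to_number(substr(lpad(isbn, 14, '0'),  8, 1))
             + to_number(substr(lpad(isbn, 14, '0'), 10, 1))
             + to_number(substr(lpad(isbn, 14, '0'), 12, 1))
             + to_number(substr(lpad(isbn, 14, '0'), 14, 1))  -- the check digit itself
           , 10) = 0
      then 'Y'
      else 'N'
    end
  ) virtual
);

-- Invalid rows can then be found without any update step:
select count(*) from my_table where isbn_valid_calc = 'N';
```

Unlike the function in the question, this expression has no exception handler, so a value containing non-digit characters would raise ORA-01722 when the column is queried instead of yielding 'N'.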