Question

I am working on a database with a few tables. They are:

      districts table
      PK district_id

      student_data table
      PK study_id
      FK district_id

      ga_data table
      PK study_id
      district_id

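The ga_data table is the one I'm adding data to. The student_data table and ga_data both have 1.3 million records. The study_id between the two tables is one-to-one, but ga_data.district_id is NULL and needs to be updated. I'm having trouble with the following PL/SQL: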

update ga_data
set district_id = (select district_id from student_data
where student_data.study_id = ga_data.study_id)
where ga_data.district_id is null and rownum < 100;

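I need to do this incrementally, which is why I need rownum. But am I using it correctly? After running the query a number of times, it has only updated about 8,000 of the 1.3 million records (there should be about 1.1 million updates, since some district_id values in student_data are null). Thanks!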

Answer 1

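ROWNUM simply cuts off the query after the first n rows. You have some rows in STUDENT_DATA that have a NULL DISTRICT_ID. Updating from one of those rows sets QA_DATA.DISTRICT_ID to NULL again, so the row still satisfies the WHERE clause on the next run. After a number of runs your query is therefore liable to get stuck in a rut, returning the same batch of QA_DATA records (at most 99 with rownum < 100), all of which match one of those pesky STUDENT_DATA rows.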

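So you need some mechanism to ensure that you work your way progressively through the QA_DATA table. A flag column would be one solution. Partitioning the query so that each run hits a different set of STUDENT_IDs is another.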
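The partitioning idea could be sketched roughly as follows. This is only an illustration, written against the table and column names from the question (ga_data, study_id) rather than my test tables. It assumes study_id is numeric, and the batch width of 100 key values is an arbitrary choice.

```sql
declare
    batch_size constant pls_integer := 100;  -- assumed width of each key range
    lo      ga_data.study_id%type;           -- lower bound of the current range
    max_id  ga_data.study_id%type;
begin
    -- assumes study_id is numeric, so ranges can be stepped arithmetically
    select min(study_id), max(study_id)
    into lo, max_id
    from ga_data;

    -- if ga_data is empty, lo and max_id are null and the loop is skipped
    while lo <= max_id
    loop
        update ga_data
        set district_id = (select s.district_id
                           from student_data s
                           where s.study_id = ga_data.study_id)
        where ga_data.district_id is null
        and ga_data.study_id between lo and lo + batch_size - 1;

        commit;
        lo := lo + batch_size;  -- move on, even if some rows stayed null
    end loop;
end;
/
```

Because each pass moves to the next key range regardless of the outcome, rows whose STUDENT_DATA match has a NULL district cannot block progress.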

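It's not clear why you have to do this in batches of 100, but perhaps the easiest way of doing it would be to use BULK PROCESSING (at least in Oracle: this PL/SQL syntax won't work in MySQL).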

Here is some test data:

SQL> select district_id, count(*)
  2  from student_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
   7369        192
   7499        190
   7521        192
   7566        190
   7654        192
   7698        191
   7782        191
   7788        191
   7839        191
   7844        192
   7876        191
   7900        192
   7902        191
   7934        192
   8060        190
   8061        193
   8083        190
   8084        193
   8085        190
   8100        193
   8101        190
               183

22 rows selected.

SQL> select district_id, count(*)
  2  from qa_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
                  4200

SQL>

This anonymous block uses the bulk processing LIMIT clause to fetch the result set in chunks of 100 rows.

SQL> declare
  2      type qa_nt is table of qa_data%rowtype;
  3      qa_recs qa_nt;
  4
  5      cursor c_qa is
  6          select qa.student_id
  7                 , s.district_id
  8          from qa_data qa
  9                  join student_data s
 10                      on (s.student_id = qa.student_id);
 11  begin
 12      open c_qa;
 13
 14      loop
 15          fetch c_qa bulk collect into qa_recs limit 100;
 16          exit when qa_recs.count() = 0;
 17
 18          for i in qa_recs.first()..qa_recs.last()
 19          loop
 20              update qa_data qt
 21                  set qt.district_id = qa_recs(i).district_id
 22                  where qt.student_id = qa_recs(i).student_id;
 23          end loop;
 24
 25      end loop;
 26  end;
 27  /

PL/SQL procedure successfully completed.

SQL>

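Note that this construct allows us to do additional processing on the selected rows before issuing the update. This is handy if we need to apply complicated fixes programmatically.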
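If no per-row processing is needed, the inner FOR loop can be replaced with FORALL, which sends each batch of updates to the SQL engine in one go. The following is a sketch against the same test tables, not a tuned implementation. The two scalar collections and the filter on non-null districts are my own assumptions for illustration.

```sql
declare
    type id_nt       is table of qa_data.student_id%type;
    type district_nt is table of qa_data.district_id%type;
    student_ids  id_nt;
    district_ids district_nt;

    cursor c_qa is
        select qa.student_id
               , s.district_id
        from qa_data qa
                join student_data s
                    on (s.student_id = qa.student_id)
        where s.district_id is not null;  -- assumption: skip rows with nothing to copy
begin
    open c_qa;

    loop
        -- fetch up to 100 rows into two parallel collections
        fetch c_qa bulk collect into student_ids, district_ids limit 100;
        exit when student_ids.count = 0;

        -- one bulk-bound update per batch instead of one statement per row
        forall i in 1 .. student_ids.count
            update qa_data qt
                set qt.district_id = district_ids(i)
                where qt.student_id = student_ids(i);
    end loop;

    close c_qa;
    commit;
end;
/
```

The trade-off is that FORALL accepts only a single DML statement, so any row-level fix-up logic has to be applied to the collections before the FORALL runs.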

As you can see, the data in QA_DATA now matches that in STUDENT_DATA:

SQL> select district_id, count(*)
  2  from qa_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
   7369        192
   7499        190
   7521        192
   7566        190
   7654        192
   7698        191
   7782        191
   7788        191
   7839        191
   7844        192
   7876        191
   7900        192
   7902        191
   7934        192
   8060        190
   8061        193
   8083        190
   8084        193
   8085        190
   8100        193
   8101        190
               183

22 rows selected.

SQL>

Answer 2

Updating only 100 rows at a time is an odd requirement. Why is that?

Anyway, since district_id in student_data can be null, you may be updating the same 100 rows over and over again.

If you extend your query to make sure a non-null district_id exists, you might end up where you want to be:

update ga_data
set district_id = (
  select district_id 
  from student_data
  where student_data.study_id = ga_data.study_id
)
where ga_data.district_id is null 
and exists (
  select 1
  from student_data
  where student_data.study_id = ga_data.study_id
  and district_id is not null
)
and rownum < 100;
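To avoid rerunning the statement by hand, it could be wrapped in a loop that stops once nothing is left to update. This is a minimal sketch, assuming a commit after each batch is acceptable. It uses rownum <= 100 because rownum < 100 touches at most 99 rows.

```sql
begin
    loop
        update ga_data
        set district_id = (
          select district_id
          from student_data
          where student_data.study_id = ga_data.study_id
        )
        where ga_data.district_id is null
        and exists (
          select 1
          from student_data
          where student_data.study_id = ga_data.study_id
          and district_id is not null
        )
        and rownum <= 100;

        -- check the row count before committing
        exit when sql%rowcount = 0;
        commit;
    end loop;
end;
/
```

The loop terminates because every updated row receives a non-null district_id and therefore no longer matches the WHERE clause.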

Answer 3

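If this is a one-time conversion, you should consider a completely different approach. Recreate the table as the join of your two tables. I promise you will laugh out loud when you see how fast it is compared to all kinds of funny 100-rows-at-a-time updates.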

create table new_table as
   select study_id
         ,s.district_id
         ,g.the_remaining_columns_in_ga_data
    from student_data s
    join ga_data      g using(study_id);

   -- create indexes, constraints etc. on new_table
   drop table ga_data;
   alter table new_table rename to ga_data;

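Alternatively, if it is not a one-time conversion, or you cannot recreate/drop tables, or you simply feel like spending extra time on the data load: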

merge
 into ga_data      g
using student_data s
   on (g.study_id  = s.study_id)
when matched then
   update
      set g.district_id = s.district_id;

The last statement can also be rewritten as an updatable view, but I personally never use them.
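For reference, the updatable-view form might look roughly like the sketch below. It relies on student_data.study_id being backed by a primary key or unique constraint (as the schema in the question states), so that ga_data is key-preserved in the join. The column aliases are made up for illustration.

```sql
update (
    select g.district_id as old_district_id   -- hypothetical alias
          ,s.district_id as new_district_id   -- hypothetical alias
      from ga_data      g
          ,student_data s
     where g.study_id = s.study_id
)
   set old_district_id = new_district_id;
```

Without that constraint on student_data.study_id, Oracle would be expected to reject the statement because the view is not key-preserved.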

Dropping/disabling the indexes/constraints on ga_data.district_id before running the merge, and recreating them afterwards, will improve performance.

PL/SQL rownum update
