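I want to count how often values occur in certain columns and build a new table in which the values become the columns and the frequencies become the data. For example: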
create table users
(id number primary key,
name varchar2(255));
insert into users values (1, 'John');
insert into users values (2, 'Joe');
insert into users values (3, 'Max');
create table meals
(id number primary key,
user_id number,
food varchar2(255));
insert into meals values (1, 1, 'Apple');
insert into meals values (2, 1, 'Apple');
insert into meals values (3, 1, 'Orange');
insert into meals values (4, 1, 'Bread');
insert into meals values (5, 1, 'Apple');
insert into meals values (6, 2, 'Apple');
insert into meals values (7, 2, 'Bread');
insert into meals values (8, 2, 'Bread');
insert into meals values (9, 2, 'Apple');
insert into meals values (10, 3, 'Orange');
insert into meals values (11, 3, 'Bread');
insert into meals values (12, 3, 'Bread');
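So I have different users and their meals (here Bread, Apple and Orange). For each user, I want to know how often they ate each food. The following query does exactly what I want: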
select
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Apple') as count_apple,
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Orange') as count_orange,
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Bread') as count_bread
from users;
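The problem is that this is really slow, especially with more than 100,000 users and dozens of different foods. I'm sure there is a faster way, but I don't have enough SQL experience to figure it out.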
Answer 0 (score: 1)
If you are on Oracle 11g or later, you can use the pivot operator, like this:
select * from (
select user_id, food from meals
)
pivot (count(*) as count for (food) in ('Apple', 'Orange', 'Bread'));
Otherwise you have to pivot manually:
select user_id,
sum(case when food = 'Apple' then 1 else 0 end) count_apple,
sum(case when food = 'Orange' then 1 else 0 end) count_orange,
sum(case when food = 'Bread' then 1 else 0 end) count_bread
from meals
group by user_id;
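Either way, these should be faster than the original approach, because the meals table is accessed only once instead of once per food for every user. Note that both queries read only from meals, so a user with no meals at all gets no row, whereas the original query returns a row of zeros for such a user.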
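If you want to convince yourself that the manual pivot gives the same numbers as the original query, here is a minimal sketch. It makes several assumptions that are not in the question: it uses Python's built-in sqlite3 module as a stand-in for Oracle (so column types differ and timings will not carry over), it adds a hypothetical fourth user 'Ann' who has no meals, and it wraps the conditional aggregation in a left join from users so that users without meals still show up with zeros.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (id integer primary key, name text);
create table meals (id integer primary key, user_id integer, food text);
-- Users 1-3 are from the question; 'Ann' is a hypothetical user with no meals.
insert into users values (1, 'John'), (2, 'Joe'), (3, 'Max'), (4, 'Ann');
insert into meals values
  (1, 1, 'Apple'), (2, 1, 'Apple'), (3, 1, 'Orange'), (4, 1, 'Bread'),
  (5, 1, 'Apple'), (6, 2, 'Apple'), (7, 2, 'Bread'), (8, 2, 'Bread'),
  (9, 2, 'Apple'), (10, 3, 'Orange'), (11, 3, 'Bread'), (12, 3, 'Bread');
""")

# Original approach: one correlated subquery per food, evaluated per user.
slow_sql = """
select users.id,
  (select count(id) from meals where meals.user_id = users.id and meals.food = 'Apple'),
  (select count(id) from meals where meals.user_id = users.id and meals.food = 'Orange'),
  (select count(id) from meals where meals.user_id = users.id and meals.food = 'Bread')
from users
order by users.id
"""

# Manual pivot: one pass over the joined rows, one conditional sum per food.
# The left join keeps users that have no rows in meals.
fast_sql = """
select u.id,
  sum(case when m.food = 'Apple' then 1 else 0 end),
  sum(case when m.food = 'Orange' then 1 else 0 end),
  sum(case when m.food = 'Bread' then 1 else 0 end)
from users u
left join meals m on m.user_id = u.id
group by u.id
order by u.id
"""

slow = conn.execute(slow_sql).fetchall()
fast = conn.execute(fast_sql).fetchall()

# Each row is (user_id, count_apple, count_orange, count_bread).
for row in fast:
    print(row)
print("same result:", slow == fast)
```

The left join is only needed if users without meals matter to you. If every user has at least one meal, the plain group by on meals from the answer is enough.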