How to correctly use GROUP BY with conditions

Date: 2019-02-07 02:33:30

Tags: sql oracle group-by aggregate-functions

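I want to count apple types (and how many food items are organic) per continent, broken down by country, including a count of food items that contain more than one type (mixed).

For example, food item B1 is an organic golden apple from the USA, so it should add "1" to organic_fd and "1" to golden_bag. A1 is also organic, from Argentina, but it contains both granny and red delicious apples, so it counts as "1" mixed_bag, "1" granny_bag, and "1" red_bag.

Finally, E1 and F1 are both fuji apples from Laos, but one is organic and the other is not. So the total should be 2 for fuji_bag and 1 for organic_fd.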

Table X:
food_item | food_area | food_loc   | food_exp
A1          lxgs        argentina   1/1/20
B1          iyan        usa         5/31/21
C1          lxgs        peru        4/1/20
D1          wa8e        norway      10/1/19
E1          894a        laos        5/1/19
F1          894a        laos        9/17/19


Table Y:
food_item | organic
A1          Y
B1          Y
C1          N
D1          N
E1          Y
F1          N

Table Z:
food_item | food_type
A1          189
A1          190
B1          191
C1          189
D1          192
E1          193
F1          193

SELECT continent, country,
      SUM(organic)  AS organic_fd, SUM(Granny) AS granny_bag,
      SUM(Red_delc) AS red_bag,    SUM(Golden) AS golden_bag,
      SUM(Gala)     AS gala_bag,   SUM(Fuji)   AS fuji_bag,
      SUM(CASE WHEN Granny + Red_delc + Golden + Gala + Fuji > 1 THEN 1  ELSE 0 END) AS mixed_bag     
FROM (SELECT (CASE SUBSTR (x.food_area, 4, 1)
              WHEN 's' THEN 'SA' WHEN 'n' THEN 'NA'
              WHEN 'e' THEN 'EU' WHEN 'a' THEN 'AS' ELSE NULL END) continent,
          x.food_loc country, COUNT(y.organic) AS Organic,
          COUNT(CASE WHEN z.food_type = '189' THEN 1 END) AS Granny,
          COUNT(CASE WHEN z.food_type = '190' THEN 1 END) AS Red_delc,
          COUNT(CASE WHEN z.food_type = '191' THEN 1 END) AS Golden,
          COUNT(CASE WHEN z.food_type = '192' THEN 1 END) AS Gala,
          COUNT(CASE WHEN z.food_type = '193' THEN 1 END) AS Fuji      
    FROM x LEFT JOIN z ON x.food_item = z.food_item
           LEFT JOIN y on x.food_item = y.food_item and y.organic = 'Y'    
               WHERE  x.food_exp > sysdate
    GROUP BY SUBSTR (x.food_area, 4, 1), x.food_loc, y.organic) h
GROUP BY h.continent, h.country, h.organic

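I'm not getting the correct output. For example, Laos shows up TWICE, once for the organic count and once for the non-organic count: one row shows 1 organic_fd and 1 fuji_bag, and another row shows 0 organic_fd and another 1 fuji_bag. I want a single row with the totals. (Also, if I add more food items, my mixed_bag shows a count of "1" for almost every record/row.)

The desired output is below: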

| continent | country   |organic_fd | granny_bag| red_bag| golden_bag| gala_bag|fuji_bag | mixed_bag
| SA        | argentina |    1      | 1         |   1    | 0         | 0       | 0       | 1
| SA        | peru      |    0      | 1         |   0    | 0         | 0       | 0       | 0
| NA        | usa       |    1      | 0         |   0    | 1         | 0       | 0       | 0
| EU        | norway    |    0      | 0         |   0    | 0         | 1       | 0       | 0
| AS        | laos      |    1      | 0         |   0    | 0         | 0       | 2       | 0

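So, suppose I add another food item, G1 from Norway, that has 3 types of organic apples: fuji, red, granny. Norway would then have a count of 1 in each of these columns: mixed_bag, organic_fd, fuji_bag, red_bag, granny_bag (in addition to the earlier count of 1 for gala_bag). If you then add H1, identical to G1, the total becomes 2 for each of mixed_bag, organic_fd, fuji_bag, granny_bag, and red_bag.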

2 Answers:

Answer 0: (score: 1)

Query:

def f(x):
    a = x - x.iloc[0]
    b = x.count()
    c = x.index - x.index[0] + 1
    return pd.DataFrame({'Diff':a, 'Count':b, 'Index':c})

df = df.join(df.groupby('name')['value'].apply(f))
print(df)

  name  value  Diff  Count  Index
0    A      1     0      2      1
1    A      3     2      2      2
2    B      1     0      4      1
3    B      2     1      4      2
4    B      3     2      4      3
5    B      1     0      4      4
6    C      2     0      3      1
7    C      3     1      3      2
8    C      3     1      3      3

You can try this query here: https://rextester.com/TSSH87409

Answer 1: (score: 0)

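There is a one-to-many relationship between x and z, so the join can produce several rows for a single row of x, as it does for A1. You therefore first have to number the rows of x, which is the job of my subquery t1 (along with mapping the values). Then group by that row number and take max() of each counted column (granny, organic, and so on), as subquery t2 does. Finally, sum the values.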
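To see the fan-out concretely, here is a minimal sketch that runs portable SQL through Python's built-in sqlite3 module rather than Oracle. It uses a trimmed copy of the sample data and, as an assumption, groups by food_item instead of a generated row number, since food_item is unique in x in the sample.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (food_item TEXT, food_loc TEXT);
    CREATE TABLE y (food_item TEXT, organic TEXT);
    CREATE TABLE z (food_item TEXT, food_type TEXT);
    INSERT INTO x VALUES ('A1', 'argentina'), ('E1', 'laos'), ('F1', 'laos');
    INSERT INTO y VALUES ('A1', 'Y'), ('E1', 'Y'), ('F1', 'N');
    INSERT INTO z VALUES ('A1', '189'), ('A1', '190'), ('E1', '193'), ('F1', '193');
""")

# Naive: aggregate straight over the join. A1 has two rows in z, so its
# organic flag is counted once per joined row.
naive = con.execute("""
    SELECT x.food_loc,
           SUM(CASE WHEN y.organic = 'Y' THEN 1 ELSE 0 END) AS organic_fd
      FROM x JOIN y ON y.food_item = x.food_item
             JOIN z ON z.food_item = x.food_item
     GROUP BY x.food_loc
     ORDER BY x.food_loc
""").fetchall()

# Two-step: collapse to one row per food item with MAX(), then SUM per country.
two_step = con.execute("""
    SELECT food_loc, SUM(org) AS organic_fd, SUM(fuj) AS fuji_bag
      FROM (SELECT x.food_item, x.food_loc,
                   MAX(CASE WHEN y.organic = 'Y' THEN 1 ELSE 0 END) AS org,
                   MAX(CASE WHEN z.food_type = '193' THEN 1 ELSE 0 END) AS fuj
              FROM x JOIN y ON y.food_item = x.food_item
                     JOIN z ON z.food_item = x.food_item
             GROUP BY x.food_item, x.food_loc)
     GROUP BY food_loc
     ORDER BY food_loc
""").fetchall()

print(naive)     # [('argentina', 2), ('laos', 1)]  -> argentina over-counted
print(two_step)  # [('argentina', 1, 0), ('laos', 1, 2)]
```

Grouping only by country in the outer query, and not by the organic flag, is what keeps Laos on a single row.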

dbfiddle demo

with
  t1 as (
    select rn, food_item, food_area, food_loc country, food_exp, food_type,
           decode(substr(food_area, 4, 1), 's', 'SA', 'n', 'NA', 'e', 'EU', 'a', 'AS') continent,
           case organic when 'Y' then 1 else 0 end org,
           case when food_type = '189' then 1 else 0 end gra,
           case when food_type = '190' then 1 else 0 end red,
           case when food_type = '191' then 1 else 0 end gol,
           case when food_type = '192' then 1 else 0 end gal,
           case when food_type = '193' then 1 else 0 end fuj 
      from (select rownum rn, x.* from x) x join y using (food_item) join z using (food_item)
      where food_exp > sysdate),
  t2 as (
    select rn, country, continent, max(org) org, max(gra) gra, 
           max(red) red, max(gol) gol, max(gal) gal, max(fuj) fuj,
           case when max(gra) + max(red) + max(gol) + max(gal) + max(fuj) > 1 
                then 1 else 0 
            end mix
       from t1 group by rn, country, continent)
select continent, country, sum(org) organic_fd, sum(gra) granny, sum(red) red_delc, 
       sum(gol) golden_bag, sum(gal) gala_bag, sum(fuj) fuji_bag, sum(mix) mixed_bag 
  from t2 
  group by continent, country

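The query above gives the expected output; please test it and adjust as needed. I noticed that you use left joins. If some rows in X have no matching data in Y or Z, you may have to add nvl() to the calculations. You should probably also move the hard-coded mapping values into a table, since hard-coding them is not good practice. Hope this helps :)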
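As a rough illustration of those two suggestions, the sketch below again uses Python's sqlite3 module with portable SQL, not Oracle. The continent_map table and the food item Q1 are hypothetical names invented for this example, and COALESCE stands in for nvl() (Oracle accepts COALESCE as well).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (food_item TEXT, food_area TEXT, food_loc TEXT);
    CREATE TABLE y (food_item TEXT, organic TEXT);
    CREATE TABLE z (food_item TEXT, food_type TEXT);
    -- Hypothetical lookup table replacing the hard-coded DECODE/CASE mapping.
    CREATE TABLE continent_map (area_code TEXT PRIMARY KEY, continent TEXT);

    INSERT INTO x VALUES ('B1', 'iyan', 'usa'), ('D1', 'wa8e', 'norway'),
                         ('Q1', 'wa8e', 'norway');  -- Q1 is made up: no y/z rows
    INSERT INTO y VALUES ('B1', 'Y'), ('D1', 'N');
    INSERT INTO z VALUES ('B1', '191'), ('D1', '192');
    INSERT INTO continent_map VALUES ('s', 'SA'), ('n', 'NA'), ('e', 'EU'), ('a', 'AS');
""")

rows = con.execute("""
    SELECT continent, country, SUM(org) AS organic_fd,
           SUM(gol) AS golden_bag, SUM(gal) AS gala_bag
      FROM (SELECT m.continent, x.food_loc AS country, x.food_item,
                   -- CASE without ELSE yields NULL when nothing matches
                   -- (including when the LEFT JOIN finds no row);
                   -- COALESCE turns that NULL into 0.
                   COALESCE(MAX(CASE WHEN y.organic = 'Y' THEN 1 END), 0) AS org,
                   COALESCE(MAX(CASE WHEN z.food_type = '191' THEN 1 END), 0) AS gol,
                   COALESCE(MAX(CASE WHEN z.food_type = '192' THEN 1 END), 0) AS gal
              FROM x
              LEFT JOIN continent_map m ON m.area_code = SUBSTR(x.food_area, 4, 1)
              LEFT JOIN y ON y.food_item = x.food_item
              LEFT JOIN z ON z.food_item = x.food_item
             GROUP BY m.continent, x.food_loc, x.food_item)
     GROUP BY continent, country
     ORDER BY country
""").fetchall()

print(rows)  # [('EU', 'norway', 0, 0, 1), ('NA', 'usa', 1, 1, 0)]
```

Q1 still contributes a row for Norway but adds 0 to every count, and a new continent code only needs a new row in the lookup table, not a query change.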