Input:
item loc month year qty_name qty_value
a x 8 2020 chocolate 10
a x 8 2020 gum 15
a x 8 2020 maggi 11
a x 8 2020 colgate 18
b y 8 2020 chocolate 20
b y 8 2020 gum 30
b y 8 2020 maggi 40
b y 8 2020 colgate 9
c s 8 2020 gum 15
c s 8 2020 maggi 11
c s 8 2020 colgate 18
Expected output:
item loc month year qty_name qty_value
a x 8 2020 chocolate 10
a x 8 2020 gum 15
a x 8 2020 maggi 0
a x 8 2020 colgate 0
b y 8 2020 chocolate 20
b y 8 2020 gum 30
b y 8 2020 maggi 0
b y 8 2020 colgate 0
c s 8 2020 gum 15
c s 8 2020 maggi 11
c s 8 2020 colgate 18
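Explanation:
For each item, loc, month, year combination:
If chocolate > 0, every value except chocolate and gum becomes 0 (this happens for items a and b).
If there is no chocolate row, the values stay unchanged (this is the case for item = c and loc = s).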
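The rule above can be sketched in plain Python, independent of any SQL engine. This is only an illustration of the requirement as stated in the question; the helper name `apply_rule` and the tuple layout are assumptions made for the example, not part of the original post.

```python
# Rows are (item, loc, month, year, qty_name, qty_value), copied from the input above.
rows = [
    ("a", "x", 8, 2020, "chocolate", 10),
    ("a", "x", 8, 2020, "gum", 15),
    ("a", "x", 8, 2020, "maggi", 11),
    ("a", "x", 8, 2020, "colgate", 18),
    ("b", "y", 8, 2020, "chocolate", 20),
    ("b", "y", 8, 2020, "gum", 30),
    ("b", "y", 8, 2020, "maggi", 40),
    ("b", "y", 8, 2020, "colgate", 9),
    ("c", "s", 8, 2020, "gum", 15),
    ("c", "s", 8, 2020, "maggi", 11),
    ("c", "s", 8, 2020, "colgate", 18),
]

KEEP = {"chocolate", "gum"}  # quantity names that are never zeroed


def apply_rule(rows):
    """Hypothetical helper: zero non-kept quantities where chocolate > 0."""
    # Combinations (item, loc, month, year) that have a chocolate row with a positive value.
    flagged = {r[:4] for r in rows if r[4] == "chocolate" and r[5] > 0}
    return [
        r[:5] + (0,) if r[:4] in flagged and r[4] not in KEEP else r
        for r in rows
    ]


result = apply_rule(rows)
for r in result:
    print(*r)
```

The set of flagged combinations plays the same role as the chocolate lookup in the SQL and PySpark answers: it is computed once and then consulted per row.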
Answer 0 (score: 0)
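If you are on MySQL 8 or later, you can use a window function. Here COUNT() OVER() counts the chocolate rows into an extra column, so every row of a combination carries the same count. The outer query can then check that count.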
SELECT ITEM,
       LOC,
       MONTH,
       YEAR,
       QTY_NAME,
       CASE
           -- zero everything except chocolate and gum when the combination has chocolate > 0
           WHEN QTY_NAME NOT IN ('chocolate', 'gum') AND CNT > 0 THEN 0
           ELSE QTY_VALUE
       END AS QTY_VALUE
FROM (SELECT ITEM,
             LOC,
             MONTH,
             YEAR,
             QTY_NAME,
             QTY_VALUE,
             -- per-combination count of chocolate rows with a positive quantity
             COUNT(CASE WHEN QTY_NAME = 'chocolate' AND QTY_VALUE > 0 THEN 1 END)
                 OVER (PARTITION BY ITEM, LOC, MONTH, YEAR) AS CNT
      FROM TEST_TABLE) T;
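As a rough check of the window-function idea, the sketch below runs a partitioned version of the query against an in-memory SQLite database from Python. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions) and is not verified against MySQL. The `rid` column and the `ORDER BY` are additions for the example only, to keep the rows in insertion order.

```python
import sqlite3

rows = [
    ("a", "x", 8, 2020, "chocolate", 10), ("a", "x", 8, 2020, "gum", 15),
    ("a", "x", 8, 2020, "maggi", 11), ("a", "x", 8, 2020, "colgate", 18),
    ("b", "y", 8, 2020, "chocolate", 20), ("b", "y", 8, 2020, "gum", 30),
    ("b", "y", 8, 2020, "maggi", 40), ("b", "y", 8, 2020, "colgate", 9),
    ("c", "s", 8, 2020, "gum", 15), ("c", "s", 8, 2020, "maggi", 11),
    ("c", "s", 8, 2020, "colgate", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table "
    "(item TEXT, loc TEXT, month INT, year INT, qty_name TEXT, qty_value INT)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT item, loc, month, year, qty_name,
       CASE WHEN qty_name NOT IN ('chocolate', 'gum') AND cnt > 0 THEN 0
            ELSE qty_value
       END AS qty_value
FROM (SELECT t.*, rowid AS rid,
             -- number of chocolate rows with a positive value in this combination
             COUNT(CASE WHEN qty_name = 'chocolate' AND qty_value > 0 THEN 1 END)
                 OVER (PARTITION BY item, loc, month, year) AS cnt
      FROM test_table t)
ORDER BY rid
"""
window_rows = conn.execute(query).fetchall()
for r in window_rows:
    print(*r)
```

Partitioning by item, loc, month, year is what keeps the count local to one combination; with an empty OVER () the count would cover the whole table and the rows for item c would be zeroed as well.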
Answer 1 (score: 0)
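The solution below assumes there is at most one "chocolate" record per item, loc, month, year combination, as in the sample data. With that assumption there is no need to aggregate per combination.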
It simply sets the quantity to zero for every record that is not "chocolate" or "gum" when the same combination has a "chocolate" record with a quantity greater than 0.
Sample data
create table quantities
(
item nvarchar(1),
loc nvarchar(1),
month int,
year int,
qty_name nvarchar(10),
qty_value int
);
insert into quantities (item, loc, month, year, qty_name, qty_value) values
('a', 'x', 8, 2020, 'chocolate', 10),
('a', 'x', 8, 2020, 'gum' , 15),
('a', 'x', 8, 2020, 'maggi' , 11),
('a', 'x', 8, 2020, 'colgate' , 18),
('b', 'y', 8, 2020, 'chocolate', 20),
('b', 'y', 8, 2020, 'gum' , 30),
('b', 'y', 8, 2020, 'maggi' , 40),
('b', 'y', 8, 2020, 'colgate' , 9),
('c', 's', 8, 2020, 'gum' , 15),
('c', 's', 8, 2020, 'maggi' , 11),
('c', 's', 8, 2020, 'colgate' , 18);
Solution
update quantities q
join quantities q2
on q2.item = q.item
and q2.loc = q.loc
and q2.month = q.month
and q2.year = q.year
and q2.qty_name = 'chocolate'
and q2.qty_value > 0
set q.qty_value = 0
where q.qty_name not in ('chocolate', 'gum');
Result
select * from quantities;
item loc month year qty_name qty_value
------- --- ------- ------- ----------- ----------
a x 8 2020 chocolate 10
a x 8 2020 gum 15
a x 8 2020 maggi 0
a x 8 2020 colgate 0
b y 8 2020 chocolate 20
b y 8 2020 gum 30
b y 8 2020 maggi 0
b y 8 2020 colgate 0
c s 8 2020 gum 15
c s 8 2020 maggi 11
c s 8 2020 colgate 18
EDIT: This is a MySQL solution, because the question was previously tagged with it. I do not have an Apache Spark SQL engine at hand to verify it there.
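For engines without the UPDATE ... JOIN syntax, the same update can be written with a correlated EXISTS subquery. The sketch below is an assumption-laden illustration, checked only against SQLite through Python's sqlite3 module; MySQL itself generally rejects a subquery on the update target, so the join form remains the one to use there.

```python
import sqlite3

rows = [
    ("a", "x", 8, 2020, "chocolate", 10), ("a", "x", 8, 2020, "gum", 15),
    ("a", "x", 8, 2020, "maggi", 11), ("a", "x", 8, 2020, "colgate", 18),
    ("b", "y", 8, 2020, "chocolate", 20), ("b", "y", 8, 2020, "gum", 30),
    ("b", "y", 8, 2020, "maggi", 40), ("b", "y", 8, 2020, "colgate", 9),
    ("c", "s", 8, 2020, "gum", 15), ("c", "s", 8, 2020, "maggi", 11),
    ("c", "s", 8, 2020, "colgate", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quantities "
    "(item TEXT, loc TEXT, month INT, year INT, qty_name TEXT, qty_value INT)"
)
conn.executemany("INSERT INTO quantities VALUES (?, ?, ?, ?, ?, ?)", rows)

# Zero every non-chocolate, non-gum row whose combination has chocolate > 0.
conn.execute("""
UPDATE quantities
SET qty_value = 0
WHERE qty_name NOT IN ('chocolate', 'gum')
  AND EXISTS (SELECT 1
              FROM quantities c
              WHERE c.item = quantities.item
                AND c.loc = quantities.loc
                AND c.month = quantities.month
                AND c.year = quantities.year
                AND c.qty_name = 'chocolate'
                AND c.qty_value > 0)
""")

updated = conn.execute("SELECT * FROM quantities ORDER BY rowid").fetchall()
for r in updated:
    print(*r)
```

Unlike the join, EXISTS cannot multiply rows, so this form does not depend on there being at most one "chocolate" record per combination.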
Answer 2 (score: 0)
Here is the PySpark way.
import pyspark.sql.functions as f
# df is the input DataFrame; mark each combination that has chocolate with qty_value > 0
df2 = df.filter("qty_name = 'chocolate' and qty_value > 0") \
  .select('item', 'loc', 'month', 'year').distinct() \
  .withColumn('marker', f.lit('Y'))
# the left join keeps every row; marker is null for combinations without chocolate
df.join(df2, ['item', 'loc', 'month', 'year'], 'left') \
  .withColumn('qty_value', f.when(f.expr("marker = 'Y' and qty_name not in ('chocolate', 'gum')"), 0).otherwise(f.col('qty_value'))) \
  .drop('marker').show(12, False)
+----+---+-----+----+---------+---------+
|item|loc|month|year|qty_name |qty_value|
+----+---+-----+----+---------+---------+
|a |x |8 |2020|chocolate|10 |
|a |x |8 |2020|gum |15 |
|a |x |8 |2020|maggi |0 |
|a |x |8 |2020|colgate |0 |
|b |y |8 |2020|chocolate|20 |
|b |y |8 |2020|gum |30 |
|b |y |8 |2020|maggi |0 |
|b |y |8 |2020|colgate |0 |
|c |s |8 |2020|gum |15 |
|c |s |8 |2020|maggi |11 |
|c |s |8 |2020|colgate |18 |
+----+---+-----+----+---------+---------+