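I have a MySQL table that is filled with price values every day. It records an entry every day even if the price has not changed. I want to delete some of the rows that repeat too much. I want to keep the first price and the last price before each price change.
Example 1)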
id name price date
1 Product1 $6 13/07/2017
2 Product1 $6 14/07/2017
3 Product1 $6 15/07/2017
4 Product1 $7 16/07/2017
5 Product1 $6 17/07/2017
6 Product1 $6 18/07/2017
7 Product1 $6 19/07/2017
From this list, the records with IDs 2 and 6 should be deleted, giving the following result:
id name price date
1 Product1 $6 13/07/2017
3 Product1 $6 15/07/2017
4 Product1 $7 16/07/2017
5 Product1 $6 17/07/2017
7 Product1 $6 19/07/2017
Example 2)
id name price date
1 Product1 $6 13/07/2017
2 Product1 $6 14/07/2017
3 Product1 $6 15/07/2017
4 Product1 $6 16/07/2017
5 Product1 $6 17/07/2017
6 Product1 $6 18/07/2017
7 Product1 $6 19/07/2017
There is no price change here, so I can delete all records from 2 to 6:
id name price date
1 Product1 $6 13/07/2017
7 Product1 $6 19/07/2017
The id is not necessarily incremental, and the date is not updated every day.
Answer 0 (score: 5)
You can do this with some creative self-join logic.
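Think of three hypothetical rows in the table, a, b, and c, that share the same name and price and fall on three consecutive days, with b in the middle.
So if you can do a self-join that matches these three rows, delete row b.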
DELETE b FROM MyTable AS a
JOIN MyTable AS b ON a.name=b.name AND a.price=b.price AND a.date=b.date + INTERVAL 1 DAY
JOIN MyTable AS c ON b.name=c.name AND b.price=c.price AND b.date=c.date + INTERVAL 1 DAY;
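This works even if several rows match the conditions for row b. It will delete the first one and then continue to delete the subsequent rows that also match.
This works if you use the DATE data type and store your dates as 'YYYY-MM-DD', not 'DD-MM-YYYY'. You should do that anyway.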
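To make the three-row match concrete, here is a minimal Python sketch of the same idea, run on the rows from Example 1. The function name middle_rows and the tuple layout are made up for illustration. Like the self-join, it only treats rows as neighbours when their dates are exactly one day apart, so it assumes one row per day.

```python
from datetime import date, timedelta

# Rows from Example 1 as (id, name, price, date), with real date values.
rows = [
    (1, "Product1", 6, date(2017, 7, 13)),
    (2, "Product1", 6, date(2017, 7, 14)),
    (3, "Product1", 6, date(2017, 7, 15)),
    (4, "Product1", 7, date(2017, 7, 16)),
    (5, "Product1", 6, date(2017, 7, 17)),
    (6, "Product1", 6, date(2017, 7, 18)),
    (7, "Product1", 6, date(2017, 7, 19)),
]

def middle_rows(rows):
    """Return ids of rows that have a same-name, same-price row on the day
    before and on the day after (row b in the a/b/c match)."""
    present = {(name, price, day) for _, name, price, day in rows}
    one_day = timedelta(days=1)
    return sorted(
        row_id
        for row_id, name, price, day in rows
        if (name, price, day - one_day) in present
        and (name, price, day + one_day) in present
    )

ids_to_delete = middle_rows(rows)
print(ids_to_delete)  # [2, 6]
```

If the dates have gaps, nothing matches and nothing is deleted, which is the main limitation of matching on "plus or minus one day".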
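Answer 1 (score: 3)
You want to delete the rows whose product name and price are the same as those of the rows dated one day before and one day after.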
DELETE row_mid
FROM
record_table AS row_mid
JOIN record_table AS row_prev
JOIN record_table AS row_next
WHERE
row_mid.name = row_prev.name
AND row_mid.price = row_prev.price
AND row_mid.date = DATE_SUB(row_prev.date, INTERVAL 1 DAY)
AND row_mid.name = row_next.name
AND row_mid.price = row_next.price
AND row_mid.date = DATE_ADD(row_next.date, INTERVAL 1 DAY);
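Answer 2 (score: 3)
Is your MySQL new enough to support CTEs? This is a pretty interesting problem that I have run into with date scheduling. The code always looks awkward. To check the results without deleting anything, swap the comment markers on the SELECT and DELETE lines and comment out the "WHERE t.[Name] is null" line.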
WITH
cte AS (
SELECT a.ID
, a.[Name]
, a.[Date]
, a.Price
, NextDate = max(npc.[Date]) -- Next Price change
, PrevDate = max(lpc.[Date]) -- Last Price change
FROM yourTable as a -- Base Table
LEFT JOIN
yourTable as npc -- Looking for Next Price Change
ON a.[Name] = npc.[Name]
and a.[Date] < npc.[Date]
and a.Price <> npc.Price
LEFT JOIN
yourTable as lpc -- Looking for Last Price Change
ON a.[Name] = lpc.[Name]
and a.[Date] > lpc.[Date]
and a.Price <> lpc.Price
GROUP BY a.ID, a.[Name], a.[Date], a.Price
)
----SELECT f.*, [Check] = CASE WHEN t.[Name] is null THEN 'DELETE' ELSE '' END
DELETE f
FROM
yourTable as f
LEFT JOIN
(
SELECT [Name], [GoodDate] = Max([Date])
FROM cte
GROUP BY [Name], PrevDate
UNION
SELECT [Name], [GoodDate] = Min([Date])
FROM cte
GROUP BY [Name], PrevDate
UNION
SELECT [Name], [GoodDate] = Max([Date])
FROM cte
GROUP BY [Name], NextDate
UNION
SELECT [Name], [GoodDate] = Min([Date])
FROM cte
GROUP BY [Name], NextDate
) as t
ON t.[Name] = f.[Name] and t.[GoodDate] = f.[Date]
WHERE t.[Name] is null
--ORDER BY f.[Name], f.[Date]
Answer 3 (score: 3)
You can detect the prev Id and the next Id, and then select the rows to delete:
SELECT *
FROM
(SELECT
*,
(SELECT next_id.id
FROM a next_id
WHERE next_id.id > current.id
ORDER BY next_id.id ASC LIMIT 1) as next_id,
(SELECT prev_id.id
FROM a prev_id
WHERE prev_id.id < current.id
ORDER BY prev_id.id DESC LIMIT 1) as prev_id
FROM a current) t
WHERE
EXISTS (SELECT 1
FROM a next
WHERE next.name = t.name AND t.price = next.price AND next.id=t.next_id)
AND
EXISTS (SELECT 1
FROM a prev
WHERE prev.name = t.name AND t.price = prev.price AND prev.id=t.prev_id)
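I tested these queries on both examples. Demo
UPDATE. If the Id column is not unique, the logic has to be changed from prev Id + next Id to prev Date + next Date. In any case, the general concept stays the same. The query would look like this: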
SELECT *
FROM
(SELECT
*,
(SELECT next_date.date
FROM a next_date
WHERE next_date.date > current.date AND next_date.name = current.name
ORDER BY next_date.date ASC LIMIT 1) as next_date,
(SELECT prev_date.date
FROM a prev_date
WHERE prev_date.date < current.date AND prev_date.name = current.name
ORDER BY prev_date.date DESC LIMIT 1) as prev_date
FROM a current) t
WHERE
EXISTS (SELECT 1
FROM a next
WHERE next.name = t.name AND t.price = next.price AND next.date=t.next_date)
AND
EXISTS (SELECT 1
FROM a prev
WHERE prev.name = t.name AND t.price = prev.price AND prev.date=t.prev_date)
Demo for the second query.
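For readers who want to check the prev Date + next Date idea outside the database, here is a rough Python sketch. The function name redundant_ids and the sample rows are invented for illustration; the ids are deliberately not incremental and the dates have gaps. It assumes each name has at most one row per date.

```python
from datetime import date

def redundant_ids(rows):
    """rows: iterable of (id, name, price, date). Returns the ids whose
    previous and next row by date, within the same name, carry the same price."""
    by_name = {}
    for row in rows:
        by_name.setdefault(row[1], []).append(row)
    to_delete = []
    for group in by_name.values():
        group.sort(key=lambda r: r[3])  # order each product by date
        # walk over (previous, current, next) neighbours
        for prev, cur, nxt in zip(group, group[1:], group[2:]):
            if prev[2] == cur[2] == nxt[2]:
                to_delete.append(cur[0])
    return sorted(to_delete)

rows = [
    (40, "Product1", 6, date(2017, 7, 13)),
    (12, "Product1", 6, date(2017, 7, 16)),
    (77, "Product1", 6, date(2017, 7, 20)),
    (5, "Product1", 7, date(2017, 7, 21)),
    (31, "Product2", 6, date(2017, 7, 14)),
]
print(redundant_ids(rows))  # [12]
```

Because the neighbours are found by sorting on date, not by date arithmetic, gaps between days do not matter here.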
Answer 4 (score: 2)
Is all of your data going to be duplicated, and do you want to keep just one row? Your explanation is confusing.
You can keep the oldest row at a given price and delete the others.
Answer 5 (score: 2)
I can't write the exact code for your scenario, but you can write a function or procedure that follows this pseudocode:
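rows = allrows  // sorted by name, then by date
tobeDeleted = []
for (var i = 0; i < rows.length; i++) {
    // rows[i] starts a run of equal name and price; find where the run ends
    var last = i;
    while (last + 1 < rows.length
           && rows[last + 1]->price == rows[i]->price
           && rows[last + 1]->name == rows[i]->name) {
        last++;
    }
    // keep the first and last row of the run, mark the rows in between
    for (var k = i + 1; k < last; k++) {
        tobeDeleted.push(rows[k]->id);
    }
    i = last;  // the loop increment then moves on to the next run
}
// tobeDeleted contains ids of rows to be deleted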
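A runnable version of this kind of loop could look like the Python sketch below. It is only one way to do it: the helper name ids_inside_runs is made up, and it assumes the rows have already been fetched sorted by name and then by date.

```python
from itertools import groupby

def ids_inside_runs(rows):
    """rows: list of (id, name, price), already sorted by name, then date.
    Returns the ids strictly between the first and last row of each run of
    consecutive rows that share the same name and price."""
    to_delete = []
    for _, run in groupby(rows, key=lambda r: (r[1], r[2])):
        run = list(run)
        to_delete.extend(r[0] for r in run[1:-1])  # keep run[0] and run[-1]
    return to_delete

example_1 = [(i, "Product1", p) for i, p in enumerate([6, 6, 6, 7, 6, 6, 6], start=1)]
example_2 = [(i, "Product1", 6) for i in range(1, 8)]
print(ids_inside_runs(example_1))  # [2, 6]
print(ids_inside_runs(example_2))  # [2, 3, 4, 5, 6]
```

The collected ids can then be passed to a plain DELETE ... WHERE id IN (...) statement.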
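Answer 6 (score: 2)
Try the following query; I hope it helps.
(I don't have MySQL, so I tried to convert the syntax from SQL Server. Apologies if there are any syntax errors.)
(I tested it on SQL Server with random dates and different products; it works well and gives the result you want.)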
/* get the data grouped by name with NewField continousDate to create continous dates for every product depends on the order of date
then save it to temporary table called tempWithContinousDate*/
CREATE TEMPORARY TABLE tempWithContinousDate (id INT,name varchar(50),price DECIMAL(12,2),date DATE,continousDate DATE);
insert into tempWithContinousDate(id,name,price,date,continousDate)
select id,name,price,date,Date_Add(minimumDate,INTERVAL rn DAY)ContinousDate
from(
select t1.id,t1.name,t1.price,t1.date,min(t2.Date)minimumDate,count(*) rn
from
(select id,name,price,date from yourTable) t1
inner join
(select id,name,price,date from yourTable) t2
on t1.name=t2.name and t1.date>=t2.date
group by t1.id,t1.name,t1.price,t1.date
) t;
/* get the data grouped by name and price with NewField GroupDate to group every continous dates
then save it to temporary table called tempData*/
CREATE TEMPORARY TABLE tempData (id INT,name varchar(50),price DECIMAL(12,2),date DATE,groupDate DATE);
insert into tempData(id,name,price,date,groupDate)
select id,name,price,date,DATE_SUB(continousDate, INTERVAL rowNumber DAY) groupDate
from(
select t1.id,t1.name,t1.price,t1.date,t1.continousDate,count(*) rowNumber
from
(select id,name,price,date,continousDate from tempWithContinousDate) t1
inner join
(select id,name,price,date,continousDate from tempWithContinousDate) t2
on t1.name=t2.name and t1.price=t2.price and t1.date>=t2.date
group by t1.id,t1.name,t1.price,t1.date,t1.continousDate
) t;
/*select * from yourTable where id in*/
delete from yourTable where id not in
(select id from
(
/* query to order every continous data asscending using the date field */
select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn
from tempData firstData
left join tempData secondData
on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate
and firstData.date>=secondData.date
group by firstData.id,firstData.name,firstData.price,firstData.date
/* query to order every continous data Descending using the date field */
union all
select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn
from tempData firstData
left join tempData secondData
on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate
and firstData.date<=secondData.date
group by firstData.id,firstData.name,firstData.price,firstData.date
)allData where rn=1
)
Answer 7 (score: 1)
You can use the code below. Let me know if it works.
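-- Groups span the whole table, so a price that returns after a change
-- shares a group with its earlier run.
DELETE FROM record_table
WHERE id NOT IN (
    -- the derived table lets MySQL read record_table while deleting from it
    SELECT id FROM (
        SELECT MIN(id) AS id FROM record_table GROUP BY name, price
        UNION
        SELECT MAX(id) FROM record_table GROUP BY name, price
    ) AS keep_ids
);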
Answer 8 (score: 1)
You can use an EXISTS or IN construct to solve your problem:
DELETE FROM test t1
WHERE EXISTS
(
SELECT *
FROM test t2
WHERE t1.name = t2.name AND t1.price = t2.price AND t1.day = DATE_SUB(t2.DAY, INTERVAL 1 DAY)
) AND
EXISTS(
SELECT *
FROM test t3
WHERE t1.name = t3.name AND t1.price = t3.price AND t1.day = DATE_ADD(t3.DAY, INTERVAL 1 DAY)
)
Answer 9 (score: 1)
You can use the following logic:
Here are the query and a fiddle example:
SET @prev_value = NULL;
SET @rank_count = 0;
select distinct
`name`,
`price`,
`date`
from
(
(
select
id,
name,
price,
CASE
WHEN @prev_value = price THEN @rank_count
WHEN @prev_value := price THEN @rank_count := @rank_count + 1
END AS rank,
min(`date`) as `date`
from
`prices`
group by
`name`,
`price`,
`rank`
)
union distinct
(
select
id,
name,
price,
CASE
WHEN @prev_value = price THEN @rank_count
WHEN @prev_value := price THEN @rank_count := @rank_count + 1
END AS rank,
max(`date`) as `date`
from
`prices`
group by
`name`,
`price`,
`rank`
)
order by `id`, `date`
) as `result`
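Answer 10 (score: 1)
We have to ask ourselves: when can a record be deleted?
Answer: a record can be deleted
if there is another record with the same name, the same price, and an earlier date, and no record with the same name and a different price lies between the two dates,
and
if there is another record with the same name, the same price, and a later date, and no record with the same name and a different price lies between the two dates.
Putting both requirements into SQL gives the following: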
DELETE FROM PriceTable t
WHERE
EXISTS ( SELECT *
FROM PriceTable tmp1
WHERE t.name = tmp1.name AND
t.price = tmp1.price AND
t.date > tmp1.date AND
NOT EXISTS (SELECT *
FROM PriceTable tmp2
WHERE t.name = tmp2.name AND
t.price != tmp2.price AND
t.date > tmp2.date AND
tmp1.date < tmp2.date
)
)
AND
EXISTS ( SELECT *
FROM PriceTable tmp1
WHERE t.name = tmp1.name AND
t.price = tmp1.price AND
t.date < tmp1.date AND
NOT EXISTS (SELECT *
FROM PriceTable tmp2
WHERE t.name = tmp2.name AND
t.price != tmp2.price AND
t.date < tmp2.date AND
tmp1.date > tmp2.date
)
);
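The two requirements can also be written as a plain predicate, which makes them easy to test on the sample data. The Python below is only a sketch of that rule: can_delete and clear_path are names invented here, and the dates are ISO 'YYYY-MM-DD' strings so that ordinary string comparison orders them correctly.

```python
def can_delete(row, rows):
    """True when row has a same-name, same-price row both earlier and later,
    with no differently priced row of that name in between."""
    _, name, price, day = row
    same_name = [r for r in rows if r[1] == name]

    def clear_path(other_day):
        # no row with another price strictly between the two dates
        lo, hi = sorted((day, other_day))
        return not any(r[2] != price and lo < r[3] < hi for r in same_name)

    earlier = any(r[2] == price and r[3] < day and clear_path(r[3]) for r in same_name)
    later = any(r[2] == price and r[3] > day and clear_path(r[3]) for r in same_name)
    return earlier and later

# Example 1 as (id, name, price, date)
rows = [
    (1, "Product1", 6, "2017-07-13"),
    (2, "Product1", 6, "2017-07-14"),
    (3, "Product1", 6, "2017-07-15"),
    (4, "Product1", 7, "2017-07-16"),
    (5, "Product1", 6, "2017-07-17"),
    (6, "Product1", 6, "2017-07-18"),
    (7, "Product1", 6, "2017-07-19"),
]
deletable = [r[0] for r in rows if can_delete(r, rows)]
print(deletable)  # [2, 6]
```

This checks every row against every other row of the same name, so it is meant for verifying the rule on small samples, not for large tables.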
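Answer 11 (score: 1)
EDIT: After further consideration, it seems this problem cannot be reliably solved with the user-defined variable trick (be wary of the other solutions that use it). Although I think the solution below will "most likely work 99% of the time", MySQL does not guarantee the order in which variables are evaluated: link 1 and link 2.
Original answer:
(My working assumption is that products.name is defined as NOT NULL and that products.id and products.price are both non-negative [a simple patch can be provided if you deal with negative data, too].)
Query: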
SET
@one_prior_id := NULL,
@one_prior_price := NULL,
@one_prior_name := NULL,
@two_prior_id := NULL,
@two_prior_price := NULL,
@two_prior_name := NULL
;
SELECT @two_prior_id AS id_to_delete
FROM (
SELECT *
FROM products
ORDER BY name, date
) AS t
WHERE IF(
(
(name = @one_prior_name)
AND
(name = @two_prior_name)
AND
(price = @one_prior_price)
AND
(price = @two_prior_price)
), (
GREATEST(
1,
IFNULL(@two_prior_id := @one_prior_id, 0),
IFNULL(@two_prior_price := @one_prior_price, 0),
LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),
IFNULL(@one_prior_id := id, 0),
IFNULL(@one_prior_price := price, 0),
LENGTH(IFNULL(@one_prior_name := name, 0))
)
), (
LEAST(
0,
IFNULL(@two_prior_id := @one_prior_id, 0),
IFNULL(@two_prior_price := @one_prior_price, 0),
LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),
IFNULL(@one_prior_id := id, 0),
IFNULL(@one_prior_price := price, 0),
LENGTH(IFNULL(@one_prior_name := name, 0))
)
)
)
Result of the query, based on your "Example 1":
+--------------+
| id_to_delete |
+--------------+
| 2 |
| 6 |
+--------------+
Result of the query, based on your "Example 2":
+--------------+
| id_to_delete |
+--------------+
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+--------------+
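How the query works:
Do a simple "partitioning" of the products table via ORDER BY.
Loop through the sorted result set, tracking 2 sets of variables: the first set holds the price and name of the "one prior" row (the row directly above the current row), and the second set holds the "two prior" row (the row directly above the "one prior" row).
GREATEST and LEAST are identical, except that the former returns a value that IF evaluates as true and the latter returns a value that IF evaluates as false. The real point of these functions is to update the loop variables.
For details on variable updates in subqueries, see this.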
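The same bookkeeping is easier to follow in a language with a guaranteed evaluation order. The Python sketch below mirrors the one-prior and two-prior variables; the function name and tuple layout are made up for illustration, and the rows are assumed to arrive sorted by name and then date.

```python
def ids_to_delete(sorted_rows):
    """sorted_rows: (id, name, price) tuples ordered by name, then date.
    Emits the id of the middle row whenever the current row and the two rows
    above it share the same name and price."""
    one_prior = two_prior = None  # stand-ins for @one_prior_* and @two_prior_*
    result = []
    for row in sorted_rows:
        _, name, price = row
        match = (
            one_prior is not None
            and two_prior is not None
            and (name, price) == one_prior[1:] == two_prior[1:]
        )
        # shift the window on every row, matched or not
        two_prior, one_prior = one_prior, row
        if match:
            result.append(two_prior[0])  # after the shift this is the middle row
    return result

example_1 = [(i, "Product1", p) for i, p in enumerate([6, 6, 6, 7, 6, 6, 6], start=1)]
example_2 = [(i, "Product1", 6) for i in range(1, 8)]
print(ids_to_delete(example_1))  # [2, 6]
print(ids_to_delete(example_2))  # [2, 3, 4, 5, 6]
```

Note that the id is read after the window has shifted, which is why the reported row is the one directly above the current row.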
The actual DELETE:
SET
@one_prior_id := NULL,
@one_prior_price := NULL,
@one_prior_name := NULL,
@two_prior_id := NULL,
@two_prior_price := NULL,
@two_prior_name := NULL
;
DELETE FROM products WHERE id IN (
SELECT * FROM (
SELECT @two_prior_id AS id_to_delete
FROM (
SELECT *
FROM products
ORDER BY name, date
) AS t1
WHERE IF(
(
(name = @one_prior_name)
AND
(name = @two_prior_name)
AND
(price = @one_prior_price)
AND
(price = @two_prior_price)
), (
GREATEST(
1,
IFNULL(@two_prior_id := @one_prior_id, 0),
IFNULL(@two_prior_price := @one_prior_price, 0),
LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),
IFNULL(@one_prior_id := id, 0),
IFNULL(@one_prior_price := price, 0),
LENGTH(IFNULL(@one_prior_name := name, 0))
)
), (
LEAST(
0,
IFNULL(@two_prior_id := @one_prior_id, 0),
IFNULL(@two_prior_price := @one_prior_price, 0),
LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),
IFNULL(@one_prior_id := id, 0),
IFNULL(@one_prior_price := price, 0),
LENGTH(IFNULL(@one_prior_name := name, 0))
)
)
)
) AS t2
)
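Important note
See how the delete query above does 2 inner selects? Make sure you include this, otherwise you will inadvertently delete the last row! Try executing it without the SELECT (...) AS t2 to see what I mean.
Answer 12 (score: 1)
This is the second answer I have submitted for this question, but I think I finally got it this time: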
DELETE FROM products WHERE id IN (
SELECT id_to_delete
FROM (
SELECT
t0.id AS id_to_delete,
t0.price,
(
SELECT t1.price
FROM products AS t1
WHERE (t0.date < t1.date)
AND (t0.name = t1.name)
ORDER BY t1.date ASC
LIMIT 1
) AS next_price,
(
SELECT t2.price
FROM products AS t2
WHERE (t0.date > t2.date)
AND (t0.name = t2.name)
ORDER BY t2.date DESC
LIMIT 1
) AS prev_price
FROM products AS t0
HAVING (price = next_price) AND (price = prev_price)
) AS t
)
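This is a modified version of @vadim_hr's answer.
EDIT: Below is a different query that filters with JOINs rather than subqueries. For large data sets, the JOINs may be faster than the previous query (above), but I'll leave the performance testing to you.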
http://sqlfiddle.com/#!9/ee0655/8
SELECT M.id as id_to_delete
FROM
(
SELECT
*,
(@j := @j + 1) AS j
FROM
(SELECT * FROM products ORDER BY name ASC, date ASC) AS mmm
JOIN
(SELECT @j := 1) AS mm
) AS M -- the middle table
JOIN
(
SELECT
*,
(@i := @i + 1) AS i
FROM
(SELECT * FROM products ORDER BY name ASC, date ASC) AS lll
JOIN
(SELECT @i := 0) AS ll
) AS L -- the left table
ON M.j = L.i
AND M.name = L.name
AND M.price = L.price
JOIN
(
SELECT
*,
(@k := @k + 1) AS k
FROM
(SELECT * FROM products ORDER BY name ASC, date ASC) AS rrr
JOIN
(SELECT @k := 2) AS rr
) AS R -- the right table
ON M.j = R.k
AND M.name = R.name
AND M.price = R.price
Both queries accomplish the same end, and both assume that each combination of name and date is unique (as noted in the comments below).