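I'm trying to emulate the accepted answer in this SO question: Delete all Duplicate Rows except for One in MySQL? [duplicate], with a twist: I want the data from one table (the auto-increment ID) to determine which rows to delete in another table. SQLFiddle here showing data.
In the fiddle mentioned above, the end result I'm looking for is that the rows in eventdetails_new where Event_ID = 4 & 6 are deleted (EVENTDETAILS_IDs 5 & 6 and 9 & 10), leaving the rows for Event_ID 3 & 5 (EVENTDETAILS_IDs 3 & 4 and 7 & 8). I hope that makes sense. Ideally the rows in events_new with those same Event_IDs would also be deleted (I haven't started working on that yet, so there's no code sample).
Here's the query I'm trying to get working, but I'm in a bit over my head: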
SELECT *
FROM eventdetails_new AS EDN1, eventdetails_new AS EDN2
INNER JOIN events_new AS E1 ON `E1`.`Event_ID` = `EDN1`.`Event_ID`
INNER JOIN events_new AS E2 ON `E2`.`Event_ID` = `EDN2`.`Event_ID`
WHERE `E1`.`Event_ID` > `E2`.`Event_ID`
AND `E1`.`DateTime` = `E2`.`DateTime`
AND events_new.EventType_ID = 6;
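Here's the SQLFiddle with the results of this query. Not good. I can see the Event_ID in the data, but for some reason the query can't. I'm not sure how to fix this.
I know this is a SELECT query, but I couldn't find a way to use two aliased tables in a DELETE query (which I think I need?). I figured that if I could get a SELECT working, I could do the deleting with some C# code. Ideally, though, it could all be done in one query or a set of statements without leaving MySQL.
Here's my first cut at the query, but it's just as bad: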
DELETE e1 FROM eventdetails_new e1
WHERE `events_new`.`Event_ID` > `events_new`.`Event_ID`
AND events_new.DateTime = events_new.DateTime AND events_new.EventType_ID = 6;
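SQLFiddle won't let me run this query at all, so it's not much help. However, it gives the same error as the one above: Error Code: 1054. Unknown column 'events_new.Event_ID' in 'where clause'
I'm by no means married to either of these queries if there's a better way. The end result I'm looking for is to delete a bunch of duplicate data.
I have many thousands of these rows, and I know that roughly a third of them are duplicates that I need to delete before we start using the database.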
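Answer 0 (score: 0)
Here's what I ended up doing. My coworker & I came up with a query that gives us a list of Event_IDs that have duplicate data (we actually used Access 2010's query builder and MySQL-ified the result). Bear in mind that this is a complete solution, whereas the original question didn't go into as much detail about the linked tables. If you have questions about it, feel free to ask & I'll do my best to help: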
SELECT `Events_new`.`Event_ID`
FROM Events_new
GROUP BY `Events_new`.`PCBID`, `Events_new`.`EventType_ID`, `Events_new`.`DateTime`, `Events_new`.`User`
HAVING (((COUNT(`Events_new`.`PCBID`)) > 1) AND ((COUNT(`Events_new`.`User`)) > 1) AND ((COUNT(`Events_new`.`DateTime`)) > 1))
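From this I processed each Event_ID to remove the duplicates iteratively. Basically I had to delete all the child rows, starting from the lowest-level table, so that I wouldn't violate the foreign key constraints.
This code was written in LinqPAD as C# Statements (sbCommonFunctions is an in-house DLL designed to make most, but as you'll see not all, database functions be handled the same way or more easily):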
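A minimal sketch of the bottom-up order, under stated assumptions: it uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, a cut-down two-table schema, and made-up DateTime values. The Event_ID and EventDetails_ID values are the ones described in the question. The helper names build_demo_db and delete_events_bottom_up are hypothetical and exist only for this illustration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for events_new / eventdetails_new."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE events_new (
            Event_ID INTEGER PRIMARY KEY,
            DateTime TEXT              -- values below are invented
        );
        CREATE TABLE eventdetails_new (
            EventDetails_ID INTEGER PRIMARY KEY,
            Event_ID INTEGER NOT NULL REFERENCES events_new(Event_ID)
        );
        INSERT INTO events_new VALUES (3, 't1'), (4, 't1'), (5, 't2'), (6, 't2');
        INSERT INTO eventdetails_new VALUES
            (3, 3), (4, 3), (5, 4), (6, 4), (7, 5), (8, 5), (9, 6), (10, 6);
    """)
    return conn


def delete_events_bottom_up(conn, event_ids):
    """Delete child rows first, then the parent row, in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for event_id in event_ids:
            conn.execute(
                "DELETE FROM eventdetails_new WHERE Event_ID = ?", (event_id,))
            conn.execute(
                "DELETE FROM events_new WHERE Event_ID = ?", (event_id,))


conn = build_demo_db()

# Deleting the parent first is rejected by the foreign key constraint.
try:
    conn.execute("DELETE FROM events_new WHERE Event_ID = 4")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True
conn.rollback()

# Deleting children before parents succeeds.
delete_events_bottom_up(conn, [4, 6])
remaining_details = [row[0] for row in conn.execute(
    "SELECT EventDetails_ID FROM eventdetails_new ORDER BY EventDetails_ID")]
remaining_events = [row[0] for row in conn.execute(
    "SELECT Event_ID FROM events_new ORDER BY Event_ID")]
print(parent_first_failed, remaining_details, remaining_events)
```

The IDs are passed as bound parameters (the `?` placeholders) rather than formatted into the SQL string, which is generally the safer habit. With deeper hierarchies the same idea applies: start at the lowest child table and work upward.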
sbCommonFunctions.Database testDB = new sbCommonFunctions.Database();
testDB.Connect("production", "database", "user", "password");
List<string> listEventIDs = new List<string>();
List<string> listEventDetailIDs = new List<string>();
List<string> listTestInformationIDs = new List<string>();
List<string> listTestStepIDs = new List<string>();
List<string> listMeasurementIDs = new List<string>();
string dtQuery = (String.Format(@"SELECT `Events_new`.`Event_ID`
    FROM Events_new
    GROUP BY `Events_new`.`PCBID`,
        `Events_new`.`EventType_ID`,
        `Events_new`.`DateTime`,
        `Events_new`.`User`
    HAVING (((COUNT(`Events_new`.`PCBID`)) > 1)
        AND ((COUNT(`Events_new`.`User`)) > 1)
        AND ((COUNT(`Events_new`.`DateTime`)) > 1))"));
int iterations = 0;
DataTable dtEventIDs = getDT(dtQuery, testDB);
while (dtEventIDs.Rows.Count > 0)
{
    Console.WriteLine(dtEventIDs.Rows.Count);
    Console.WriteLine(iterations);
    iterations++;
    foreach(DataRowView eventID in dtEventIDs.DefaultView)
    {
        listEventIDs.Add(eventID.Row[0].ToString());
        DataTable dtEventDetails = testDB.QueryDatabase(String.Format(
            "SELECT * FROM EventDetails_new WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach(DataRowView drvEventDetail in dtEventDetails.DefaultView)
        {
            listEventDetailIDs.Add(drvEventDetail.Row[0].ToString());
        }
        DataTable dtTestInformation = testDB.QueryDatabase(String.Format(
            @"SELECT TestInformation_ID
            FROM TestInformation_new
            WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach(DataRowView drvTest in dtTestInformation.DefaultView)
        {
            listTestInformationIDs.Add(drvTest.Row[0].ToString());
            DataTable dtTestSteps = testDB.QueryDatabase(String.Format(
                @"SELECT TestSteps_ID
                FROM TestSteps_new
                WHERE TestInformation_TestInformation_ID = {0}",
                drvTest.Row[0]));
            foreach(DataRowView drvTestStep in dtTestSteps.DefaultView)
            {
                listTestStepIDs.Add(drvTestStep.Row[0].ToString());
                DataTable dtMeasurements = testDB.QueryDatabase(String.Format(
                    @"SELECT Measurements_ID
                    FROM Measurements_new
                    WHERE TestSteps_TestSteps_ID = {0}",
                    drvTestStep.Row[0]));
                foreach(DataRowView drvMeasurements in dtMeasurements.DefaultView)
                {
                    listMeasurementIDs.Add(drvMeasurements.Row[0].ToString());
                }
            }
        }
    }
    testDB.Disconnect();
    string mysqlConnection =
        "server=server;\ndatabase=database;\npassword=password;\nUser ID=user;";
    MySqlConnection connection = new MySqlConnection(mysqlConnection);
    connection.Open();
    //start unwinding the duplicates from the lowest level upward
    whackDuplicates(listMeasurementIDs, "measurements_new", "Measurements_ID", connection);
    whackDuplicates(listTestStepIDs, "teststeps_new", "TestSteps_ID", connection);
    whackDuplicates(listTestInformationIDs, "testinformation_new", "testInformation_ID", connection);
    whackDuplicates(listEventDetailIDs, "eventdetails_new", "eventdetails_ID", connection);
    whackDuplicates(listEventIDs, "events_new", "event_ID", connection);
    connection.Close();
    //update iterator from inside the clause in case there are more duplicates.
    dtEventIDs = getDT(dtQuery, testDB);
}
}//goofy curly brace to allow LinqPAD to deal with inline classes
public void whackDuplicates(List<string> listOfIDs,
                            string table,
                            string pkID,
                            MySqlConnection connection)
{
    // Issue one DELETE per primary key value in the list.
    foreach(string ID in listOfIDs)
    {
        MySqlCommand command = connection.CreateCommand();
        command.CommandText = String.Format(
            "DELETE FROM " + table + " WHERE " + pkID + " = {0}", ID);
        command.ExecuteNonQuery();
    }
}
public DataTable getDT(string query, sbCommonFunctions.Database db)
{
    return db.QueryDatabase(query);
// } <- this commented-out brace is deliberate: LinqPAD has a weird way of
// dealing with inline classes and the last one can't have a closing curly
// brace (and the first one has to have an extra opening curly brace above
// it, go figure)
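Basically this is one giant while loop whose condition is refreshed from inside the loop body, and it keeps running until the number of Event_IDs returned drops to zero (it took 5 iterations, since some of the data had as many as 6 duplicates).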
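The repeated passes are needed because a GROUP BY query returns only one row, and so one Event_ID, per duplicate group. One possible alternative, sketched here as an assumption rather than something that was run against the real data, is to select every Event_ID that is not the lowest one in its group, which lists all the extra copies in a single pass. The sketch uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, and the sample rows are invented. The names DUPLICATES_EXCEPT_FIRST and find_duplicate_event_ids are hypothetical.

```python
import sqlite3

# Every Event_ID except the lowest one in each
# (PCBID, EventType_ID, DateTime, User) group.
DUPLICATES_EXCEPT_FIRST = """
    SELECT Event_ID
    FROM events_new
    WHERE Event_ID NOT IN (
        SELECT MIN(Event_ID)
        FROM events_new
        GROUP BY PCBID, EventType_ID, DateTime, User
    )
    ORDER BY Event_ID
"""


def find_duplicate_event_ids(conn):
    """Return the IDs of all surplus copies, keeping the lowest ID per group."""
    return [row[0] for row in conn.execute(DUPLICATES_EXCEPT_FIRST)]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events_new (
        Event_ID INTEGER PRIMARY KEY,
        PCBID TEXT,
        EventType_ID INTEGER,
        DateTime TEXT,
        User TEXT
    )
""")
# Invented sample data: three copies of one event, two of another, one unique.
rows = [
    (1, "A", 6, "2013-01-01 10:00:00", "alice"),
    (2, "A", 6, "2013-01-01 10:00:00", "alice"),
    (3, "A", 6, "2013-01-01 10:00:00", "alice"),
    (4, "B", 6, "2013-01-02 11:00:00", "bob"),
    (5, "B", 6, "2013-01-02 11:00:00", "bob"),
    (6, "C", 6, "2013-01-03 12:00:00", "carol"),
]
conn.executemany("INSERT INTO events_new VALUES (?, ?, ?, ?, ?)", rows)

duplicate_ids = find_duplicate_event_ids(conn)
print(duplicate_ids)
```

The resulting list could then feed the same child-first deletion as before. The SELECT itself is plain SQL, but it has not been checked against MySQL here, and MySQL restricts deleting from a table that the same statement also reads in a subquery, so collecting the IDs first and deleting afterwards is the more cautious route.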