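I have two tables, say A and B. My task is to sync B with A: add a record to B if it exists in A but not in B, and delete a record from B if it exists in B but not in A.
A and B can contain duplicate records, so if a record is duplicated in A, B should hold the same number of duplicates. Sample data in A and B: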
**Table A**            **Table B**
id     identifier      id     identifier
100    capital         1001   bat
201    bat             1002   bat
202    bat             1003   bat
                       5010   keyboard
To do this, I fetched the records from A and B with an outer join, so my output looks like this:
A.id   B.id   identifier
100    null   capital
201    1001   bat
201    1002   bat
201    1003   bat
202    1001   bat
202    1002   bat
202    1003   bat
null   5010   keyboard
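In the case above, 100 and 5010 are the add and delete candidates respectively, which is easy to figure out.
The problem is discovering that 1003 is also a delete candidate, because 201 and 202 map to 1001 and 1002 respectively.
I could do this in the database with something like "MYSQL: Avoiding cartesian product of repeating records when self-joining", but due to some restrictions I can only load the data in the format above, using the outer join. So I need an algorithm in Java to do this. Thanks in advance.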
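One way to see why 1003 is a delete candidate is to count, per identifier, the distinct A.id and B.id values in the joined rows: "bat" has two A ids but three B ids, so one B row is surplus. The following is a minimal sketch of that count, assuming ids are unique within each table. `JoinCounts`, `Row` and `surplusInB` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class JoinCounts {
    // One row of the outer-join output; either id may be null (hypothetical type).
    static class Row {
        final Integer aId;
        final Integer bId;
        final String identifier;

        Row(Integer aId, Integer bId, String identifier) {
            this.aId = aId;
            this.bId = bId;
            this.identifier = identifier;
        }
    }

    // Per identifier: (distinct B ids) minus (distinct A ids).
    // Positive means that many rows to delete from B, negative that many to add.
    static Map<String, Integer> surplusInB(List<Row> rows) {
        Map<String, Set<Integer>> aIds = new HashMap<>();
        Map<String, Set<Integer>> bIds = new HashMap<>();
        for (Row r : rows) {
            // Create both sets for every identifier so the lookups below never miss.
            Set<Integer> a = aIds.computeIfAbsent(r.identifier, k -> new HashSet<>());
            Set<Integer> b = bIds.computeIfAbsent(r.identifier, k -> new HashSet<>());
            if (r.aId != null) a.add(r.aId);
            if (r.bId != null) b.add(r.bId);
        }
        Map<String, Integer> surplus = new TreeMap<>(); // sorted for stable output
        for (String identifier : aIds.keySet()) {
            surplus.put(identifier, bIds.get(identifier).size() - aIds.get(identifier).size());
        }
        return surplus;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(100, null, "capital"));
        rows.add(new Row(201, 1001, "bat"));
        rows.add(new Row(201, 1002, "bat"));
        rows.add(new Row(201, 1003, "bat"));
        rows.add(new Row(202, 1001, "bat"));
        rows.add(new Row(202, 1002, "bat"));
        rows.add(new Row(202, 1003, "bat"));
        rows.add(new Row(null, 5010, "keyboard"));
        System.out.println(surplusInB(rows)); // {bat=1, capital=-1, keyboard=1}
    }
}
```

The counts only say how many rows are surplus or missing per identifier. Which concrete B row is dropped (1003 here) still depends on a pairing rule, such as taking ids in the order they appear.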
Answer 0 (score: 0)
I eventually came up with this algorithm. It is not very clean or clever, but it seems to do the job:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class SyncAlgorithm {

        // One row of the outer-join output; aId or bId may be null
        static class JoinResult {
            public final Integer aId;
            public final Integer bId;
            public final String identifier;

            public JoinResult(Integer aId, Integer bId, String identifier) {
                this.aId = aId;
                this.bId = bId;
                this.identifier = identifier;
            }
        }

        public static void main(String[] args) {
            List<JoinResult> table = makeTestTable();
            System.out.println("Initial table:");
            printTable(table);
            System.out.println();

            Iterator<JoinResult> iter = table.iterator();
            // A.id values we have seen, per identifier
            Map<String, Set<Integer>> aSeen = new HashMap<String, Set<Integer>>();
            // A.id values we have used, per identifier
            Map<String, Set<Integer>> aUsed = new HashMap<String, Set<Integer>>();
            // B.id values we have used, per identifier
            Map<String, Set<Integer>> bUsed = new HashMap<String, Set<Integer>>();

            // Loop over table to remove unnecessary rows
            while (iter.hasNext()) {
                JoinResult row = iter.next();
                // Make sure sets exist for current identifier
                if (!aSeen.containsKey(row.identifier)) {
                    aSeen.put(row.identifier, new HashSet<Integer>());
                }
                if (!aUsed.containsKey(row.identifier)) {
                    aUsed.put(row.identifier, new HashSet<Integer>());
                }
                if (!bUsed.containsKey(row.identifier)) {
                    bUsed.put(row.identifier, new HashSet<Integer>());
                }
                if (row.aId == null) {
                    // No match in A: remove the row
                    iter.remove();
                } else if (row.bId != null) {
                    // Both A.id and B.id are not null: mark A.id as seen
                    aSeen.get(row.identifier).add(row.aId);
                    if (aUsed.get(row.identifier).contains(row.aId)
                            || bUsed.get(row.identifier).contains(row.bId)) {
                        // A.id or B.id was already used: discard the row
                        iter.remove();
                    } else {
                        // Both ids are new: mark them as used and keep the row
                        aUsed.get(row.identifier).add(row.aId);
                        bUsed.get(row.identifier).add(row.bId);
                    }
                } else {
                    // A.id is not null but B.id is null: save A.id and keep the row
                    aSeen.get(row.identifier).add(row.aId);
                    aUsed.get(row.identifier).add(row.aId);
                }
            }

            // Add rows for A.id values that have been seen but not used
            for (Map.Entry<String, Set<Integer>> aSeenEntry : aSeen.entrySet()) {
                Set<Integer> aSeenId = aSeenEntry.getValue();
                aSeenId.removeAll(aUsed.get(aSeenEntry.getKey()));
                for (Integer aId : aSeenId) {
                    table.add(new JoinResult(aId, null, aSeenEntry.getKey()));
                }
            }

            System.out.println("Result table:");
            printTable(table);
        }

        static List<JoinResult> makeTestTable() {
            List<JoinResult> table = new ArrayList<JoinResult>();
            table.add(new JoinResult(100, null, "capital"));
            table.add(new JoinResult(201, 1001, "bat"));
            table.add(new JoinResult(201, 1002, "bat"));
            table.add(new JoinResult(201, 1003, "bat"));
            table.add(new JoinResult(202, 1001, "bat"));
            table.add(new JoinResult(202, 1002, "bat"));
            table.add(new JoinResult(202, 1003, "bat"));
            table.add(new JoinResult(null, 5010, "keyboard"));
            table.add(new JoinResult(501, 3001, "foo"));
            table.add(new JoinResult(502, 3001, "foo"));
            return table;
        }

        static void printTable(List<JoinResult> table) {
            System.out.println("A.id B.id identifier");
            for (JoinResult row : table) {
                // A null id is printed as "null" by the %d conversion
                System.out.printf("%-8d%-8d%s\n", row.aId, row.bId, row.identifier);
            }
        }
    }
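The result table lists the pairs to keep plus the A rows that still need a copy in B, but it does not name the B ids to delete. A possible follow-up step, sketched under the assumption that both the original join rows and the kept rows are still available, is to subtract the kept B ids from all B ids seen in the join. `SyncPlan`, `Row`, `bIdsToDelete` and `aIdsToAdd` are hypothetical names, and the `kept` list in `main` is hard-coded for the question's sample data rather than computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SyncPlan {
    // One row of the join output or of the pruned result (hypothetical type).
    static class Row {
        final Integer aId;
        final Integer bId;
        final String identifier;

        Row(Integer aId, Integer bId, String identifier) {
            this.aId = aId;
            this.bId = bId;
            this.identifier = identifier;
        }
    }

    // B ids present in the join output but absent from every kept row: delete from B.
    static Set<Integer> bIdsToDelete(List<Row> joined, List<Row> kept) {
        Set<Integer> toDelete = new TreeSet<>();
        for (Row r : joined) {
            if (r.bId != null) toDelete.add(r.bId);
        }
        for (Row r : kept) {
            if (r.bId != null) toDelete.remove(r.bId);
        }
        return toDelete;
    }

    // A ids kept without a B partner: rows to add to B.
    static Set<Integer> aIdsToAdd(List<Row> kept) {
        Set<Integer> toAdd = new TreeSet<>();
        for (Row r : kept) {
            if (r.aId != null && r.bId == null) toAdd.add(r.aId);
        }
        return toAdd;
    }

    public static void main(String[] args) {
        List<Row> joined = new ArrayList<>();
        joined.add(new Row(100, null, "capital"));
        joined.add(new Row(201, 1001, "bat"));
        joined.add(new Row(201, 1002, "bat"));
        joined.add(new Row(201, 1003, "bat"));
        joined.add(new Row(202, 1001, "bat"));
        joined.add(new Row(202, 1002, "bat"));
        joined.add(new Row(202, 1003, "bat"));
        joined.add(new Row(null, 5010, "keyboard"));

        // Rows that a one-to-one pairing in input order keeps for the rows above.
        List<Row> kept = new ArrayList<>();
        kept.add(new Row(100, null, "capital"));
        kept.add(new Row(201, 1001, "bat"));
        kept.add(new Row(202, 1002, "bat"));

        System.out.println("delete from B: " + bIdsToDelete(joined, kept)); // [1003, 5010]
        System.out.println("add to B: " + aIdsToAdd(kept));                 // [100]
    }
}
```

Because the pruning removes rows from the list in place, the original join rows would have to be copied before pruning for this subtraction to work.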
Answer 1 (score: 0)
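Here is how I solved this problem:
Fetch the data from table A and table B.
Group the data of table A and table B by identifier, using:
Map<String, SameBucketObject>
where the key is the identifier and SameBucketObject is: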
    class SameBucketObject {
        private List<String> idsOfA;
        private List<String> idsOfB;
        // getter, setters, addToList statements
    }
Basically, I group all elements of table A and table B by identifier.
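The grouping step could look roughly like the sketch below. It assumes each table has been loaded as a list of (id, identifier) pairs. `Grouping`, `Record`, `addA`, `addB` and `group` are hypothetical names standing in for the elided getters, setters and add-to-list methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Grouping {
    // A row of either table (hypothetical type).
    static class Record {
        final String id;
        final String identifier;

        Record(String id, String identifier) {
            this.id = id;
            this.identifier = identifier;
        }
    }

    static class SameBucketObject {
        private final List<String> idsOfA = new ArrayList<>();
        private final List<String> idsOfB = new ArrayList<>();

        List<String> getIdsOfA() { return idsOfA; }
        List<String> getIdsOfB() { return idsOfB; }
        void addA(String id) { idsOfA.add(id); }
        void addB(String id) { idsOfB.add(id); }
    }

    // Put every id of A and B into the bucket of its identifier.
    static Map<String, SameBucketObject> group(List<Record> tableA, List<Record> tableB) {
        Map<String, SameBucketObject> buckets = new LinkedHashMap<>();
        for (Record r : tableA) {
            buckets.computeIfAbsent(r.identifier, k -> new SameBucketObject()).addA(r.id);
        }
        for (Record r : tableB) {
            buckets.computeIfAbsent(r.identifier, k -> new SameBucketObject()).addB(r.id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Record> a = new ArrayList<>();
        a.add(new Record("100", "capital"));
        a.add(new Record("201", "bat"));
        a.add(new Record("202", "bat"));
        List<Record> b = new ArrayList<>();
        b.add(new Record("1001", "bat"));
        b.add(new Record("1002", "bat"));
        b.add(new Record("1003", "bat"));
        b.add(new Record("5010", "keyboard"));

        for (Map.Entry<String, SameBucketObject> e : group(a, b).entrySet()) {
            System.out.println(e.getKey() + " A=" + e.getValue().getIdsOfA()
                    + " B=" + e.getValue().getIdsOfB());
        }
        // capital A=[100] B=[]
        // bat A=[201, 202] B=[1001, 1002, 1003]
        // keyboard A=[] B=[5010]
    }
}
```

A `LinkedHashMap` is used here only so the buckets print in first-seen order; a plain `HashMap` would group the same way.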
Then compare the number of elements from A (idsOfA) with the number of elements from B (idsOfB) in each bucket:
sizeOf(idsOfA) < sizeOf(idsOfB) -> add elements with ids in the idsOfB list from Table B to Table A.
sizeOf(idsOfA) > sizeOf(idsOfB) -> delete sizeOf(idsOfA) - sizeOf(idsOfB) elements from A, starting from the last.
sizeOf(idsOfA) = sizeOf(idsOfB) -> no action.
Apart from the grouping map itself, this approach takes no extra space.
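The size comparison can be sketched for a single bucket as follows. This is only an illustration: it is written in the direction the question asks for (B is changed to match A), and swapping the two arguments gives the opposite direction. `BucketSync`, `toAddToB` and `toDeleteFromB` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BucketSync {
    // Ids of A that still need a copy in B: the last (sizeA - sizeB) entries of idsOfA.
    static List<String> toAddToB(List<String> idsOfA, List<String> idsOfB) {
        if (idsOfA.size() <= idsOfB.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(idsOfA.subList(idsOfB.size(), idsOfA.size()));
    }

    // Ids to delete from B: the last (sizeB - sizeA) entries of idsOfB.
    static List<String> toDeleteFromB(List<String> idsOfA, List<String> idsOfB) {
        if (idsOfB.size() <= idsOfA.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(idsOfB.subList(idsOfA.size(), idsOfB.size()));
    }

    public static void main(String[] args) {
        // Bucket "bat": two rows in A, three in B, so one B row is surplus.
        List<String> batA = Arrays.asList("201", "202");
        List<String> batB = Arrays.asList("1001", "1002", "1003");
        System.out.println(toDeleteFromB(batA, batB)); // [1003]
        System.out.println(toAddToB(batA, batB));      // []

        // Bucket "capital": one row in A, none in B, so it must be added.
        List<String> capitalA = Arrays.asList("100");
        List<String> capitalB = Collections.emptyList();
        System.out.println(toAddToB(capitalA, capitalB)); // [100]
    }
}
```

Taking the surplus from the end of the list mirrors the "from last" rule; any other choice of surplus rows within a bucket would be equally valid, since rows with the same identifier are interchangeable.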