Question

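I'm developing an app that periodically downloads data from a server. When the data needs updating, I use something like the following to update existing records or insert new ones if they don't exist.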

let fetchRequest = NSFetchRequest<NSFetchRequestResult>(entityName: "Trip")
for csvTrip in csvTrips {
    var trip: NSManagedObject!

    let tripId = Int(csvTrip[0])!
    fetchRequest.predicate = NSPredicate(format: "id = %d", tripId)

    // One count query (and possibly one fetch) per CSV row
    if try context.count(for: fetchRequest) == 0 {
        trip = NSEntityDescription.insertNewObject(forEntityName: "Trip", into: context)
        trip.setValue(tripId, forKey: "id")
    } else {
        trip = (try context.fetch(fetchRequest) as! [NSManagedObject])[0]
    }

    // Set other properties
}

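Checking whether the entity already exists on every loop iteration makes this roughly 100 times slower than simply inserting without the check, which becomes a big problem beyond a few thousand entities. I've tried fetching all entities first, but I still have to loop over each one and add its id to an array or something similar, which isn't any faster. I know Core Data is not the same as MySQL, but I find it hard to believe there is no equivalent of INSERT ... ON DUPLICATE KEY UPDATE, which is very fast in MySQL. Am I missing something?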

Answer 1

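I would be surprised if fetching a few thousand entities and loading their ids into a Set took particularly long.

You could use something like the following: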

let fetchRequest = NSFetchRequest<NSFetchRequestResult>(entityName: "Trip")
fetchRequest.resultType = .dictionaryResultType
fetchRequest.propertiesToFetch = ["id"]   // fetch only the id values, not full objects
var idSet = Set<Int32>()                  // declared outside the do block so it can be used afterwards
do {
    if let results = try self.moc.fetch(fetchRequest) as? [[String: Any]] {
        // compactMap drops any row whose id is missing or is not an Int32
        idSet = Set(results.compactMap { $0["id"] as? Int32 })
    }
} catch {
    print("Error reading trips: \(error)")
}

Now you can easily check whether a given ID is new and insert a new trip when needed:

for csvTrip in csvTrips {
    // Parse as Int32 so the value matches the element type of idSet
    if let tripId = Int32(csvTrip[0]) {
        if !idSet.contains(tripId) {
            let trip = NSEntityDescription.insertNewObject(forEntityName: "Trip", into: context)
            trip.setValue(tripId, forKey: "id")
        }
    }
}

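In my tests, it took 1.35 seconds to load 320,000 trip IDs into a set, and 0.08 seconds to create 10,000 new trips while checking whether each trip ID was already in the set.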
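The snippets above only insert missing trips. The question also asks about updating existing ones, and the same lookup idea can be extended with a dictionary from id to managed object. The following is a minimal sketch, not a definitive implementation: the helper name `upsertTrips` is hypothetical, and it assumes an entity "Trip" with an Integer 32 attribute "id" and the id in the first CSV column.

```swift
import CoreData

/// Hypothetical helper: updates existing Trip objects and inserts missing ones.
/// Assumes an entity "Trip" with an Integer 32 attribute "id".
func upsertTrips(_ csvTrips: [[String]], in context: NSManagedObjectContext) throws {
    // Parse the ids up front; rows without a valid id are skipped.
    let rows = csvTrips.compactMap { row -> (id: Int32, fields: [String])? in
        guard let first = row.first, let id = Int32(first) else { return nil }
        return (id: id, fields: row)
    }

    // One fetch for every stored trip whose id appears in the CSV.
    let request = NSFetchRequest<NSManagedObject>(entityName: "Trip")
    request.predicate = NSPredicate(format: "id IN %@", rows.map { NSNumber(value: $0.id) })
    let existing = try context.fetch(request)

    // Index the fetched objects by id so each lookup is a dictionary access.
    var tripsById = [Int32: NSManagedObject]()
    for trip in existing {
        if let id = trip.value(forKey: "id") as? Int32 {
            tripsById[id] = trip
        }
    }

    for row in rows {
        let trip: NSManagedObject
        if let found = tripsById[row.id] {
            trip = found
        } else {
            trip = NSEntityDescription.insertNewObject(forEntityName: "Trip", into: context)
            trip.setValue(row.id, forKey: "id")
            tripsById[row.id] = trip   // guards against duplicate ids within the CSV
        }
        // Set other properties on `trip` from `row.fields` here.
        _ = trip
    }

    try context.save()
}
```

This replaces the per-row count and fetch with a single fetch plus in-memory lookups. For very large imports it may be worth splitting the CSV into batches and saving after each one to keep memory use down.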

Answer 2

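One way to speed up inserts/updates is to slice the input array into reasonably small "buckets" and use the IN operator in an NSPredicate. With the IN operator, a single query tells you whether all the elements of a bucket already exist in the database. Let me illustrate this with some code.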

let bucketSize = 10

var bucketStart = 0

while bucketStart < csvTrips.count {
    // Clamp the end so the last (possibly shorter) bucket stays in range
    let bucketEnd = min(bucketStart + bucketSize, csvTrips.count)
    let tripBucket = csvTrips[bucketStart..<bucketEnd]

    let fetchRequest = NSFetchRequest<NSFetchRequestResult>(entityName: "Trip")
    fetchRequest.predicate = NSPredicate(format: "id IN %@", tripBucket.compactMap { Int($0[0]) })

    // count == tripBucket.count would imply that all elements in the bucket are also in the db, in which case we simply move on to the next bucket
    if try context.count(for: fetchRequest) != tripBucket.count {
        // some of the elements in the bucket are not in the db,
        // now use your existing code to update the missing ones
        for csvTrip in tripBucket {
            // ...
        }
    }

    // Advance to the next bucket
    bucketStart = bucketEnd
}

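You can tune the efficiency of this algorithm by changing the bucket size. You have to choose a size that takes into account the probability of new records in the input data, so that as many buckets as possible do not enter the following block of code.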

if try context.count(for: fetchRequest) != tripBucket.count {...}

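Too large a bucket size means that almost every bucket will have at least one element missing from the database, which in turn means you gain little or no advantage over your existing approach. On the other hand, too small a bucket size means the overhead of the extra fetch requests (id in %@) becomes too large.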
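To make this trade-off concrete, here is a rough sketch under a simplifying assumption that is not part of the answer: each input record is new independently with probability p. A bucket of n records then contains no new record with probability (1 - p)^n, and only those buckets skip the slow path. The function name `skipProbability` and the sample value p = 0.01 are purely illustrative.

```swift
import Foundation

/// Probability that a bucket of `size` records contains no new record,
/// assuming each record is new independently with probability `pNew`.
func skipProbability(pNew: Double, size: Int) -> Double {
    return pow(1.0 - pNew, Double(size))
}

// Compare a few bucket sizes for an assumed 1% share of new records.
for size in [5, 10, 50, 100] {
    let skipped = skipProbability(pNew: 0.01, size: size) * 100
    print("bucket size \(size): \(String(format: "%.1f", skipped))% of buckets skipped")
}
```

Under this assumption, about 90% of buckets of size 10 are skipped when 1% of records are new, but only about 37% of buckets of size 100. Real data is rarely independent (new records often arrive in runs), so measuring with your own input is the safer guide.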

Answer 3

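You can use Core Data's unique constraints.

Tell Core Data that your id is a unique identifier. To do this, select the data model (Trip.xcdatamodeld) and make sure the Trip entity is selected rather than one of its attributes. In the Data Model inspector, find the "Constraints" field and click the + button at the bottom of that field. A new row will appear saying "comma,separated,properties". Click on it, press Enter to make it editable, then type id and press Enter again. Save your changes by pressing Cmd+S.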

Modify the loadPersistentStores() method call to allow Core Data to update your objects:

container.loadPersistentStores { storeDescription, error in
    self.container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy

    if let error = error {
        print("Unresolved error \(error)")
    }
}

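Note: using attribute constraints can cause problems with NSFetchedResultsController. Attribute constraints are only enforced as unique when a save happens, which means that if you are inserting data, an NSFetchedResultsController may contain duplicates until a save takes place. You can avoid this by performing a save before loading. Just be aware that it is up to you to make such changes.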
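With the constraint and merge policy in place, the import itself no longer needs an existence check. The following is a minimal sketch of that idea, not a definitive implementation: the helper name `importTrips` is hypothetical, and it assumes a "Trip" entity with a unique constraint on an Integer 32 attribute "id" in a SQLite-backed store, with the id in the first CSV column.

```swift
import CoreData

/// Hypothetical helper: inserts every CSV row without checking for duplicates
/// and lets the unique constraint on "id" resolve conflicts at save time.
func importTrips(_ csvTrips: [[String]], in context: NSManagedObjectContext) throws {
    // On a constraint conflict, the in-memory (new) values overwrite the stored row.
    context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy

    for row in csvTrips {
        guard let first = row.first, let id = Int32(first) else { continue }
        let trip = NSEntityDescription.insertNewObject(forEntityName: "Trip", into: context)
        trip.setValue(id, forKey: "id")
        // Set other properties on `trip` from `row` here.
    }

    // Uniqueness is only enforced here, when the context is saved.
    try context.save()
}
```

Because conflicts are resolved only on save, saving in batches during a large import keeps the number of pending duplicate objects small.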

You can read more about this technique here.

What is the most efficient way to insert/update records with Core Data?