Comparing the number of unique elements contained in large data sets

Time: 2017-02-07 18:48:47

Tags: ios regex swift hash set

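I'm trying to solve HackerRank's Hash Table Ransom Note challenge. There are 19 test cases, and I'm only passing two of them because the rest time out on the large data sets (10,000-30,000 entries).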

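I'm given:

1) an array of the words contained in a magazine

2) an array of the words for a ransom note. My goal is to determine whether the words in the magazine can be used to build the ransom note.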

I need magazineWords to contain enough copies of each unique word to cover the number of times noteWords uses it.

I'm using this code to make that determination... and it takes forever...

for word in noteWordsSet {
    // check if there are enough unique words in magazineWords to put in the note
    if magazineWords.filter({$0==word}).count < noteWords.filter({$0==word}).count {
        return "No"
    }
}

What's a faster way to accomplish this?

Below is my complete code for the challenge:

import Foundation

var magazineWords: [String] = [] // placeholder: array of 1 to 30,000 strings
var noteWords: [String] = []     // placeholder: array of 1 to 30,000 strings

enum RegexString: String {
    // Letters a to z, A to Z, 1 to 5 characters long
    case wordCanBeUsed = "([a-zA-Z]{1,5})"
}

func matches(for regexString: String, in text: String) -> [String] {
    // Hat tip MartinR for this
    do {
        let regex = try NSRegularExpression(pattern: regexString)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

func canCreateRansomNote(from magazineWords: [String], for noteWords: [String]) -> String {
    // figure out what's unique
    let magazineWordsSet = Set(magazineWords)
    let noteWordsSet = Set(noteWords)
    let intersectingValuesSet = magazineWordsSet.intersection(noteWordsSet)

    // constraints specified in challenge
    guard magazineWords.count >= 1, noteWords.count >= 1 else { return "No" }
    guard magazineWords.count <= 30000, noteWords.count <= 30000 else { return "No" }

    // make sure there are enough individual words to work with
    guard magazineWordsSet.count >= noteWordsSet.count else { return "No" }
    guard intersectingValuesSet.count == noteWordsSet.count else { return "No" }

    // check if all the words can be used. assume the regex method works perfectly
    guard noteWords.count == matches(for: RegexString.wordCanBeUsed.rawValue, in: noteWords.joined(separator: " ")).count else { return "No" }

    // FIXME: this is a processor hog. I'm timing out when I get to this point
    // need to make sure there are enough magazine words to write the note
    // compare quantity of word in magazine with quantity of word in note
    for word in noteWordsSet {
        // check if there are enough unique words in magazineWords to put in the note
        if magazineWords.filter({$0==word}).count < noteWords.filter({$0==word}).count {
            return "No"
        }
    }

    return "Yes"
}

print(canCreateRansomNote(from: magazineWords, for: noteWords))

1 Answer:

Answer 0 (score: 1)

I don't know how the test cases are read on the contest site or which frameworks are allowed. If Foundation is allowed, you can use NSCountedSet:

import Foundation

let fileContent = try! String(contentsOf: URL(fileURLWithPath: "/path/to/file.txt"))
let scanner = Scanner(string: fileContent)

var m = 0
var n = 0
scanner.scanInt(&m)
scanner.scanInt(&n)

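// NSCountedSet tracks how many times each object has been added
let magazineWords = NSCountedSet(capacity: m)
let ransomWords = NSCountedSet(capacity: n)

// The first m words belong to the magazine, the next n to the ransom note
for i in 0..<(m+n) {
    var scanned: NSString? = nil
    scanner.scanUpToCharacters(from: .whitespacesAndNewlines, into: &scanned)

    // Stop early if the file holds fewer words than announced
    guard let word = scanned else { break }

    if i < m {
        magazineWords.add(word)
    } else {
        ransomWords.add(word)
    }
}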

var canCreate = true
for w in ransomWords {
    if ransomWords.count(for: w) > magazineWords.count(for: w) {
        canCreate = false
        break
    }
}

print(canCreate ? "Yes" : "No")

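It works by scanning the input file one word at a time and counting how many times each word appears in the magazine and in the ransom note. Then, if any word occurs more often in the ransom note than in the magazine, it fails immediately. On my 2012 iMac, the 30,000-word test case runs in under 1 second.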
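
If Foundation turns out not to be available, the same counting idea can be sketched with a plain Swift dictionary. This is only a sketch under a few assumptions: canBuildNote is a made-up name, the words have already been read into arrays, and words are compared case-sensitively. The subscript with a default value needs Swift 4 or later.

```swift
// Hypothetical helper, not part of the original answer: counts words with a
// plain [String: Int] dictionary instead of NSCountedSet.
func canBuildNote(from magazineWords: [String], for noteWords: [String]) -> Bool {
    // A note with more words than the magazine can never be built.
    guard noteWords.count <= magazineWords.count else { return false }

    // One pass over the magazine: word -> number of copies still available.
    var available: [String: Int] = [:]
    for word in magazineWords {
        available[word, default: 0] += 1
    }

    // One pass over the note: use up one copy per word and
    // fail on the first word that has no copies left.
    for word in noteWords {
        guard let remaining = available[word], remaining > 0 else { return false }
        available[word] = remaining - 1
    }
    return true
}

// "two" is needed twice but the magazine only has it once.
let magazine = ["two", "times", "three", "is", "not", "four"]
let note = ["two", "times", "two", "is", "four"]
print(canBuildNote(from: magazine, for: note) ? "Yes" : "No") // prints "No"
```

Each array is walked once, so the work grows roughly linearly with m + n. The filter-based loop in the question rescans both arrays for every distinct note word, which is what makes it time out on the 30,000-word cases.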