Question

I'm sorry if asking for this kind of help is bad form... but I don't know who else to ask.

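I have an assignment to read two text files and find the 10 longest words in the first file (along with how many times each one occurs) that do not appear in the second file.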

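At the moment I read both files with File.ReadAllText, then split each one into an array in which every element is a single word (with punctuation removed), and drop the empty entries.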

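My idea for picking the words that meet the requirements is this: make a Dictionary that holds a string Word and an int Count, then loop over every word of the first file. First compare the element against the whole dictionary; if there is a match, increase its Count by 1. If it matches no dictionary entry, compare it against every element of the second file in another loop. If a match is found there, just move on to the next element of the first file; if no match is found, add the word to the dictionary with Count set to 1.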

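So my first question is: is this actually the most efficient way to do it? (Keep in mind that I only recently started learning C# and am not allowed to use LINQ.)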

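Second question: how do I work with a Dictionary? Most of the results I found are very confusing, and we haven't covered dictionaries at university yet.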

My code so far:

    // Reading and making all the words lowercase for comparisons
    string punctuation = " ,.?!;:\"\r\n";
    string Read1 = File.ReadAllText("@\\..\\Book1.txt");
    Read1 = Read1.ToLower();
    string Read2 = File.ReadAllText("@\\..\\Book2.txt");
    Read2 = Read2.ToLower();

    //Working with the 1st file
    string[] FirstFileWords = Read1.Split(punctuation.ToCharArray());

    var temp1 = new List<string>();
    foreach (var word in FirstFileWords)
    {
        if (!string.IsNullOrEmpty(word))
            temp1.Add(word);
    }
    FirstFileWords = temp1.ToArray();

    Array.Sort(FirstFileWords, (x, y) => y.Length.CompareTo(x.Length));

    //Working with the 2nd file
    string[] SecondFileWords = Read2.Split(punctuation.ToCharArray());

    var temp2 = new List<string>();
    foreach (var word in SecondFileWords)
    {
        if (!string.IsNullOrEmpty(word))
            temp2.Add(word);
    }
    SecondFileWords = temp2.ToArray();

Answer 1

Well, I think you've done pretty well so far. Not being able to use Linq here is torture ;)

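As for performance, you should consider making SecondFileWords a HashSet<string>, because that greatly speeds up checking whether a word exists in the second file, and it takes very little effort. If performance is not a key requirement, I wouldn't go any further with performance optimization.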

Of course, you have to make sure you don't add duplicates to the second list, so change your current implementation to something like:

HashSet<string> temp2 = new HashSet<string>();

foreach (var word in SecondFileWords)
{
    if (!string.IsNullOrEmpty(word) && !temp2.Contains(word))
    {
        temp2.Add(word);
    }
}

Don't convert it back to an array; that isn't necessary.
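As a side note, here is a minimal sketch of how a HashSet<string> treats duplicates. The names HashSetDemo and ToWordSet are made up for illustration. HashSet<T>.Add returns false and leaves the set unchanged when the item is already present, so the explicit Contains check above is harmless but optional:

```csharp
using System;
using System.Collections.Generic;

static class HashSetDemo
{
    // Hypothetical helper: builds a set from the non-empty words.
    // Duplicates are dropped by the set itself.
    public static HashSet<string> ToWordSet(string[] words)
    {
        var set = new HashSet<string>();
        foreach (string word in words)
        {
            if (!string.IsNullOrEmpty(word))
            {
                // Add returns false (and changes nothing) if the word is already there.
                set.Add(word);
            }
        }
        return set;
    }

    static void Main()
    {
        string[] words = { "graphics", "nvidia", "", "graphics" };
        HashSet<string> set = ToWordSet(words);

        Console.WriteLine(set.Count);              // 2
        Console.WriteLine(set.Contains("nvidia")); // True
        Console.WriteLine(set.Contains("amd"));    // False
    }
}
```

Contains on a HashSet is a hash lookup rather than a scan over every element, which is where the speed-up over an array comes from.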

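That brings me back to your FirstFileWords, which also contains duplicates. This will cause problems, because the list of top words could then contain the same word several times. So let's get rid of them. It is a bit more complicated here, because you need to keep track of how often each word occurs in the first list.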

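So now let's bring in a Dictionary<string, int>. A Dictionary stores lookup keys, just like a HashSet, but it additionally holds a value for each key. We will use the word as the key, and the number of times the word occurs in the first list as the value.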

Dictionary<string, int> temp1 = new Dictionary<string, int>();

foreach (var word in FirstFileWords)
{
    if (string.IsNullOrEmpty(word))
    {
        continue;
    }

    if (temp1.ContainsKey(word))
    {
        temp1[word]++;
    }
    else
    {
        temp1.Add(word, 1);
    }
}
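The same counting loop can also be written with TryGetValue, which does the lookup once instead of twice. This is only a sketch of an alternative; WordCounter and CountWords are illustrative names, not part of the original code:

```csharp
using System;
using System.Collections.Generic;

static class WordCounter
{
    // Hypothetical helper: counts how often each non-empty word occurs.
    public static Dictionary<string, int> CountWords(string[] words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            if (string.IsNullOrEmpty(word))
            {
                continue;
            }

            // TryGetValue leaves 'current' at 0 when the key is missing.
            int current;
            counts.TryGetValue(word, out current);
            counts[word] = current + 1; // the indexer adds or overwrites the entry
        }
        return counts;
    }

    static void Main()
    {
        string[] words = { "skylake", "intel", "", "skylake" };
        Dictionary<string, int> counts = CountWords(words);

        Console.WriteLine(counts["skylake"]); // 2
        Console.WriteLine(counts["intel"]);   // 1
        Console.WriteLine(counts.Count);      // 2
    }
}
```

Note that reading counts["missing"] through the indexer throws a KeyNotFoundException, whereas assigning through the indexer creates the entry; that is why the read goes through TryGetValue.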

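Now, a Dictionary cannot be sorted, which complicates things, because you still need the words ordered by length. So let's go back to your Array.Sort approach, which I think is a good choice when you are not allowed to use Linq: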

KeyValuePair<string, int>[] firstFileWordsWithCount = temp1.ToArray();
Array.Sort(firstFileWordsWithCount, (x, y) => y.Key.Length.CompareTo(x.Key.Length));

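Note: you use .ToArray() in your example, so I assume it is OK to use. Strictly speaking, though, it is part of Linq IMHO, so it would break the rule as well.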
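If calling .ToArray() on a Dictionary turns out not to be allowed, the entries can be copied into an array by hand. The following is a sketch under that assumption; DictionarySorting and ToSortedArray are made-up names:

```csharp
using System;
using System.Collections.Generic;

static class DictionarySorting
{
    // Hypothetical helper: copies the entries into an array without Linq,
    // then sorts them by key length, longest first.
    public static KeyValuePair<string, int>[] ToSortedArray(Dictionary<string, int> counts)
    {
        var pairs = new KeyValuePair<string, int>[counts.Count];
        int index = 0;
        foreach (KeyValuePair<string, int> pair in counts)
        {
            pairs[index++] = pair;
        }

        Array.Sort(pairs, (x, y) => y.Key.Length.CompareTo(x.Key.Length));
        return pairs;
    }

    static void Main()
    {
        var counts = new Dictionary<string, int>
        {
            { "amd", 1 },
            { "architecture", 1 },
            { "intel", 2 }
        };

        foreach (KeyValuePair<string, int> pair in ToSortedArray(counts))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
        // Prints architecture: 1, then intel: 2, then amd: 1
    }
}
```

Array.Sort is not a stable sort, so words of equal length may come out in any order relative to each other.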

All that's left now is to walk through your firstFileWordsWithCount array until you have found 10 words that do not exist in the HashSet temp2. Something like:

int foundWords = 0;

foreach(KeyValuePair<string, int> candidate in firstFileWordsWithCount)
{
    if (!temp2.Contains(candidate.Key))
    {
        Console.WriteLine($"{candidate.Key}: {candidate.Value}");
        foundWords++;
    }

    if (foundWords >= 10)
    {
        break;
    }
}
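For reference, the steps in this answer can be combined into one small Linq-free routine. This is a sketch, not the only way to do it: LongestUniqueWords and Find are illustrative names, the separator list is taken from the question, and the sample texts are invented. Unlike the snippets above, it filters out the second file's words before counting, so the sorted list only needs to be cut to the requested size:

```csharp
using System;
using System.Collections.Generic;

static class LongestUniqueWords
{
    // Same separator characters as the question's 'punctuation' string.
    static readonly char[] Separators = " ,.?!;:\"\r\n".ToCharArray();

    // Hypothetical helper: returns up to 'limit' of the longest words of text1
    // that do not occur in text2, each paired with its count in text1.
    public static List<KeyValuePair<string, int>> Find(string text1, string text2, int limit)
    {
        var excluded = new HashSet<string>(
            text2.ToLower().Split(Separators, StringSplitOptions.RemoveEmptyEntries));

        var counts = new Dictionary<string, int>();
        foreach (string word in text1.ToLower().Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (excluded.Contains(word))
            {
                continue; // the word occurs in the second text, skip it
            }

            int current;
            counts.TryGetValue(word, out current);
            counts[word] = current + 1;
        }

        // Copy the entries into a list and sort by word length, longest first.
        var result = new List<KeyValuePair<string, int>>(counts);
        result.Sort((x, y) => y.Key.Length.CompareTo(x.Key.Length));

        if (result.Count > limit)
        {
            result.RemoveRange(limit, result.Count - limit);
        }
        return result;
    }

    static void Main()
    {
        string text1 = "Skylake, Intel and AMD. Skylake graphics!";
        string text2 = "Graphics and Nvidia";

        foreach (KeyValuePair<string, int> pair in Find(text1, text2, 2))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
        // Prints skylake: 2, then intel: 1
    }
}
```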

If anything is unclear, just ask.

Answer 2

Here is what you get when you use a Dictionary:

string File1 = "AMD Intel Skylake Processors Graphics Cards Nvidia Architecture Microprocessor Skylake SandyBridge KabyLake";
string File2 = "Graphics Nvidia";
Dictionary<string, int> Dic = new Dictionary<string, int>();
string[] File1Array = File1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Array.Sort(File1Array, (s1, s2) => s2.Length.CompareTo(s1.Length));
foreach (string s in File1Array)
{
    if (Dic.ContainsKey(s))
    {
        Dic[s]++;
    }
    else
    {
        Dic.Add(s, 1);
    }
}

string[] File2Array = File2.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string s in File2Array)
{
    if (Dic.ContainsKey(s))
    {
        Dic.Remove(s);
    }
}

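// Note: this relies on the Dictionary returning entries in insertion order
// (longest first, from the sort above), which the documentation does not guarantee.
int i = 0;
foreach (KeyValuePair<string, int> kvp in Dic)
{
    Console.WriteLine(kvp.Key + " " + kvp.Value);
    i++;
    if (i == 10) // stop after the 10th word
    {
        break;
    }
}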
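One small simplification is possible in the removal loop above: Dictionary.Remove returns false and does nothing when the key is missing, so the ContainsKey check can be dropped. A minimal sketch, with RemoveDemo and RemoveAll as made-up names:

```csharp
using System;
using System.Collections.Generic;

static class RemoveDemo
{
    // Hypothetical helper: removes every listed word from the dictionary.
    // Remove simply returns false for keys that are not present.
    public static void RemoveAll(Dictionary<string, int> counts, string[] wordsToRemove)
    {
        foreach (string word in wordsToRemove)
        {
            counts.Remove(word);
        }
    }

    static void Main()
    {
        var counts = new Dictionary<string, int>
        {
            { "Graphics", 1 },
            { "Skylake", 2 }
        };

        RemoveAll(counts, new[] { "Graphics", "Nvidia" });

        Console.WriteLine(counts.Count);                   // 1
        Console.WriteLine(counts.ContainsKey("Skylake"));  // True
        Console.WriteLine(counts.ContainsKey("Graphics")); // False
    }
}
```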

My earlier attempt used LINQ; that is apparently not allowed, but I missed it.

string[] Results = File1.Split(" ".ToCharArray()).Except(File2.Split(" ".ToCharArray())).OrderByDescending(s => s.Length).Take(10).ToArray();

for (int i = 0; i < Results.Length; i++)
{
    Console.WriteLine(Results[i] + " " + Regex.Matches(File1, Results[i]).Count);
}
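One caveat with the counting in the LINQ version: Regex.Matches(File1, Results[i]) treats the word as a pattern and also counts occurrences inside longer words. A hedged sketch of a whole-word count follows; WholeWordCount and Count are illustrative names, and it assumes the words start and end with letters or digits, since that is where \b applies:

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordCount
{
    // Hypothetical helper: counts occurrences of 'word' as a whole word in 'text'.
    // Regex.Escape keeps characters such as '.' or '+' from acting as pattern syntax.
    public static int Count(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern).Count;
    }

    static void Main()
    {
        string text = "AMD Intel Skylake Skylakes";

        // The plain pattern also matches the start of "Skylakes".
        Console.WriteLine(Regex.Matches(text, "Skylake").Count); // 2
        Console.WriteLine(Count(text, "Skylake"));               // 1
    }
}
```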

C#: using a Dictionary
