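Split text and put it into a dictionary

Time: 2018-11-07 18:28:37

Tags: c#

I have a text of 600 words from which all quotation marks and numbers (years, dates, ...) should be removed. Only the words should remain, and they must be put into a dictionary.

So I tried iterating with a foreach loop, taking the first letter of each word and saving it in a list. Then I split each line into single words. For example: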

You are pretty.
You
are
pretty

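The problem is that several identical words end up in the result, although each word should appear only once. I have tried to fix this but could not find a solution.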

3 answers:

Answer 0 (score: 1)

You can split the text with a regular expression and then use LINQ to build the dictionary:

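// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
var dictionary = Regex.Split(text, @"\W+")
    .Where(w => w.Length > 0) // Split yields empty strings at the text boundaries
    .GroupBy(m => m, StringComparer.OrdinalIgnoreCase) // Case-insensitive
    .ToDictionary(m => m.Key, m => m.Count());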
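
To see what this pipeline produces, here is a small self-contained sketch. The sample text, the class name `WordCountDemo`, and the method name `CountWords` are assumptions made for illustration. It filters out the empty strings that `Regex.Split` returns when the text starts or ends with a non-word character, and it passes the comparer to the dictionary as well so that lookups are case-insensitive too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCountDemo
{
    // Counts words case-insensitively; the first spelling seen becomes the key.
    public static Dictionary<string, int> CountWords(string text)
    {
        return Regex.Split(text, @"\W+")
            .Where(w => w.Length > 0) // drop empty entries produced at the text boundaries
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical sample text: "You" and "are" each occur twice in different casing.
        var counts = CountWords("You are pretty. you ARE.");
        foreach (var pair in counts)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

Note that numbers such as `2018` still count as words here, because `\W+` only splits on non-word characters.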

Update

Applied to your sample code, your task class could look like this. It builds both dictionaries and treats words case-insensitively:

// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Linq; using System.Text.RegularExpressions;
public class Aufgabe
{
    const string ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ";
    public Dictionary<string, int> words;
    public Dictionary<char, List<string>> firstletter;
    public Aufgabe(string filename)
    {
        var text = File.ReadAllText(filename);
        words = Regex.Split(text, @"\W+")
            .Where(w => w.Length > 0) // Skip empty entries so m[0] below cannot throw
            .GroupBy(m => m, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(m => m.Key, m => m.Count());
        firstletter = ALPHABET.ToDictionary(a => a, // First-letter key
            a => words.Keys.Where(m => a == char.ToUpper(m[0])).ToList()); // Words
    }
}

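Answer 1 (score: 1)

The other answers are missing a few things:

  • There is no validation to check whether a piece of text is actually a word
  • The comparison should not be case-sensitive (i.e. spain, Spain, and SPAIN should be treated as the same word)

My solution: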

StringComparer comparer = StringComparer.OrdinalIgnoreCase;
string text = "The 'rain' in spain falls mainly on the plain. 07 November 2018 20:02:07 - 20180520 I said the Plain in SPAIN. 12345";

var dictionary = Regex.Split(text, @"\W+")
                      .Where(IsValidWord)
                      .GroupBy(m => m, comparer)
                      .ToDictionary(m => m.Key, m => m.Count(), comparer);

The IsValidWord method:

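// logic to validate word goes here
private static bool IsValidWord(string text)
{
    // Regex.Split can yield empty strings at the text boundaries; they are not words
    if (string.IsNullOrEmpty(text)) return false;

    double value;
    bool isNumeric = double.TryParse(text, out value);

    // add more validation rules here

    return !isNumeric;
}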
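
The "add more validation rules" placeholder could be filled in several ways. One option, sketched here under the assumption that a word consists of letters only, rejects empty strings and any token containing a digit or underscore. The names `WordValidation` and `IsLettersOnly` are hypothetical and not part of the answer above.

```csharp
using System;
using System.Linq;

public static class WordValidation
{
    // Hypothetical stricter rule: non-empty and made up of letters only.
    public static bool IsLettersOnly(string text)
    {
        return !string.IsNullOrEmpty(text) && text.All(char.IsLetter);
    }

    public static void Main()
    {
        // Sample tokens chosen for illustration.
        foreach (var token in new[] { "spain", "2018", "", "a1", "Ärger" })
            Console.WriteLine($"\"{token}\" -> {IsLettersOnly(token)}");
    }
}
```

This rule is stricter than the numeric check: mixed tokens such as `a1` are dropped as well, which may or may not be what the task requires.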

Edit

I noticed in your code that you have a dictionary with words grouped by first letter. This can be achieved as follows (using the previous dictionary):

var lettersDictionary = dictionary.Keys.GroupBy(x => x.Substring(0, 1),
        (alphabet, subList) => new {
            Alphabet = alphabet,
            SubList = subList.OrderBy(x => x, comparer).ToList()
        },
        comparer) // Group case-insensitively; otherwise "i" and "I" become duplicate keys
        .ToDictionary(m => m.Alphabet, m => m.SubList, comparer);
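
The grouping step can also be tried in isolation. The following is a minimal sketch; the class name `FirstLetterDemo`, the method name `GroupByFirstLetter`, and the sample words are assumptions. It hands the comparer to `GroupBy` as well as to `ToDictionary`: if the grouping were case-sensitive, "in" and "I" would form two groups, and building a case-insensitive dictionary from them would throw an `ArgumentException` for the duplicate key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLetterDemo
{
    // Groups words by their first letter, ignoring case for keys and for sorting.
    public static Dictionary<string, List<string>> GroupByFirstLetter(IEnumerable<string> words)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        return words
            .Where(w => w.Length > 0) // Substring(0, 1) would throw on an empty string
            .GroupBy(w => w.Substring(0, 1), comparer)
            .ToDictionary(g => g.Key, g => g.OrderBy(w => w, comparer).ToList(), comparer);
    }

    public static void Main()
    {
        var groups = GroupByFirstLetter(new[] { "The", "in", "I", "spain", "said", "plain" });
        Console.WriteLine(string.Join(", ", groups["s"])); // said, spain
        Console.WriteLine(string.Join(", ", groups["I"])); // I, in
    }
}
```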

Answer 2 (score: 0)

Here is one approach using Regex. Note that case sensitivity has not been addressed:

var text = "The 'rain' in spain falls mainly on the plain. I said the plain in spain";

var result = new Dictionary<string,string>();

Regex.Matches(text, @"[^\s]+")
     .OfType<Match>()
     .Select(m => Regex.Replace(m.Value, @"\W", string.Empty))
     .ToList()
     .ForEach(word =>
     {
        if (!result.ContainsKey(word))
            result.Add(word, word);
     });
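
As a variation on the snippet above, case-insensitivity could be added by collecting the words in a `HashSet<string>` with a case-insensitive comparer. This is only a sketch: it swaps the dictionary for a set (key and value are identical in the original), and the names `UniqueWordsDemo` and `UniqueWords` as well as the sample text are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UniqueWordsDemo
{
    // Collects each distinct word once, treating "Spain" and "spain" as equal.
    public static HashSet<string> UniqueWords(string text)
    {
        var words = Regex.Matches(text, @"[^\s]+")
            .OfType<Match>()
            .Select(m => Regex.Replace(m.Value, @"\W", string.Empty))
            .Where(w => w.Length > 0); // tokens such as "-" become empty after cleaning
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var unique = UniqueWords("The 'rain' in Spain - I said the rain in SPAIN.");
        Console.WriteLine(unique.Count); // 6
    }
}
```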
