How do I iterate over a string word by word in C#?

Time: 2009-09-18 07:02:48

Tags: c#

I want to iterate over a string word by word.

If I have the string "incidentno and fintype or unitno", I would like to read each word one by one: "incidentno", "and", "fintype", "or", and "unitno".

10 answers:

Answer 0 (score: 16)

foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
   ...
}

Answer 1 (score: 13)

// The "-" is escaped: unescaped, ".-:" is a character range that also matches digits and "/".
var regex = new Regex(@"\b[\s,\.\-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

This works even if you have ".,; tabs and new lines" between your words.

Answer 2 (score: 11)

Slightly twisted, I know, but you could define an iterator block as an extension method on strings, e.g.

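    /// <summary>
    /// Lazily yields the space-separated words of the text.
    /// </summary>
    /// <param name="Text">The string to split on single spaces.</param>
    /// <returns>The words, one at a time, in order.</returns>
    public static IEnumerable<string> WordList(this string Text)
    {
        int cIndex = 0; // start of the current word
        int nIndex;     // position of the next space
        while ((nIndex = Text.IndexOf(' ', cIndex)) != -1)
        {
            yield return Text.Substring(cIndex, nIndex - cIndex);
            cIndex = nIndex + 1; // skip the space
        }
        // Remainder after the last space (the whole string if it contains no space).
        yield return Text.Substring(cIndex);
    }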

        foreach (string word in "incidentno and fintype or unitno".WordList())
            System.Console.WriteLine("'" + word + "'");

This has the advantage of not creating a big array for long strings.
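
To illustrate the point about not building the whole array, here is a minimal sketch. The names `LazyWordsDemo`, `Words` and `Produced` are illustrative and not part of the answer above. The counter is only there to show that the iterator does no more work than the caller asks for: combined with LINQ's `First()`, only the first word is ever extracted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWordsDemo
{
    // Counts how many words the iterator has produced (for demonstration only).
    public static int Produced;

    // Illustrative iterator: yields space-separated words one at a time.
    public static IEnumerable<string> Words(this string text)
    {
        int start = 0;
        int next;
        while ((next = text.IndexOf(' ', start)) != -1)
        {
            Produced++;
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
        Produced++;
        yield return text.Substring(start);
    }

    public static void Main()
    {
        string first = "incidentno and fintype or unitno".Words().First();
        Console.WriteLine(first);     // incidentno
        Console.WriteLine(Produced);  // 1: the rest of the string was never scanned
    }
}
```

By contrast, `Split` has to scan the entire string and allocate every substring before the loop can start.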

Answer 3 (score: 4)

Use the Split method of the string class:

string[] words = "incidentno and fintype or unitno".Split(' ');

This splits on spaces, so "words" will contain [incidentno, and, fintype, or, unitno].

Answer 4 (score: 3)

Assuming the words are always separated by a single space, you can use String.Split() to get an array of the words.
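
A minimal sketch of that approach follows. The class name `SplitDemo` and the helper `Words` are hypothetical names chosen for illustration. It also shows what happens when the single-space assumption does not hold: a plain `Split(' ')` returns an empty entry for each pair of adjacent spaces, while `StringSplitOptions.RemoveEmptyEntries` drops them.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits on spaces and drops empty entries.
    public static string[] Words(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Note the double space between "and" and "fintype".
        string sentence = "incidentno and  fintype or unitno";

        // Plain Split keeps an empty entry between the two adjacent spaces.
        Console.WriteLine(sentence.Split(' ').Length);  // 6

        // With RemoveEmptyEntries only the five words remain.
        foreach (string word in Words(sentence))
            Console.WriteLine(word);
    }
}
```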

Answer 5 (score: 2)

There are multiple ways to accomplish this. The two most convenient (in my opinion) are:

  • Use string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.

Example:

string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');

Alternatively, you could use the following (this is what I would use):

string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);

StringSplitOptions.RemoveEmptyEntries removes any empty entries from the array that may occur due to extra whitespace and other minor problems.

Next, to process the words, you would use:

foreach (string word in seperatedWords)
{
//Do something
}
  • Alternatively, you could use a regular expression to solve this, as Darin demonstrated (a copy is below).

Example:

// The "-" is escaped: unescaped, ".-:" is a character range that also matches digits and "/".
var regex = new Regex(@"\b[\s,\.\-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

For processing, you can use code similar to the first option:

foreach (string word in words)
{
//Do something
}

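Of course, there are many ways to solve this problem, but I think these two are the simplest to implement and maintain. I would go with the first option (using string.Split()), because regular expressions can sometimes become quite confusing, while a split will work correctly most of the time.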

Answer 6 (score: 1)

When using Split, what about checking for empty entries?

string sentence = "incidentno and fintype or unitno";
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}

Edit:

I can't comment, so I'm posting here, but this (posted above) works:

foreach (string word in "incidentno and fintype or unitno".Split(' ')) 
{
   ...
}

My understanding of foreach is that it first calls GetEnumerator() and then calls .MoveNext() until it returns false. Therefore, the .Split is not re-evaluated on each iteration.
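
A small sketch to check this claim. `ForeachDemo`, `GetWords`, `CountIterations` and `Calls` are hypothetical names used only for this demonstration. The wrapper counts how often the collection expression is evaluated. For arrays the compiler may emit an index-based loop rather than calling GetEnumerator(), but in either case the expression after `in` is evaluated only once.

```csharp
using System;

public static class ForeachDemo
{
    // Number of times GetWords has been called.
    public static int Calls;

    // Hypothetical wrapper around Split that records each call.
    public static string[] GetWords(string text)
    {
        Calls++;
        return text.Split(' ');
    }

    // Loops over the words and returns how many iterations ran.
    public static int CountIterations(string text)
    {
        int iterations = 0;
        foreach (string word in GetWords(text))
            iterations++;
        return iterations;
    }

    public static void Main()
    {
        int iterations = CountIterations("incidentno and fintype or unitno");
        Console.WriteLine(iterations);  // 5
        Console.WriteLine(Calls);       // 1
    }
}
```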

Answer 7 (score: 0)

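public static string[] MyTest(string inword, string regstr)
{
    var regex = new Regex(regstr);
    // Split the string that was passed in (not a hard-coded phrase) and drop the
    // empty entries produced by adjacent separators. Where/ToArray need System.Linq.
    var words = regex.Split(inword).Where(w => w.Length > 0).ToArray();
    return words;
}

? MyTest("incidentno,and .fintype-or;:unitno", @"[^\w+]")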

[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"

Answer 8 (score: 0)

I'd like to add some information to JDunkerley's answer. You can easily make this method more flexible by giving it a string or char parameter to search for.

public static IEnumerable<string> WordList(this string Text, string Word)
        {
            // An empty separator would match at every position and never advance.
            if (string.IsNullOrEmpty(Word))
                throw new ArgumentException("Separator must not be empty.", "Word");
            int cIndex = 0; // start of the current word
            int nIndex;     // position of the next separator
            while ((nIndex = Text.IndexOf(Word, cIndex, StringComparison.Ordinal)) != -1)
            {
                yield return Text.Substring(cIndex, nIndex - cIndex);
                cIndex = nIndex + Word.Length; // skip the whole separator
            }
            yield return Text.Substring(cIndex);
        }

public static IEnumerable<string> WordList(this string Text, char c)
        {
            int cIndex = 0; // start of the current word
            int nIndex;     // position of the next separator
            while ((nIndex = Text.IndexOf(c, cIndex)) != -1)
            {
                yield return Text.Substring(cIndex, nIndex - cIndex);
                cIndex = nIndex + 1; // skip the separator character
            }
            yield return Text.Substring(cIndex);
        }

Answer 9 (score: -1)

I wrote a string processor class. You can use it.

  

Example:

metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();
  

Class:

 public static class StringProcessor
{
    private static List<String> PrepositionList;

    public static string ToNormalString(this string strText)
    {
        if (String.IsNullOrEmpty(strText)) return String.Empty;
        char chNormalKaf = (char)1603;
        char chNormalYah = (char)1610;
        char chNonNormalKaf = (char)1705;
        char chNonNormalYah = (char)1740;
        string result = strText.Replace(chNonNormalKaf, chNormalKaf);
        result = result.Replace(chNonNormalYah, chNormalYah);
        return result;
    }

    public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
        List<String> blackListWords = null,
        int minimumWordLength = 3,
        char splitor = ' ',
        bool perWordIsLowerCase = true)
    {
        string[] btArray = bodyText.ToNormalString().Split(splitor);
        long numberOfWords = btArray.LongLength;
        Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
        foreach (string word in btArray)
        {
            if (word != null)
            {
                string lowerWord = word;
                if (perWordIsLowerCase)
                    lowerWord = word.ToLower();
                var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                    .Replace("?", "").Replace("!", "").Replace(",", "")
                    .Replace("<br>", "").Replace(":", "").Replace(";", "")
                    .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
                {
                    if (wordsDic.ContainsKey(normalWord))
                    {
                        var cnt = wordsDic[normalWord];
                        wordsDic[normalWord] = ++cnt;
                    }
                    else
                    {
                        wordsDic.Add(normalWord, 1);
                    }
                }
            }
        }
        List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
        return keywords;
    }

    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
    {
        List<KeyValuePair<String, Int32>> result = null;
        if (isBasedOnFrequency)
            result = list.OrderByDescending(q => q.Value).ToList();
        else
            result = list.OrderByDescending(q => q.Key).ToList();
        return result;
    }

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
    {
        List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
        return result;
    }

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
    {
        List<String> result = new List<String>();
        foreach (var item in list)
        {
            result.Add(item.Key);
        }
        return result;
    }

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
    {
        List<Int32> result = new List<Int32>();
        foreach (var item in list)
        {
            result.Add(item.Value);
        }
        return result;
    }

    public static String AsString<T>(this List<T> list, string seprator = ", ")
    {
        String result = string.Empty;
        foreach (var item in list)
        {
            result += string.Format("{0}{1}", item, seprator);
        }
        return result;
    }

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
    {
        bool result = false;
        if (blackListWords == null) return false;
        foreach (var w in blackListWords)
        {
            if (w.ToNormalString().Equals(word))
            {
                result = true;
                break;
            }
        }
        return result;
    }
}