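Searching a string for a specific word. C#

Time: 2011-02-05 11:50:52

Tags: c# string

I want to search a string for a specific word entered by the user, and then output the percentage of the text that the word makes up. I'm just wondering what the best way to do this is, if you can help me.

4 answers: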

Answer 0 (score: 3)

I suggest using a String.Equals overload and specifying a StringComparison for better results.

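using System;
using System.Linq;

var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' };
// RemoveEmptyEntries drops the empty strings produced by consecutive separators (", ")
var words = sentence.Split (separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count (w =>
    w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase));
var percentage = matches / (float) words.Length; // arrays expose Length, not Count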

Note that percentage is a float, e.g. 0.5 means 50%. You can format it for display with the ToString overload that takes a format string:

var formatted = percentage.ToString ("P0"); // 0.1234 => 12 %

You can also change the format specifier to show decimal places:

var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 %
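The "P" specifier follows the current culture's percent pattern, so the spacing and decimal separator can differ from the comments above (under de-DE, for example, you get "12,34 %"). A minimal sketch, assuming you want output that does not depend on the machine's culture, passes CultureInfo.InvariantCulture explicitly; the PercentFormat class and ToPercent method are hypothetical names.

```csharp
using System;
using System.Globalization;

static class PercentFormat
{
    // Hypothetical helper: formats a fraction (0.5 == 50%) with the "P"
    // specifier using the invariant culture, so the result does not depend
    // on the culture of the machine running the code.
    public static string ToPercent(float fraction, int decimals)
    {
        if (decimals < 0)
            throw new ArgumentOutOfRangeException(nameof(decimals));

        // "P" + decimals builds "P0", "P2", ...; the number is the decimal places
        return fraction.ToString("P" + decimals, CultureInfo.InvariantCulture);
    }
}
```

For example, PercentFormat.ToPercent(percentage, 2) always uses "." as the decimal separator.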

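Keep in mind that the Split-based approach is not efficient for large strings, because it creates a string instance for every word it finds. In that case you may want to use a StringReader and read the text word by word.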
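A minimal sketch of that idea, assuming a "word" is a run of letters or digits: the hypothetical StreamingWordCounter.Count reads one character at a time from any TextReader and compares the current word against the target as it goes, so no per-word string is allocated. The case-insensitive check uppercases each char with char.ToUpperInvariant, which approximates, but is not guaranteed to be identical to, StringComparison.OrdinalIgnoreCase.

```csharp
using System;
using System.IO;

static class StreamingWordCounter
{
    // Counts how many words the reader yields and how many equal `target`
    // (case-insensitive, per-char invariant upper-casing), without building
    // a string per word. Assumption: a word is a run of letters or digits;
    // every other character acts as a separator.
    public static (int Matches, int Total) Count(TextReader reader, string target)
    {
        int matches = 0, total = 0;
        int length = 0;     // characters read so far in the current word
        bool equal = true;  // current word still agrees with target so far

        while (true)
        {
            int c = reader.Read(); // -1 at end of input
            if (c != -1 && char.IsLetterOrDigit((char)c))
            {
                // Compare this char with the same position in target
                if (equal && (length >= target.Length ||
                    char.ToUpperInvariant((char)c) != char.ToUpperInvariant(target[length])))
                {
                    equal = false;
                }
                length++;
                continue;
            }

            // A separator or the end of input closes the current word, if any
            if (length > 0)
            {
                total++;
                if (equal && length == target.Length)
                    matches++;
            }
            length = 0;
            equal = true;

            if (c == -1)
                return (matches, total);
        }
    }
}
```

Because it takes a TextReader, the same method also works with a StreamReader over a file, so the whole text never has to be in memory at once.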

Answer 1 (score: 2)

The easiest way is to use LINQ:

using System;
using System.Linq;

char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
    (from word in sentence.Split(separators)       // get all the words
    where word.ToLower() == searchedWord.ToLower() // find the words that match
    select word).Count();                          // count them

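This only counts how many times the word appears in the text. You can also count how many words the text contains:

// RemoveEmptyEntries skips the empty strings left between consecutive separators such as ", "
var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;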

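Then compute the percentage, multiplying by 100.0 first so the division is done in floating point rather than integer arithmetic (count / totalWords in int math is 0 whenever count < totalWords):

var result = count * 100.0 / totalWords;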
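Put together, the steps above might look like the following sketch. WordPercentage.Percent is a hypothetical name; it keeps the LINQ query syntax, but swaps the ToLower() comparison for string.Equals with StringComparison.OrdinalIgnoreCase and returns 0 for text with no words instead of dividing by zero.

```csharp
using System;
using System.Linq;

static class WordPercentage
{
    static readonly char[] Separators = { ' ', ',', '.', '?', '!', ':', ';' };

    // Hypothetical helper: returns the share of words in `sentence` that equal
    // `searchedWord`, as a percentage (40.0 == 40%).
    public static double Percent(string sentence, string searchedWord)
    {
        // RemoveEmptyEntries so runs of separators do not add empty "words"
        var words = sentence.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            return 0.0; // avoid dividing by zero on empty input

        var count =
            (from word in words
             where string.Equals(word, searchedWord, StringComparison.OrdinalIgnoreCase)
             select word).Count();

        // Multiply by 100.0 first so the division happens in floating point
        return count * 100.0 / words.Length;
    }
}
```

For example, Percent("the cat and the dog", "the") gives 40.0, because 2 of the 5 words match.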

Answer 2 (score: 0)

My suggestion is a complete class.

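using System;
using System.Text;

class WordCount {
    // Characters treated as separators: normalize() replaces them with spaces
    const string Symbols = ",;.:-()\t!¡¿?\"[]{}&<>+-*/=#'";

    public static string normalize(string str)
    {
        var toret = new StringBuilder();

        for(int i = 0; i < str.Length; ++i) {
            if ( Symbols.IndexOf( str[ i ] ) > -1 ) {
                toret.Append( ' ' );
            } else {
                toret.Append( char.ToLower( str[ i ] ) );
            }
        }

        return toret.ToString();
    }

    private string word;
    public string Word {
        get { return this.word; }
        set { this.word = value; }
    }

    private string str;
    public string Str {
        get { return this.str; }
    }

    private string[] words = null;
    public string[] Words {
        get {
            // Split once and cache the result; skip the empty entries produced
            // by the padding spaces and by consecutive separators
            if ( this.words == null ) {
                this.words = this.Str.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );
            }

            return this.words;
        }
    }

    public WordCount(string str, string w)
    {
         this.str = ' ' + normalize( str ) + ' ';
         this.word = w;
    }

    public int Times()
    {
        return this.Times( this.Word );
    }

    public int Times(string word)
    {
        int times = 0;

        // Normalize the searched word the same way as the text (lower case,
        // symbols removed), then pad it with spaces so only whole words match
        word = normalize( word ).Trim();
        if ( word.Length == 0 ) {
            return 0;
        }
        word = ' ' + word + ' ';

        int wordLength = word.Length;
        int pos = this.Str.IndexOf( word, StringComparison.Ordinal );

        while( pos > -1 ) {
            ++times;

            // Resume at the trailing space, which can also be the leading
            // space of the next occurrence (" is is ")
            pos = this.Str.IndexOf( word, pos + wordLength - 1, StringComparison.Ordinal );
        }

        return times;
    }

    public double Percentage()
    {
        return this.Percentage( this.Word );
    }

    public double Percentage(string word)
    {
        // Returns a fraction (0.5 == 50%); the cast avoids integer division
        if ( this.Words.Length == 0 ) {
            return 0.0;
        }

        return ( (double) this.Times( word ) / this.Words.Length );
    }
}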

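Advantages: the string split is cached, so there is no risk of doing it several times. It is wrapped in a class, so it is easy to reuse. LINQ is not needed. Hope this helps.

Answer 3 (score: 0)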

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// The words you want to search for
var words = new string[] { "this", "is" };

// Build a regular expression query: \b(this|is)\b
// Regex.Escape makes characters such as '.' or '+' in a word match literally
var wordRegexQuery = new StringBuilder();
wordRegexQuery.Append("\\b(");
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++)
{
  wordRegexQuery.Append(Regex.Escape(words[wordIndex]));
  if (wordIndex < words.Length - 1)
  {
    wordRegexQuery.Append('|');
  }
}
wordRegexQuery.Append(")\\b");

// Find matches and return them as a string[]
var regex = new Regex(wordRegexQuery.ToString(), RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();

// Display results; the percentage is each word's share of all matches,
// not of every word in the text
foreach (var word in words)
{
    var wordCount = matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f / matches.Length);
}
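The percentages printed above are each word's share of the matches, not of the whole text. If you want the percentage of all words in the text, as the question asks, one sketch (assuming a word is a run of \w characters, and that the searched words themselves consist of word characters so that \b applies at both ends) counts every word with one regex and each searched word with another. RegexWordStats and Compute are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexWordStats
{
    // Assumption: a "word" is a run of word characters (\w+).
    static readonly Regex AnyWord = new Regex(@"\w+");

    // Returns, for each searched word, its count and its share (in percent)
    // of all words in `text`. Matching is case-insensitive.
    public static Dictionary<string, (int Count, double Percent)> Compute(
        string text, IEnumerable<string> searched)
    {
        int total = AnyWord.Matches(text).Count;
        var result = new Dictionary<string, (int Count, double Percent)>(
            StringComparer.OrdinalIgnoreCase);

        foreach (var word in searched)
        {
            // \b on both sides keeps "this" from matching inside "Thisis"
            var pattern = @"\b" + Regex.Escape(word) + @"\b";
            int count = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
            double percent = total == 0 ? 0.0 : count * 100.0 / total;
            result[word] = (count, percent);
        }
        return result;
    }
}
```

With the sample text above, the text has 20 words, of which "is" appears 3 times (15%) and "this" once (5%).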