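I want to search a string for a specific word entered by the user, and then output the percentage of the text that the word accounts for. I'd just like to know what the best way to do this is, if you can help me.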
Answer 0: (score: 3)
I suggest using the String.Equals overload that takes a StringComparison, for better performance.
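var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' };
// Drop empty entries so that consecutive separators (such as ", ") are not counted as words
var words = sentence.Split (separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count (w =>
w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase));
// words is an array, so its size is Length (Count is a LINQ method, not a property)
var percentage = matches / (float) words.Length;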
Note that percentage is a float, so for example 0.5 means 50%.
You can format it for display using the ToString overload that takes a format string:
var formatted = percentage.ToString ("P0"); // 0.1234 => 12 %
You can also change the format specifier to show decimal places:
var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 %
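Keep in mind that this approach is inefficient for large strings, because it creates a string instance for every word found. For those you may want to use a StringReader and read the text word by word.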
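The word-by-word reading suggested above could look roughly like the sketch below. It is only one possible approach, not the answerer's code: the class name StreamingWordCounter and the method Fraction are hypothetical, and it assumes that every character that is not a letter or digit separates words, which is broader than the separator list used in the snippet above (so "don't" counts as two words).

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamingWordCounter
{
    // Returns the fraction (0.0 to 1.0) of words read from 'reader'
    // that equal 'searchedWord', ignoring case.
    public static double Fraction(TextReader reader, string searchedWord)
    {
        var current = new StringBuilder();   // reused buffer for the word being read
        int total = 0, matches = 0;
        int c;
        do
        {
            c = reader.Read();               // -1 signals the end of the input
            if (c != -1 && char.IsLetterOrDigit((char)c))
            {
                current.Append((char)c);
            }
            else if (current.Length > 0)
            {
                // A separator (or the end of input) closes the current word
                total++;
                if (EqualsIgnoreCase(current, searchedWord)) matches++;
                current.Clear();
            }
        } while (c != -1);

        return total == 0 ? 0.0 : (double)matches / total;
    }

    // Compares the buffer with the word character by character,
    // so no string is allocated for each word in the text.
    private static bool EqualsIgnoreCase(StringBuilder buffer, string word)
    {
        if (buffer.Length != word.Length) return false;
        for (int i = 0; i < word.Length; i++)
        {
            if (char.ToUpperInvariant(buffer[i]) != char.ToUpperInvariant(word[i]))
                return false;
        }
        return true;
    }
}
```

It can be called with any TextReader, for example StreamingWordCounter.Fraction(new StringReader(sentence), searchedWord), or with a StreamReader when the text lives in a large file. The result uses the same 0-to-1 convention as percentage above, so it can be formatted with ToString ("P2") as well.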
Answer 1: (score: 2)
The simplest way is to use LINQ:
char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
(from word in sentence.Split(separators) // get all the words
where word.ToLower() == searchedWord.ToLower() // find the words that match
select word).Count(); // count them
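This only counts how many times the word appears in the text. You can also count how many words the text contains:
var totalWords = sentence.Split(separators).Count();
Then the percentage is:
var result = count * 100.0 / totalWords; // multiply by 100.0 first to avoid integer division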
Answer 2: (score: 0)
My suggestion is a complete class.
// Assumes: using System.Text; (for StringBuilder)
class WordCount {
const string Symbols = ",;.:-()\t!¡¿?\"[]{}&<>+-*/=#'";
public static string normalize(string str)
{
var toret = new StringBuilder();
for(int i = 0; i < str.Length; ++i) {
if ( Symbols.IndexOf( str[ i ] ) > -1 ) {
toret.Append( ' ' );
} else {
toret.Append( char.ToLower( str[ i ] ) );
}
}
return toret.ToString();
}
private string word;
public string Word {
get { return this.word; }
set { this.word = value; }
}
private string str;
public string Str {
get { return this.str; }
}
private string[] words = null;
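public string[] Words {
get {
if ( this.words == null ) {
// Skip the empty entries produced by the padding and by consecutive spaces
this.words = this.Str.Split( new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries );
}
return this.words;
}
}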
public WordCount(string str, string w)
{
this.str = ' ' + normalize( str ) + ' ';
this.word = w;
}
public int Times()
{
return this.Times( this.Word );
}
public int Times(string word)
{
int times = 0;
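// The text was lowercased by normalize(), so lowercase the word as well
word = ' ' + word.ToLower() + ' ';
int wordLength = word.Length;
int pos = this.Str.IndexOf( word );
while( pos > -1 ) {
++times;
// Resume at the trailing space of this match, so that an immediately
// repeated word (which shares that space) is also found
pos = this.Str.IndexOf( word, pos + wordLength - 1 );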
}
return times;
}
public double Percentage()
{
return this.Percentage( this.Word );
}
public double Percentage(string word)
{
// Cast to double to avoid integer division, which would always yield 0 or 1
return ( (double) this.Times( word ) / this.Words.Length );
}
}
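Advantages: the result of splitting the string is cached, so there is no danger of doing it more than once. It is wrapped in a class, so it can easily be reused. There is no need for Linq. Hope this helps.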
Answer 3: (score: 0)
// Assumes: using System; using System.Linq; using System.Text.RegularExpressions;
// The words you want to search for
var words = new string[] { "this", "is" };
// Build a regular expression query
var wordRegexQuery = new System.Text.StringBuilder();
wordRegexQuery.Append("\\b(");
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++)
{
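// Escape each word so that characters such as '.' or '+' are matched literally
wordRegexQuery.Append(System.Text.RegularExpressions.Regex.Escape(words[wordIndex]));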
if (wordIndex < words.Length - 1)
{
wordRegexQuery.Append('|');
}
}
wordRegexQuery.Append(")\\b");
// Find matches and return them as a string[]
var regex = new System.Text.RegularExpressions.Regex(wordRegexQuery.ToString(), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();
// Display results
foreach (var word in words)
{
var wordCount = (int)matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f / matches.Length);
}
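Note that the percentage printed above is relative to the number of matches for all searched words, not to the total number of words in the text. If the share of the whole text is what is wanted, as in the question, one possible variation is sketched below. The class name RegexWordShare and the method Percentage are hypothetical, and the sketch assumes that \w+ is an acceptable definition of "a word" and that the searched word consists of word characters only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWordShare
{
    // Returns the percentage (0 to 100) of words in 'text'
    // that equal 'word', ignoring case.
    public static double Percentage(string text, string word)
    {
        // Count every run of word characters as one word
        int total = Regex.Matches(text, @"\w+").Count;
        if (total == 0) return 0.0;

        // \b anchors keep "is" from matching inside "this" or "Thisis"
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        int hits = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

        return hits * 100.0 / total;
    }
}
```

With the sample text from this answer, which contains 20 words, RegexWordShare.Percentage(someText, "is") finds 3 whole-word matches and returns 15, while "this" matches once and gives 5.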