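I managed to get the code working up to this point, but I am getting an error. From searching on Google, I more or less understand that it is a data type problem. However, if I change the data types of the two functions above, I still get the same error. What should I do?
*In this case I am trying to calculate the lexical density index.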
//For counting unique words
private void UniqueWordCount(string fbStatus)
{
int count = 0;
var countedWordList = new List<string>(100);
var reg = new Regex(@"\w+");
foreach (Match match in reg.Matches(fbStatus))
{
string word = match.Value.ToLower();
if (!countedWordList.Contains(word))
{
++count;
countedWordList.Add(word);
}
}
label_totaluniquewords.Text = count.ToString();
}
//For counting total words
private void SplitWords(string fbStatus)
{
int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
label_totalwordcount.Text = splitWords.ToString();
}
//For counting lexical density (trying to make this work...)
private void CalculateLexicalDensity(string fbStatus)
{
int ld = 0;
ld = (UniqueWordCount(fbStatus) / SplitWords(fbStatus)) * 100;
label_lexicaldensity.Text = ld.ToString();
}
Answer 0 (score: 4)
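SplitWords does not return the value it computes. If you intend to return the count, add return splitWords; at the end of the function and declare its return type as int (the same applies to UniqueWordCount, which would need to return count):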
private int SplitWords(string fbStatus)
{
int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
label_totalwordcount.Text = splitWords.ToString();
return splitWords;
}
Note, however, that your percentage calculation may be off because of integer division. You should either return decimal or cast to decimal before dividing.
Alternatively, you can change the order of operations so that the multiplication happens first:
ld = 100 * UniqueWordCount(fbStatus) / SplitWords(fbStatus);
This gives an integer result, truncated down to a whole percentage.
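To see why the order of operations matters, here is a minimal sketch with hypothetical counts. The class and method names are illustrative and are not part of the question's code.

```csharp
using System;

// Illustrative helper; these names are assumptions, not part of the original code.
public static class DivisionDemo
{
    // Integer division happens first, so any ratio below 1 becomes 0.
    public static int DivideThenScale(int unique, int total)
    {
        return (unique / total) * 100;
    }

    // Multiplying first keeps the whole-number part of the percentage.
    public static int ScaleThenDivide(int unique, int total)
    {
        return 100 * unique / total;
    }

    // Casting to decimal before dividing keeps the fractional part.
    public static decimal AsDecimal(int unique, int total)
    {
        return (decimal)unique / total * 100;
    }
}
```

With 20 unique words out of 28, DivideThenScale returns 0, ScaleThenDivide returns 71, and AsDecimal returns roughly 71.43.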
Answer 1 (score: 2)
Change the code to:
//For counting unique words
private int UniqueWordCount(string fbStatus)
{
int count = 0;
var countedWordList = new List<string>(100);
var reg = new Regex(@"\w+");
foreach (Match match in reg.Matches(fbStatus))
{
string word = match.Value.ToLower();
if (!countedWordList.Contains(word))
{
++count;
countedWordList.Add(word);
}
}
label_totaluniquewords.Text = count.ToString();
return count;
}
private int SplitWords(string fbStatus)
{
int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
label_totalwordcount.Text = splitWords.ToString();
return splitWords;
}
//For counting lexical density (trying to make this work...)
private void CalculateLexicalDensity(string fbStatus)
{
decimal ld = 0;
ld = ((decimal)UniqueWordCount(fbStatus) / (decimal)SplitWords(fbStatus)) * 100;
label_lexicaldensity.Text = ld.ToString();
}
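Answer 2 (score: 1)
Since both UniqueCount and SplitWords work on the list of words extracted from the original text, it makes sense to create a function for that.
This method takes a string containing the text you want to work with and returns a string array with the words it contains.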
private string[] GetWords(string text)
{
return text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
}
For counting unique words:
private int UniqueCount(string[] words)
{
var foundWords = new List<string>();
foreach (var word in words)
{
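// Use a separate name: redeclaring the loop variable "word" does not compile
string lowered = word.ToLower();
if (!foundWords.Contains(lowered))
{
foundWords.Add(lowered);
}
}
// List<T> exposes Count, not Length
return foundWords.Count;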
}
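List<string>.Contains scans the list on every call, so the loop above gets slow for long texts. As a hedged alternative, a HashSet<string> performs the same deduplication. The class name below is hypothetical, and ToLowerInvariant is my substitution for ToLower so that casing does not depend on the current culture.

```csharp
using System.Collections.Generic;

// Illustrative sketch; the class name is an assumption.
public static class UniqueWords
{
    public static int UniqueCount(string[] words)
    {
        // HashSet.Add silently ignores values that are already present.
        var found = new HashSet<string>();
        foreach (var word in words)
        {
            found.Add(word.ToLowerInvariant());
        }
        return found.Count;
    }
}
```

For example, UniqueWords.UniqueCount(new[] { "Este", "este", "texto" }) returns 2.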
For counting total words:
private int Count(string[] words)
{
return words.Length;
}
For lexical density:
private double CalculateLexicalDensity(string[] words)
{
return ((double)UniqueCount(words) / (double)Count(words));
}
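Note: none of these methods update the labels; I want to separate that concern into another method.
This method calls the other methods and updates the labels.
Note: I firmly believe fbStatus should be a parameter.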
private void UpdateLabels(string fbStatus)
{
var words = GetWords(fbStatus);
label_totalwordcount.Text = Count(words).ToString();
label_totaluniquewords.Text = UniqueCount(words).ToString();
label_lexicaldensity.Text = (CalculateLexicalDensity(words) * 100).ToString() + "%";
}
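Note that the method above runs UniqueCount and Count twice: once directly and once inside CalculateLexicalDensity. We have a few options to avoid that:
4.A. Mixing concerns again:
In this case I fuse the method CalculateLexicalDensity into UpdateLabels, so that I avoid executing UniqueCount and Count twice.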
private void UpdateLabels(string fbStatus)
{
var words = GetWords(fbStatus);
int wordCount = Count(words);
int uniqueWordCount = UniqueCount(words);
double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
label_totalwordcount.Text = wordCount.ToString();
label_totaluniquewords.Text = uniqueWordCount.ToString();
label_lexicaldensity.Text = (lexicalDensity * 100).ToString() + "%";
}
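4.B. Using a tuple as the return type:
In this case I fuse Count, UniqueCount and CalculateLexicalDensity into a single method, which again avoids executing UniqueCount and Count twice. Since this method needs to return three values, it returns a tuple (it could also be a custom type).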
private void UpdateLabels(string fbStatus)
{
var words = GetWords(fbStatus);
var info = Process(words);
label_totalwordcount.Text = info.Item1.ToString();
label_totaluniquewords.Text = info.Item2.ToString();
label_lexicaldensity.Text = (info.Item3 * 100).ToString() + "%";
}
private Tuple<int, int, double> Process(string[] words)
{
int wordCount = Count(words);
int uniqueWordCount = UniqueCount(words);
double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
return new Tuple<int, int, double>(wordCount, uniqueWordCount, lexicalDensity);
}
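Since this option keeps the concerns separated, I prefer it. However, in cases where you cannot (or do not want to) use Tuple, you can use a custom type. For this case I prefer a struct.
4.C. Using a struct as the return type: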
struct LexicalInfo
{
public int WordCount;
public int UniqueWordCount;
public double LexicalDensity; // double: the density is a ratio between 0 and 1
}
With this struct, the code is:
private void UpdateLabels(string fbStatus)
{
var words = GetWords(fbStatus);
var info = Process(words);
label_totalwordcount.Text = info.WordCount.ToString();
label_totaluniquewords.Text = info.UniqueWordCount.ToString();
label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}
private LexicalInfo Process(string[] words)
{
int wordCount = Count(words);
int uniqueWordCount = UniqueCount(words);
double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
return new LexicalInfo()
{
WordCount = wordCount,
UniqueWordCount = uniqueWordCount,
LexicalDensity = lexicalDensity
};
}
If we are going to use a struct...
4.D. Using a struct to do the computations:
Note: in this case it could just as well be a class.
struct LexicalInfo
{
private int wordCount;
private int uniqueWordCount;
public LexicalInfo(string text)
{
var words = GetWords(text);
wordCount = Count(words);
uniqueWordCount = UniqueCount(words);
}
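// The helpers are static so the constructor can call them before all fields are assigned
private static string[] GetWords(string text)
{
return text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
}
private static int UniqueCount(string[] words)
{
var foundWords = new List<string>();
foreach (var word in words)
{
// Use a separate name: redeclaring the loop variable "word" does not compile
string lowered = word.ToLower();
if (!foundWords.Contains(lowered))
{
foundWords.Add(lowered);
}
}
// List<T> exposes Count, not Length
return foundWords.Count;
}
private static int Count(string[] words)
{
return words.Length;
}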
public int WordCount
{
get
{
return wordCount;
}
}
public int UniqueWordCount
{
get
{
return uniqueWordCount;
}
}
public double LexicalDensity
{
get
{
return ((double)uniqueWordCount / (double)wordCount);
}
}
}
//----
private void UpdateLabels(string fbStatus)
{
var info = new LexicalInfo(fbStatus);
label_totalwordcount.Text = info.WordCount.ToString();
label_totaluniquewords.Text = info.UniqueWordCount.ToString();
label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}
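I will take the final code (the version that uses a struct to do the computations) and keep working on it.
We have two methods that are only one line long (GetWords and Count). I will get rid of them and replace the calls with the method bodies: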
struct LexicalInfo
{
private int wordCount;
private int uniqueWordCount;
public LexicalInfo(string text)
{
var words = text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
wordCount = words.Length;
uniqueWordCount = UniqueCount(words);
}
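// static so the constructor can call it before all fields are assigned
private static int UniqueCount(string[] words)
{
var foundWords = new List<string>();
foreach (var word in words)
{
// Use a separate name: redeclaring the loop variable "word" does not compile
string lowered = word.ToLower();
if (!foundWords.Contains(lowered))
{
foundWords.Add(lowered);
}
}
// List<T> exposes Count, not Length
return foundWords.Count;
}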
public int WordCount
{
get
{
return wordCount;
}
}
public int UniqueWordCount
{
get
{
return uniqueWordCount;
}
}
public double LexicalDensity
{
get
{
return ((double)uniqueWordCount / (double)wordCount);
}
}
}
//----
private void UpdateLabels(string fbStatus)
{
var info = new LexicalInfo(fbStatus);
label_totalwordcount.Text = info.WordCount.ToString();
label_totaluniquewords.Text = info.UniqueWordCount.ToString();
label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}
If we can use Linq, we can replace UniqueCount with a single line:
struct LexicalInfo
{
private int wordCount;
private int uniqueWordCount;
public LexicalInfo(string text)
{
var words = text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
wordCount = words.Length;
uniqueWordCount = words.Distinct().Count();
}
public int WordCount
{
get
{
return wordCount;
}
}
public int UniqueWordCount
{
get
{
return uniqueWordCount;
}
}
public double LexicalDensity
{
get
{
return ((double)uniqueWordCount / (double)wordCount);
}
}
}
//----
private void UpdateLabels(string fbStatus)
{
var info = new LexicalInfo(fbStatus);
label_totalwordcount.Text = info.WordCount.ToString();
label_totaluniquewords.Text = info.UniqueWordCount.ToString();
label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}
I used the following text for testing:
ESTE ES UN TEXTO QUE HE ESCRITO EN ESPAÑOL. ESTE TEXTO FUE ESCRITO PARA DEMOSTRACIÓN. ESTE TEXTO REPITE ALGUNAS DE SUS PALABRAS Y ALGUNAS OTRAS NO.
The output was:
WordCount = 28
UniqueWordCount = 21
LexicalDensity = 75%
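However, reviewing the code shows that we are counting punctuation as part of the words (i.e. the code treats ESPAÑOL and ESPAÑOL. as two different words because of the punctuation).
You can use a regular expression as a quick fix. Replace the constructor of LexicalInfo with this: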
public LexicalInfo(string text)
{
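// Materialize the matches once so the query is not re-run for every count
var words = (from match in (new Regex(@"\w+")).Matches(text).Cast<Match>() select match.Value).ToList();
wordCount = words.Count;
uniqueWordCount = words.Distinct().Count();
// Debug output: join the words, since printing the array directly only shows its type name
Console.WriteLine(string.Join(" ", words.Distinct().ToArray()));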
}
After the change, the output is:
WordCount = 28
UniqueWordCount = 20
LexicalDensity = 71.4285714285714%
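As a small sketch of the difference between the two ways of extracting words, the helper below compares splitting on spaces with matching \w+. The class name, method names, and sample sentence are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative sketch; the names are assumptions, not part of the answer's code.
public static class Tokenizer
{
    // Splitting on spaces keeps punctuation attached to the word ("MUNDO.").
    public static string[] BySpaces(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // \w+ matches runs of word characters only, so trailing punctuation is dropped.
    public static string[] ByRegex(string text)
    {
        return Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```

For the input "HOLA MUNDO. HOLA MUNDO", BySpaces yields 3 distinct words (MUNDO. and MUNDO differ), while ByRegex yields 2.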
You may want to format LexicalDensity, for example by changing the following line:
label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
to this:
label_lexicaldensity.Text = string.Format("{0:P2}", info.LexicalDensity);
which will produce this:
WordCount = 28
UniqueWordCount = 20
LexicalDensity = 71.43 %
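Note: string.Format is affected by the culture it runs under. If you do not want the output to depend on the current culture, you can specify one, for instance InvariantCulture, as the first argument:
label_lexicaldensity.Text = string.Format(CultureInfo.InvariantCulture, "{0:P2}", info.LexicalDensity);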
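As a small sketch of passing the culture explicitly: string.Format has an overload that takes an IFormatProvider as its first argument, whereas anything placed after the format string is treated as a value to format. The class and method names below are illustrative.

```csharp
using System.Globalization;

// Illustrative sketch; the names are assumptions, not part of the answer's code.
public static class DensityFormat
{
    public static string AsPercent(double density)
    {
        // The format provider goes first; "P2" multiplies by 100 and keeps two decimals.
        return string.Format(CultureInfo.InvariantCulture, "{0:P2}", density);
    }
}
```

With the invariant culture, DensityFormat.AsPercent(20.0 / 28.0) gives "71.43 %"; other cultures may place or space the percent sign differently.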
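Using another test text, I found that I had lost the ability to ignore capitalization. The text was:
Este es otro texto escrito en español, el objetivo de este texto es probar las mallúsculas al repetir texto.
In this case, the code treats Este and este as two different words. This is another simple fix with Linq. Change this line: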
uniqueWordCount = words.Distinct().Count();
to this:
uniqueWordCount = (from word in words select word.ToLower()).Distinct().Count();
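A possible variation, offered as a sketch rather than as the answer's method: Distinct accepts an equality comparer, so the lowercasing step can be replaced by StringComparer.OrdinalIgnoreCase. The class name is hypothetical, and whether ordinal comparison is appropriate for your text is an assumption worth checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch; the class name is an assumption.
public static class CaseInsensitiveCount
{
    public static int UniqueCount(IEnumerable<string> words)
    {
        // The comparer makes "Este" and "este" count as the same word.
        return words.Distinct(StringComparer.OrdinalIgnoreCase).Count();
    }
}
```

For example, CaseInsensitiveCount.UniqueCount(new[] { "Este", "este", "texto", "TEXTO", "es" }) returns 3.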