I'm working with the CountVectorizer class and ran into the following issue:
```
In [45]: ngram_vec = CountVectorizer(analyzer='char_wb', ngram_range=(1,3))
In [46]: counts = ngram_vec.fit_transform(['words', 'wprds'])
In [47]: ngram_vec.vocabulary_
Out[47]:
{' ': 0,
 'w': 18,
 'o': 7,
 'r': 13,
 'd': 4,
 's': 16,
 ' w': 1,
 'wo': 19,
 'or': 8,
 'rd': 14,
 'ds': 5,
 's ': 17,
 ' wo': 2,
 'wor': 20,
 'ord': 9,
 'rds': 15,
 'ds ': 6,
 'p': 10,
 'wp': 21,
 'pr': 11,
 ' wp': 3,
 'wpr': 22,
 'prd': 12}
```

Where do the spaces come from? See indices 0, 1, 17, 2, and so on.

If I change it to `ngram_range=(3,3)`, the result becomes the following. What does it mean?
```
{' wo': 0,
 'wor': 6,
 'ord': 3,
 'rds': 5,
 'ds ': 2,
 ' wp': 1,
 'wpr': 7,
 'prd': 4}
```
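For reference, both outputs can be reproduced with a small pure-Python sketch. It is written from scikit-learn's documented description of `analyzer='char_wb'` (n-grams are built only from text inside word boundaries, and n-grams at the edges of words are padded with space). It also assumes that the indices in `vocabulary_` are assigned in sorted order of the terms, which is consistent with both outputs above: a space sorts before every letter, so `' '` ends up at index 0. The helper names `char_wb_ngrams` and `build_vocabulary` are hypothetical and are not part of scikit-learn's API, and words shorter than `n` are not special-cased here the way the library may handle them.

```python
def char_wb_ngrams(text, ngram_range=(1, 1)):
    """Character n-grams taken inside word boundaries (illustrative sketch)."""
    min_n, max_n = ngram_range
    ngrams = []
    for word in text.split():  # whitespace-separated words
        # Pad each word with one space on each side; this padding is what
        # produces n-grams such as ' ', ' w', 's ' and ' wo'.
        padded = " " + word + " "
        for n in range(min_n, max_n + 1):
            # Slide a window of width n across the padded word.
            # A padded word shorter than n yields nothing in this sketch.
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


def build_vocabulary(docs, ngram_range):
    """Assign indices to distinct n-grams in sorted order (assumed behavior)."""
    terms = set()
    for doc in docs:
        terms.update(char_wb_ngrams(doc, ngram_range))
    # A space sorts before any letter, so space-initial n-grams come first.
    return {term: index for index, term in enumerate(sorted(terms))}


docs = ["words", "wprds"]
print(char_wb_ngrams("words", (3, 3)))  # [' wo', 'wor', 'ord', 'rds', 'ds ']
print(build_vocabulary(docs, (3, 3)))
# {' wo': 0, ' wp': 1, 'ds ': 2, 'ord': 3, 'prd': 4, 'rds': 5, 'wor': 6, 'wpr': 7}
```

With `ngram_range=(1, 3)` the same sketch yields the 23-entry vocabulary from the first session, including `' ': 0`, `' w': 1`, `' wo': 2` and `'s ': 17`. With `(3, 3)` only the trigrams remain, so each word contributes five trigrams, two of which (`' wo'`/`' wp'` and `'ds '`) include a padding space.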