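I have an array with several thousand elements, many of which are duplicates of one another. I need a way to count the occurrences of 'foo' in the array and, if that count is less than 'n', remove every 'foo' element from the array.
An example of what I need: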
string[] words = new string[] { /* ... */ };
int n = 8;
int k = /* number of occurrences of "foo" in words */;
if (k < n) {
    // Remove all occurrences of "foo" from the array
}
If the array 'words' initially contains
{"foo","foo","foo","foo","foo","foo","foo","bar","bar","bar","bar","bar","bar","bar","bar","bar"}
then the result would be what is left in the array below, because only 7 occurrences of "foo" were found but 9 occurrences of "bar":
{"bar","bar","bar","bar","bar","bar","bar","bar","bar"}
Any help is appreciated.
Answer 0 (score: 3)
You can use LINQ's GroupBy and Count to achieve this:
string[] words = new string[] { "foo", "foo", "foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar", "bar", "bar", "bar", "bar" };
int n = 8;
var groups = words.GroupBy(x => x).Where(g => g.Count() >= n);
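What this does is group the elements by value (a "foo" group and a "bar" group), count each group, and keep only the groups whose element count is greater than or equal to the threshold (n = 8 in your case).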
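To see what the grouping step produces before any filtering, here is a minimal sketch that lists each distinct value with its count. The class name GroupCountDemo and the helper DescribeGroups are made up for illustration and are not part of the question or of LINQ.

```csharp
using System;
using System.Linq;

public static class GroupCountDemo
{
    // Hypothetical helper: returns "key=count" for every distinct value,
    // in the order in which each value first appears in the input.
    public static string[] DescribeGroups(string[] words)
    {
        return words
            .GroupBy(x => x)                      // one group per distinct value
            .Select(g => g.Key + "=" + g.Count()) // key plus size of its group
            .ToArray();
    }

    public static void Main()
    {
        string[] words = { "foo", "foo", "bar", "bar", "bar" };
        foreach (string line in DescribeGroups(words))
        {
            Console.WriteLine(line);
        }
        // prints:
        // foo=2
        // bar=3
    }
}
```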
To get an array back, you can use SelectMany:
string[] filteredWords = words.GroupBy(x => x).Where(g => g.Count() >= n)
.SelectMany(g => g).ToArray();
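One thing worth knowing about this approach: the flattened result comes back group by group, so equal elements end up next to each other rather than in their original positions. The sketch below wraps the same query in a helper (FilterByCount and GroupOrderDemo are names I made up for the example) and runs it on a small interleaved input.

```csharp
using System;
using System.Linq;

public static class GroupOrderDemo
{
    // Hypothetical helper wrapping the GroupBy / Where / SelectMany query.
    public static string[] FilterByCount(string[] words, int n)
    {
        return words
            .GroupBy(x => x)             // groups appear in order of first occurrence
            .Where(g => g.Count() >= n)  // keep values that occur at least n times
            .SelectMany(g => g)          // flatten the surviving groups
            .ToArray();
    }

    public static void Main()
    {
        string[] words = { "a", "b", "a", "b", "c" };
        Console.WriteLine(string.Join(",", FilterByCount(words, 2)));
        // prints: a,a,b,b
    }
}
```

If the original interleaving (a,b,a,b) matters, filter the original sequence against a set of surviving keys instead of flattening the groups.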
Answer 1 (score: 1)
This approach preserves the original order of the elements.
var words = new[]
{
    "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "bar", "bar", "bar",
    "bar", "bar", "bar", "bar", "bar",
    "bar"
};
var keepers = new HashSet<string>(
    words.ToLookup(x => x).Where(x => x.Skip(7).Any()).Select(x => x.Key));
words = words.Where(w => keepers.Contains(w)).ToArray();
If the order doesn't matter, then do this:
words =
    words
        .ToLookup(x => x)
        .Where(x => x.Skip(7).Any())
        .SelectMany(x => x)
        .ToArray();
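The 7 in Skip(7) is just n - 1 for n = 8: a group still has something left after skipping seven elements only if it holds at least eight. As a rough sketch, the order-preserving version can take the threshold as a parameter. The names ThresholdDemo and KeepFrequent are hypothetical, and the code assumes n is at least 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThresholdDemo
{
    // Hypothetical helper: keeps every element whose value occurs at least
    // n times, leaving the surviving elements in their original order.
    public static string[] KeepFrequent(string[] words, int n)
    {
        // Skip(n - 1).Any() is true only when the group has at least n elements.
        var keepers = new HashSet<string>(
            words.ToLookup(x => x)
                 .Where(g => g.Skip(n - 1).Any())
                 .Select(g => g.Key));

        return words.Where(w => keepers.Contains(w)).ToArray();
    }

    public static void Main()
    {
        string[] words = { "x", "y", "x", "z", "y", "x" };
        Console.WriteLine(string.Join(",", KeepFrequent(words, 2)));
        // prints: x,y,x,y,x
    }
}
```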
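Based on your comment, "Is it possible to extend this further and check for occurrences of parts of the string?", I take it you mean that you want to count how often each individual part of a "word" occurs and keep the whole "word" if any of its parts meets the frequency requirement. That may not be very clear, so here is my code: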
var words = new[]
{
    "foo", "foo", "foo extrabits", "foo", "foo",
    "foo", "foo", "bar", "bar", "bar",
    "bar", "bar", "bar extrabits", "bar", "bar",
    "bar"
};
var keepers =
    new HashSet<string>(
        words
            .SelectMany(x => x.Split(' '))
            .ToLookup(x => x)
            .Where(x => x.Skip(7).Any())
            .Select(x => x.Key));
words =
    words
        .Where(x => x.Split(' ').Any(y => keepers.Contains(y)))
        .ToArray();
This produces:
bar
bar
bar
bar
bar
bar extrabits
bar
bar
bar