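My word database contains more than 300,000 words.
I want to match words of a known length (for example, 7) that contain only certain characters, where some characters may be repeated a limited number of times and others may not be repeated at all.
For example, I have the characters a, p, p, l, e, r, t, h, o
and I want to find words of length 5.
So it should match
apple
earth
but not
hello
because hello uses l twice, while l is given only once.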
^([a,p,p,l,e,r,t,h,o]{1}) # capture first char
(!/1 [a,p,p,l,e,r,t,h,o]{1}) # capture second char but without firstly captured symbol
(!/1 !/2 [a,p,p,l,e,r,t,h,o]{1}) # capture third char but without first and second captured symbol
and so on ...
Answer 0 (score: 2)
Try the following regex:
\b(?!\w*([alertho])\w*\1)(?!\w*([p])(\w*\2){2})[aplertho]{5}\b
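To see how the pattern above behaves, it can be run against the three sample words from the question. This is only a minimal sketch: it assumes .NET's System.Text.RegularExpressions and lowercase input, and the names RegexDemo and IsMatch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Pattern copied verbatim from the answer.
    public const string Pattern =
        @"\b(?!\w*([alertho])\w*\1)(?!\w*([p])(\w*\2){2})[aplertho]{5}\b";

    // Hypothetical helper: true if the input contains a word accepted by the pattern.
    public static bool IsMatch(string word)
    {
        return Regex.IsMatch(word, Pattern);
    }

    public static void Main()
    {
        // Sample words taken from the question.
        foreach (var word in new[] { "apple", "earth", "hello" })
            Console.WriteLine(word + ": " + IsMatch(word));
    }
}
```

With these inputs, apple and earth should be accepted and hello rejected, because the first lookahead finds the second l. Note that the pattern is hard-wired to this letter set and to length 5, so a different set of letters needs a different pattern.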
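Details:
\b - word boundary (opening).
(?!\w*([alertho])\w*\1) - negative lookahead that rejects the word if any of the characters a, l, e, r, t, h, o occurs more than once: group 1 captures one of these characters and \1 looks for the same character again later in the word.
(?!\w*([p])(\w*\2){2}) - negative lookahead that rejects the word if p occurs more than 2 times. It works like the previous one, but this time the captured character must be followed by two further occurrences.
[aplertho]{5} - what we are looking for: any of the allowed characters, 5 occurrences.
\b - word boundary (closing).
Answer 1 (score: 0)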
I know this is not a regex solution to the problem, but sometimes a regex is not the solution.
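using System.Collections.Generic;
using System.Linq;

// Checks whether a word can be spelled from a fixed multiset of letters.
// Note: only letter counts are checked; word length must be filtered separately.
public class WordChecker
{
    // Builds one counter per distinct letter; its limit is how often the letter was given.
    public WordChecker(params char[] letters)
    {
        Counters = letters.GroupBy(c => c).ToDictionary(g => g.Key, g => new Counter(g.Count()));
    }

    public WordChecker(string letters) : this(letters.ToCharArray())
    {
    }

    // Returns true if every character of the word is allowed and no letter exceeds its limit.
    public bool CheckWord(string word)
    {
        Initialize();
        foreach (var c in word)
        {
            Counter counter;
            if (!Counters.TryGetValue(c, out counter)) return false; // letter not allowed
            if (!counter.Add()) return false;                        // letter used too often
        }
        return true;
    }

    // Resets all counters before a new word is checked.
    private void Initialize()
    {
        foreach (var counter in Counters)
            counter.Value.Initialize();
    }

    private Dictionary<char, Counter> Counters;

    // Tracks how often one letter has been used against its maximum.
    private class Counter
    {
        public Counter(int maxCount)
        {
            MaxCount = maxCount;
            Count = 0;
        }

        public void Initialize()
        {
            Count = 0;
        }

        // Increments the usage count; returns false once the maximum is exceeded.
        public bool Add()
        {
            Count++;
            return Count <= MaxCount;
        }

        public int MaxCount { get; private set; }
        public int Count { get; private set; }
    }
}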
It is used like this:
WordChecker checker = new WordChecker("applertho");
List<string> words = new List<string>() { "apple", "giraf", "earth", "hello" };
foreach (var word in words)
    if (checker.CheckWord(word))
    {
        // The word is valid!
    }
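The usage above checks letter counts but not the required word length from the question. A possible way to combine both checks is sketched below. It is a self-contained LINQ variant rather than the answer's own class, and the names WordFilter and Filter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Returns the words of exactly `length` characters that can be spelled
    // from `letters`, using each letter at most as often as it is listed.
    public static List<string> Filter(IEnumerable<string> words, string letters, int length)
    {
        // Allowed count per letter, e.g. 'p' -> 2 for "applertho".
        Dictionary<char, int> allowed = letters
            .GroupBy(c => c)
            .ToDictionary(g => g.Key, g => g.Count());

        return words
            .Where(w => w.Length == length)
            .Where(w => w.GroupBy(c => c).All(g =>
            {
                int max;
                // Reject letters that are not allowed or are used too often.
                return allowed.TryGetValue(g.Key, out max) && g.Count() <= max;
            }))
            .ToList();
    }

    public static void Main()
    {
        // Sample words from the answer, plus "tree" to show the length filter.
        var words = new[] { "apple", "giraf", "earth", "hello", "tree" };
        foreach (var w in Filter(words, "applertho", 5))
            Console.WriteLine(w);
    }
}
```

Filtering on length first keeps the per-letter grouping off most of a 300,000-word list. Unlike the regex, the letter set and the length are plain parameters here, so nothing has to be rewritten when they change.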