Question

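I have two strings: the first has the value "catdog" and the second has the value "got".

I am trying to find a regular expression that tells me whether the letters of "got" are in "catdog". I particularly want to avoid cases with duplicate letters. For example, I know that "got" is a match, but "gott" is not, because there are not two "t"s in "catdog".

EDIT:

Based on Adam's answer below, this is the C# code I got working in my solution. Thanks to everyone who responded.

Note: I had to convert the char to an int and subtract 97 to get the appropriate index into the array. In my case the letters are always lowercase.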

    private bool CompareParts(string a, string b)
    {

        int[] count1 = new int[26];
        int[] count2 = new int[26];

        foreach (var item in a.ToCharArray())
            count1[(int)item - 97]++;

        foreach (var item in b.ToCharArray())
            count2[(int)item - 97]++;

        for (int i = 0; i < count1.Length; i++)
            if(count2[i] > count1[i])
                return false;

        return true;
    }

Answer 1

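You are using the wrong tool for the job. This is not something regular expressions can handle easily. Fortunately, it is relatively easy to do without regular expressions. You just count the number of occurrences of each letter in both strings and compare the counts: if, for every letter of the alphabet, the count in the first string is at least as large as the count in the second string, then your criterion is satisfied. Since you did not specify a language, here is an answer in pseudocode that should be easy to translate into your language: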

bool containsParts(string1, string2)
{
    count1 = array of 26 0's
    count2 = array of 26 0's

    // Note: be sure to check for and ignore non-alphabetic characters,
    // and do case conversion if you want to do it case-insensitively
    for each character c in string1:
        count1[c]++
    for each character c in string2:
        count2[c]++

    for each character c in 'a'...'z':
        if count1[c] < count2[c]:
            return false

    return true
}
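
As a rough illustration, the pseudocode above might be translated to Python as follows. The function name `contains_parts` and the choice to lowercase the input and skip anything outside a-z are assumptions made for this sketch, following the note in the pseudocode comment.

```python
def contains_parts(string1: str, string2: str) -> bool:
    """Return True if string1 has at least as many of each letter a-z as string2."""
    count1 = [0] * 26
    count2 = [0] * 26

    # Fill both count arrays the same way.
    for counts, text in ((count1, string1), (count2, string2)):
        for ch in text.lower():          # assumption: compare case-insensitively
            if "a" <= ch <= "z":         # assumption: ignore non-alphabetic characters
                counts[ord(ch) - ord("a")] += 1

    # Every letter must be available at least as often as it is required.
    return all(c1 >= c2 for c1, c2 in zip(count1, count2))
```

With the strings from the question, `contains_parts("catdog", "got")` is true and `contains_parts("catdog", "gott")` is false.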

Answer 2

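It has already been suggested that regular expressions may not be the best way to do this, and I agree. However, the answer you accepted is a little verbose considering what you are trying to achieve, which is to test whether one set of letters is a subset of another set of letters.

Consider the following code, which achieves this in a single line: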

MatchString.ToList().ForEach(Item => Input.Remove(Item));

It can be used as follows:

public bool IsSubSetOf(string InputString, string MatchString) 
{
  var InputChars = InputString.ToList(); 
  MatchString.ToList().ForEach(Item => InputChars.Remove(Item)); 
  return InputChars.Count == 0;
}

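You can then call this method to check whether one string is a subset of the other.

What is interesting here is that "got" returns a list with no items, because each item in the match string appears only once, whereas "gott" returns a list with a single item, because only one call removes a "t" from the list. You are therefore left with one item in the list. In other words, "gott" is not a subset of "catdog", but "got" is.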
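
For readers who do not use C#, the same remove-one-at-a-time idea could be sketched in Python roughly like this. The name `is_subset_of` mirrors the C# method; the explicit membership check is only needed because Python's `list.remove` raises an error when the item is absent, unlike `List<T>.Remove` in C#.

```python
def is_subset_of(input_string: str, match_string: str) -> bool:
    """Return True if every character of input_string can be paired with
    a distinct character of match_string."""
    remaining = list(input_string)
    for ch in match_string:
        if ch in remaining:
            remaining.remove(ch)   # removes only the first occurrence
    # Anything left over could not be matched.
    return not remaining
```

Calling `is_subset_of("got", "catdog")` leaves nothing behind, while `is_subset_of("gott", "catdog")` leaves one unmatched "t".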

You could go one step further and put the method into a static class:

using System;
using System.Linq;
using System.Runtime.CompilerServices;

static class extensions
{
    public static bool IsSubSetOf(this string InputString, string MatchString)
    {
        var InputChars = InputString.ToList();
        MatchString.ToList().ForEach(Item => InputChars.Remove(Item));
        return InputChars.Count == 0;
    }
}

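This makes your method an extension of the string object, which actually makes things easier in the long run, because you can now make your calls like this: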

Console.WriteLine("gott".IsSubSetOf("catdog"));

Answer 3

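You want a string that matches exactly those letters, just once each. It depends on what you are writing the regex in, but it is going to be something like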

^[^got]*(g|o|t)[^got]$

If your regex flavor has an "exact match" operator, that will help.

Answer 4

I don't think there is a sane way to do this with regular expressions. The insane way would be to write out all the permutations:

/^(c?a?t?d?o?g?|c?a?t?d?g?o?| ... )$/

Now, with a little trickery you could do this with a few regexes (example in Perl, untested):

$foo = 'got';
$foo =~ s/c//;
$foo =~ s/a//;
...
$foo =~ s/d//;
# if $foo is now empty, it passes the test.

Of course, a sane person would use a loop:

$foo = 'got';
foreach $l (split(//, 'catdog')) {
    $foo =~ s/$l//;
}
# if $foo is now empty, it passes the test.

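There are certainly nicer ways to solve this problem, but they do not use regular expressions. (And there is no doubt a way to do it using, for example, Perl's extended regex features such as embedded code.)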
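
The loop above could be sketched in Python along the following lines. The function name `letters_available` is made up for this illustration; `re.sub` with `count=1` plays the role of Perl's non-global `s///`, and `re.escape` is added so that characters with special meaning in a regex are treated literally.

```python
import re


def letters_available(needle: str, haystack: str) -> bool:
    """Strike one occurrence from needle for each letter of haystack;
    needle passes the test if nothing is left at the end."""
    remaining = needle
    for letter in haystack:
        # count=1 removes at most one occurrence, like Perl's s/$l//
        remaining = re.sub(re.escape(letter), "", remaining, count=1)
    return remaining == ""
```

Here `letters_available("got", "catdog")` ends with an empty string, whereas `letters_available("gott", "catdog")` ends with a leftover "t".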

Answer 5

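Charlie Martin almost has it right, but you have to do a complete pass for each letter. You can do that with a single regex by using lookaheads for all but the last pass: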

/^
 (?=[^g]*g[^g]*$)
 (?=[^o]*o[^o]*$)
 [^t]*t[^t]*
$/x

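This makes a nice exercise for honing your regex skills, but if I had to do this in real life, I would not do it this way. A non-regex approach will require a lot more typing, but any minimally competent programmer will be able to understand and maintain it. If you use a regex, that hypothetical maintainer will also have to be more than minimally competent at regexes.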
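
As a sketch of the lookahead technique in Python, the pattern below checks that a string contains exactly one "g", one "o" and one "t". Note that in this sketch each character class negates only its own letter, so that the three passes do not exclude one another; the names `ONE_EACH_OF_GOT` and `has_one_each_of_got` are invented for the example.

```python
import re

ONE_EACH_OF_GOT = re.compile(
    r"""
    ^
    (?=[^g]*g[^g]*$)   # lookahead pass: exactly one 'g' in the whole string
    (?=[^o]*o[^o]*$)   # lookahead pass: exactly one 'o'
    [^t]*t[^t]*        # final pass consumes the string: exactly one 't'
    $
    """,
    re.VERBOSE,
)


def has_one_each_of_got(text: str) -> bool:
    """Return True if text contains each of g, o and t exactly once."""
    return ONE_EACH_OF_GOT.match(text) is not None
```

The lookaheads each scan the whole string without consuming it, which is what allows several independent "passes" inside one pattern.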

Answer 6

@Adam Rosenfield's solution in Python:

from collections import defaultdict

def count(iterable):
    c = defaultdict(int)
    for hashable in iterable:
        c[hashable] += 1
    return c

def can_spell(word, astring):
    """Whether `word` can be spelled using `astring`'s characters."""

    count_string = count(astring)
    count_word   = count(word)

    return all(count_string[c] >= count_word[c] for c in word)
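
A shorter variant, offered here only as a possible alternative, relies on `collections.Counter` from the standard library. Subtracting one `Counter` from another keeps only the positive counts, so an empty result means no letter of the word is missing. The name `can_spell_counter` is hypothetical.

```python
from collections import Counter


def can_spell_counter(word: str, astring: str) -> bool:
    """Whether `word` can be spelled using `astring`'s characters."""
    # Counter subtraction drops zero and negative counts, so whatever
    # remains is the set of letters that astring cannot supply.
    missing = Counter(word) - Counter(astring)
    return not missing
```

For example, `can_spell_counter("got", "catdog")` is true, while `can_spell_counter("gott", "catdog")` is false because one "t" is left in `missing`.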

Answer 7

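The best way to do it with regexes is, IMO:

A. Sort the characters in the large string (the search space). Thus: turn "catdog" into "acdgot".

B. Then:

   1. Do the same with the string whose characters you are searching for: "gott" becomes, er, "gott"...
   2. Insert ".*" between each of these characters.
   3. Use the latter as a regular expression to search in the former.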

For example, some Perl code (if you don't mind):

$main = "catdog"; $search = "gott";
# break into individual characters, sort, and reconcatenate
$main = join '', sort split //, $main;
$regexp = join ".*", sort split //, $search;
print "Debug info: search in '$main' for /$regexp/ \n";
if($main =~ /$regexp/) {
    print "Found a match!\n";
} else {
    print "Sorry, no match...\n";
}

This prints:

Debug info: search in 'acdgot' for /g.*o.*t.*t/
Sorry, no match...

Drop one "t" and you get a match.
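
The same sort-then-search idea might look like this in Python. The function name `sorted_regex_match` is an assumption for this sketch, and `re.escape` is added so that search characters such as "." are matched literally, which the Perl version does not do.

```python
import re


def sorted_regex_match(main: str, search: str) -> bool:
    """Sort both strings, join the search letters with '.*',
    and look for that pattern in the sorted main string."""
    sorted_main = "".join(sorted(main))                        # "catdog" -> "acdgot"
    pattern = ".*".join(re.escape(ch) for ch in sorted(search))  # "gott" -> "g.*o.*t.*t"
    return re.search(pattern, sorted_main) is not None
```

With the question's strings, `sorted_regex_match("catdog", "gott")` finds no match, and `sorted_regex_match("catdog", "got")` does.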

Regular expression for finding parts of a string within another
