Question

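I wrote code to solve the problem below, but it fails the last two test cases. The logic I used sounds reasonable, and even after a coworker reviewed it, neither of us can figure out why it works for the first 8 test cases but not for the last two (which were randomly generated).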

Problem:

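Given a string, return the position of the input string in the alphabetically sorted list of all possible permutations of the characters in that string. For example, the permutations of ABAB are [AABB, ABAB, ABBA, BAAB, BABA, BBAA], where the list position of ABAB is 2.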

The logic I used to solve the problem:

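For larger inputs it is impossible (or at least inefficient) to generate the list of permutations, so the point is to find the position without generating the alphabetical list. This can be done by looking at the frequencies of the characters. In the example above, the first character of ABAB is A, so before = 0 and after = .5, and between = 6. You therefore decrease maxx by .5 * 6, which is 3, giving minn = 1 and maxx = 3 and leaving only [AABB, ABAB, ABBA], the perms with A as the first character! The characters left are then BAB, with minn = 1, maxx = 3 and between = 3. For B, before is .33 and after is 0, so you increase minn by 3 * .33, giving minn = 2 and maxx = 3, which corresponds to [ABAB, ABBA], the perms with AB as the first two characters! Keep doing this for every character and it will find the input in the list.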
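For short inputs the list can still be generated directly, which gives a reference value to compare any faster method against. The following is a minimal sketch of that check; the function name `brute_force_position` is made up for illustration and is not part of the original code, and the approach is only practical for words of roughly ten characters or fewer.

```python
from itertools import permutations

def brute_force_position(word):
    """1-based rank of `word` among its distinct permutations, sorted."""
    # set() removes the duplicates produced by repeated letters.
    ordered = sorted(set(permutations(word)))
    return ordered.index(tuple(word)) + 1

print(brute_force_position("ABAB"))  # 2
print(brute_force_position("BAAA"))  # 4
```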

My code:

## Imports
import operator
from collections import Counter
from math import factorial
from functools import reduce
## Main function, returns list position
def listPosition(word):
    #turns string into list of numbers, A being 1, B being 2, and so on
    val = [ord(char) - 96 for char in word.lower()]
    #the result has to be between 1 and the number of permutations
    minn = 1
    maxx = npermutations(word)
    #so we just increase the min and decrease the max based on the sum of freq
    #of the characters less than and greater than each character
    for indx in range(len(word)):
            between = (maxx+1-minn)
            before,after = sumfreq(val[indx:],val[indx])
            minn = minn + int(round((between * before),0))
            maxx = maxx - int(between * after)
    return maxx #or minn, doesn't matter. they're equal at this point

## returns the number of permutations for the string (this works)
def npermutations(word):
    num = factorial(len(word))
    mults = Counter(word).values()
    den = reduce(operator.mul, (factorial(v) for v in mults), 1)
    return int(num / den)
## returns frequency as a percent for the character in the list of chars
def frequency(val,value):
    f = [val.count(i)/len(val) for i in val]
    indx = val.index(value)
    return f[indx]
#returns sum of frequencies for all chars < (before) and > (after) the said char
def sumfreq(val,value):
    before = [frequency(val,i) for i in [i for i in set(val) if i < value]]
    after = [frequency(val,i) for i in [i for i in set(val) if i > value]]
    return sum(before),sum(after)

tests= ['A','ABAB','AAAB','BAAA','QUESTION','BOOKKEEPER','ABCABC','IMMUNOELECTROPHORETICALLY','ERATXOVFEXRCVW','GIZVEMHQWRLTBGESTZAHMHFBL']
print(listPosition(tests[0]),"should equal 1")
print(listPosition(tests[1]),"should equal 2")
print(listPosition(tests[2]),"should equal 1")
print(listPosition(tests[3]),"should equal 4")
print(listPosition(tests[4]),"should equal 24572")
print(listPosition(tests[5]),"should equal 10743")
print(listPosition(tests[6]),"should equal 13")
print(listPosition(tests[7]),"should equal 718393983731145698173")
print(listPosition(tests[8]),"should equal 1083087583") #off by one digit?
print(listPosition(tests[9]),"should equal 5587060423395426613071") #off by a lot?

Answer 1

You can use logic that requires only integer arithmetic. First, create the lexicographically first permutation:

BOOKKEEPER  ->  BEEEKKOOPR

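Then, for each letter, you can calculate how many unique permutations it takes to move it into place. Since the first letter B is already in place, we can ignore it and look at the rest of the letters: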

B EEEKKOOPR  (first)
B OOKKEEPER  (target)

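To know how many permutations it takes to bring the O to the front, we calculate how many unique permutations there are with E in front, and then with K in front: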

E+EEKKOOPR -> 8! / (2! * 2! * 2!) = 40320 /  8 = 5040
K+EEEKOOPR -> 8! / (3! * 2!)      = 40320 / 12 = 3360

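where 8 is the number of letters to be permuted, and 2 and 3 are the multiplicities of the repeated letters. So after 8400 permutations we are at: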

BO EEEKKOPR
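The two counts in this step can be reproduced with a small helper. This is only a sketch; the name `unique_permutations` is an assumption made for illustration. It divides n! by the factorial of each letter's multiplicity, using integer division throughout so the result stays exact.

```python
from collections import Counter
from math import factorial

def unique_permutations(letters):
    """Number of distinct arrangements of `letters`: n! / (m1! * m2! * ...)."""
    count = factorial(len(letters))
    for multiplicity in Counter(letters).values():
        count //= factorial(multiplicity)
    return count

# Letters left to arrange once E (or K) has been placed right after the B.
print(unique_permutations("EEKKOOPR"))  # 5040
print(unique_permutations("EEEKOOPR"))  # 3360
```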

Now, again, we calculate how many permutations it takes to bring the second O to the front:

E+EEKKOPR -> 7! / (2! * 2!) = 5040 / 4 = 1260
K+EEEKOPR -> 7! / (3!)      = 5040 / 6 =  840

So after 10500 permutations we are at:

BOO EEEKKPR

Then we calculate how many permutations it takes to bring the K to the front:

E+EEKKPR -> 6! / (2! * 2!) = 720 / 4 = 180

So after 10680 permutations we are at:

BOOK EEEKPR

Then we calculate how many permutations it takes to bring the second K to the front:

E+EEKPR -> 5! / 2! = 120 / 2 = 60

So after 10740 permutations we are at:

BOOKK EEEPR

The next two letters are already in place, so we can skip to:

BOOKKEE EPR

Then we calculate how many permutations it takes to bring the P to the front:

E+PR -> 2! = 2

So after 10742 permutations we are at:

BOOKKEEP ER

The last two letters are also already in order, so the answer is 10743 (add 1 because a 1-based index is asked for).
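The walkthrough above can be sketched in code that records the running total after each letter is fixed. The names `arrangements` and `trace_position` are assumptions made for illustration and do not come from the answer itself; the sketch uses integer arithmetic only.

```python
from collections import Counter
from math import factorial

def arrangements(counts):
    """Distinct orderings of a multiset given as a Counter."""
    total = factorial(sum(counts.values()))
    for multiplicity in counts.values():
        total //= factorial(multiplicity)
    return total

def trace_position(word):
    """Return (1-based position, [(prefix, permutations skipped so far), ...])."""
    remaining = Counter(word)
    skipped = 0
    steps = []
    for i, letter in enumerate(word):
        for smaller in sorted(remaining):
            if smaller >= letter:
                break
            # Count every permutation that puts `smaller` in this slot.
            skipped += arrangements(remaining - Counter(smaller))
        remaining -= Counter(letter)  # this letter is now in place
        steps.append((word[:i + 1], skipped))
    return skipped + 1, steps

position, steps = trace_position("BOOKKEEPER")
for prefix, count in steps:
    print(prefix, count)
print(position)  # 10743
```

The printed running totals for the prefixes BO, BOO, BOOK, BOOKK and BOOKKEEP should match the 8400, 10500, 10680, 10740 and 10742 worked out by hand.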

Answer 2

@rici pointed out that this is a floating-point error (see Is floating point math broken?). Fortunately, Python has fractions.
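A short illustration of why floats break down here, assuming standard IEEE 754 doubles (which is what CPython uses on common platforms): a float carries only 53 bits of precision, so a value such as 25! cannot be stored exactly, whereas `//` and `Fraction` keep the arithmetic exact.

```python
from fractions import Fraction
from math import factorial

big = factorial(25)  # needs more than the 53 bits a float can hold exactly

# True division goes through a float, so converting back loses low digits.
print(int(big / 1) == big)           # False
# Floor division and Fraction stay in exact integer/rational arithmetic.
print(big // 1 == big)               # True
print(int(Fraction(big, 1)) == big)  # True

# Small ratios are only approximated by floats as well; Fraction keeps them exact.
print(0.1 + 0.2 == 0.3)                                          # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))      # True
```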

Judicious use of fractions.Fraction fixes this issue without changing the body of the code, e.g.:

from fractions import Fraction
...
## returns the number of permutations for the string (this works)
def npermutations(word):
    num = factorial(len(word))
    mults = Counter(word).values()
    den = reduce(operator.mul, (factorial(v) for v in mults), 1)
    return int(Fraction(num, den))
## returns frequency as a percent for the character in the list of chars
def frequency(val,value):
    f = [Fraction(val.count(i),len(val)) for i in val]
    indx = val.index(value)
    return f[indx]
...

In []:
print(listPosition(tests[0]),"should equal 1")
print(listPosition(tests[1]),"should equal 2")
print(listPosition(tests[2]),"should equal 1")
print(listPosition(tests[3]),"should equal 4")
print(listPosition(tests[4]),"should equal 24572")
print(listPosition(tests[5]),"should equal 10743")
print(listPosition(tests[6]),"should equal 13")
print(listPosition(tests[7]),"should equal 718393983731145698173")
print(listPosition(tests[8]),"should equal 1083087583")
print(listPosition(tests[9]),"should equal 5587060423395426613071")

Out[]:
1 should equal 1
2 should equal 2
1 should equal 1
4 should equal 4
24572 should equal 24572
10743 should equal 10743
13 should equal 13
718393983731145698173 should equal 718393983731145698173
1083087583 should equal 1083087583
5587060423395426613071 should equal 5587060423395426613071

Updated

Based on @m69's excellent explanation, here is a much simpler implementation:

from math import factorial
from collections import Counter
from functools import reduce
from operator import mul

def position(word):
    charset = Counter(word)
    pos = 1    # Per OP 1 index
    for letter in word:
        chars = sorted(charset)
        for char in chars[:chars.index(letter)]:
            ns = Counter(charset) - Counter([char])
            pos += factorial(sum(ns.values())) // reduce(mul, map(factorial, ns.values()))
        charset -= Counter([letter])
    return pos

This gives the same results as above:

In []:
tests = ['A', 'ABAB', 'AAAB', 'BAAA', 'QUESTION', 'BOOKKEEPER', 'ABCABC',
         'IMMUNOELECTROPHORETICALLY', 'ERATXOVFEXRCVW', 'GIZVEMHQWRLTBGESTZAHMHFBL']
print(position(tests[0]),"should equal 1")
print(position(tests[1]),"should equal 2")
print(position(tests[2]),"should equal 1")
print(position(tests[3]),"should equal 4")
print(position(tests[4]),"should equal 24572")
print(position(tests[5]),"should equal 10743")
print(position(tests[6]),"should equal 13")
print(position(tests[7]),"should equal 718393983731145698173")
print(position(tests[8]),"should equal 1083087583")
print(position(tests[9]),"should equal 5587060423395426613071")

Out[]:
1 should equal 1
2 should equal 2
1 should equal 1
4 should equal 4
24572 should equal 24572
10743 should equal 10743
13 should equal 13
718393983731145698173 should equal 718393983731145698173
1083087583 should equal 1083087583
5587060423395426613071 should equal 5587060423395426613071
