Generating different combinations of strings in Python

Time: 2016-05-24 19:56:49

Tags: python string algorithm

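I'm working on a project and am stuck on the problem described below. I have sets of permissions. A permission can be a single one, such as P0, P1, P2 and so on, or a combination, such as P1P2, P3P4P5, P1P7P11P34 ... The permissions are treated as strings, and we process them in sorted order (by length). Here is one possible sequence of input strings:

    P1
    P2
    P4
    P23
    P0P1
    P3P5
    P8P13P45P67
    ..........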

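My task is to check whether each longer string can be formed from some combination of the shorter strings. What I'm doing is inserting the strings into a trie and checking whether the larger string can be built from the previous strings. If it can, I do nothing; otherwise, I insert the new string into the trie.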

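The problem is that as the strings get longer, I have to generate permutations/combinations of each string and check whether they exist. Classic string permutation won't work here, because I have to consider the permutations together (not one at a time) when checking whether they are in the trie. Permutations are also order-specific, so P0P1 would be found but P1P0 would not. On top of that, there are far too many permutations.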
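To make the order-specific trie check concrete, here is a minimal sketch of it. It assumes each permission string is split into tokens (P0, P1, ...) and that the trie stores earlier strings token by token; the names `tokenize`, `TrieNode`, `trie_insert` and `can_concatenate` are hypothetical, not from the question. It is a standard word-break check: a string counts as buildable only if it is an in-order concatenation of earlier strings.

```python
import re


def tokenize(perm_string):
    """Split a string such as 'P0P1P2' into ['P0', 'P1', 'P2']."""
    return re.findall(r"P\d+", perm_string)


class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.is_end = False  # True if a stored string ends here


def trie_insert(root, perm_string):
    """Store perm_string in the trie, one permission token per level."""
    node = root
    for token in tokenize(perm_string):
        node = node.children.setdefault(token, TrieNode())
    node.is_end = True


def can_concatenate(root, perm_string):
    """True if perm_string is an in-order concatenation of stored strings."""
    tokens = tokenize(perm_string)
    n = len(tokens)
    reachable = [False] * (n + 1)  # reachable[i]: tokens[:i] can be built
    reachable[0] = True
    for start in range(n):
        if not reachable[start]:
            continue
        node = root
        # Walk the trie from this position; every stored string that ends
        # along the way extends what can be built.
        for end in range(start, n):
            node = node.children.get(tokens[end])
            if node is None:
                break
            if node.is_end:
                reachable[end + 1] = True
    return reachable[n]
```

With P0P1 and P2 in the trie, P0P1P2 and P2P0P1 are accepted, but P1P0P2 is rejected even though it contains exactly the same permissions. That order sensitivity is the limitation described above.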

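Here is an example of the different combinations to make this clearer. Suppose I have a new permission string such as P0P1P2. Before concluding that the entries already in my trie cannot build this string, I have to check the different combinations.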

Note that a combination must not contain any permission that is not in the target (new input) string.
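One way to read "different combinations" is as every way of splitting the target's permissions into groups, each group being a candidate earlier string. The sketch below assumes the groups do not overlap; `set_partitions` is a hypothetical helper, not from the question. Because every group is built only from the target's own permissions, the constraint above holds automatically.

```python
def set_partitions(items):
    """Yield every way to split the list `items` into non-empty groups.

    Groups are unordered and do not overlap, so ['P0', 'P1', 'P2'] gives
    e.g. [['P0', 'P1'], ['P2']] but never ['P1', 'P0'] as a separate case.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partition


for split in set_partitions(["P0", "P1", "P2"]):
    print(["".join(group) for group in split])
```

For P0P1P2 this prints five splits. In general the count is the Bell number of the number of permissions, which reaches 115975 for ten permissions, so enumerating combinations this way does not scale.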


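I'm stuck, and I'd like to know whether there is an algorithm that can generate such combinations for very large strings, or whether I should give up on this idea and take a different approach.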

2 answers:

Answer 0 (score: 1)

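You don't want to build a structure of every permission string that could possibly be formed, because that grows exponentially with the number of strings.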

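Here's an alternative: represent each permission string as a bit vector (using Python's BitVector package), setting bit i of the vector whenever the corresponding permission P[i] is included in your string. This works because the order of the permissions doesn't matter.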
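A minimal sketch of the encoding, assuming a plain Python int is used as the bit vector instead of the BitVector package (Python ints have unbounded width, so bit 67 is no problem). `to_mask` is a hypothetical helper name.

```python
import re


def to_mask(perm_string):
    """Encode e.g. 'P0P2' as an int with bits 0 and 2 set; order is ignored."""
    mask = 0
    for number in re.findall(r"P(\d+)", perm_string):
        mask |= 1 << int(number)  # set bit i for permission Pi
    return mask


print(to_mask("P0P2"), to_mask("P2P0"))
```

Because a bit position carries no ordering, `to_mask("P1P0") == to_mask("P0P1")`, which is exactly the property the answer relies on.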

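Suppose you have a list of permission bit vectors called bv, already sorted by length. Then you can build the reduced list as follows (I'm assuming there are only 68 permissions, 0 to 67, based on your example):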

from BitVector import BitVector

reduced = []                                # accepted (non-redundant) vectors
for v in bv:
    coll = BitVector(intval=0, size=68)     # Null vector
    for x in reduced:                       # Look at all accepted strings
        if (x & v) == x:                    # If all permissions in x are needed
            coll = coll | x                 # Record this as a possible substring
    if coll != v:                           # If some permission is missing
        reduced.append(v)                   # String can't be represented

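For example, if reduced contains P0, P1, P1P2 and P2P3, and the next value is v = P0P1P2, then coll = P0 | P1 | P1P2 = P0P1P2 (P2P3 is skipped because P3 is not in v). Since that equals the string being tested, we don't append it to the reduced list.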

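If the next value is v = P0P1P3, then coll = P0 | P1, which is not equal to v, so this value is added. This happens because there is no way to get P3 without also using P2 (via P2P3), and v does not include P2.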
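The loop and the worked example can be reproduced with plain Python ints as bit vectors (a substitute for BitVector, not the answer's exact code); `to_mask` and `reduce_masks` are hypothetical names.

```python
import re


def to_mask(perm_string):
    """Encode e.g. 'P0P2' as an int with bits 0 and 2 set."""
    mask = 0
    for number in re.findall(r"P(\d+)", perm_string):
        mask |= 1 << int(number)
    return mask


def reduce_masks(masks):
    """Keep only masks that are not a union of earlier kept masks.

    `masks` is assumed to be sorted by number of permissions.
    """
    reduced = []
    for v in masks:
        coll = 0                    # empty bit vector
        for x in reduced:
            if (x & v) == x:        # x uses no permission outside v
                coll |= x           # x may take part in building v
        if coll != v:               # some permission of v is still missing
            reduced.append(v)
    return reduced


inputs = ["P0", "P1", "P1P2", "P2P3", "P0P1P2", "P0P1P3"]
kept = reduce_masks([to_mask(s) for s in inputs])
print([s for s in inputs if to_mask(s) in kept])
```

Every input except P0P1P2 is kept, matching the two cases described above.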

Answer 1 (score: 0)

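I misread your problem statement earlier: I thought the permissions had to appear in the given order, so that P0P1 would not be enough to cover a later line of P1P0. Sorry for the delay.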

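I suggest an alternative to bit vectors: sets. They are slower, but they grow automatically and are perhaps easier to work with. The procedure for each input line is: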

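  • Extract the permission numbers into a set.
  • Clear the master set.
  • For each prior string:
      • if the prior is a subset of the current string (no extra permissions),
          • add it to the master set.
  • Check the master set; if it is a superset of the input line,
      • report that the new string is covered by the priors;
      • otherwise, add the new string to the list of priors.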

I'm also assuming you don't need to know which of the previous input lines are needed to cover the current one.

Code:

prior = []    # list of prior input sets (only those that weren't redundant)

with open("permission.txt") as in_file:
  for in_line in in_file:
    master = set()
    perm_line = set(int(s) for s in in_line.strip().split('P') if s)

    # Check each of the shorter (prior) permission strings
    for short in prior:
      # If the prior string has no superfluous permissions, add it to the master set.
      extra = short - perm_line
      if len(extra) == 0:
        master = master.union(short)

    # Did the combination of prior strings cover all of the input line permissions?
    print(in_line, end="")
    if len(perm_line - master) == 0:
      print("\tPermissions included in prior strings")
    else:
      print("\tFound New permission combinations; adding to reference list")
      prior.append(perm_line)

Output (the input file is echoed so that part is obvious; I added a few lines):

P1
    Found New permission combinations; adding to reference list
P2
    Found New permission combinations; adding to reference list
P4
    Found New permission combinations; adding to reference list
P23
    Found New permission combinations; adding to reference list
P0P1
    Found New permission combinations; adding to reference list
P3P5
    Found New permission combinations; adding to reference list
P8P13P45P67
    Found New permission combinations; adding to reference list
P0P2
    Found New permission combinations; adding to reference list
P0P1P3P5
    Permissions included in prior strings
P0P1P2
    Permissions included in prior strings
P0P1P2P3
    Found New permission combinations; adding to reference list

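Update: here is a way to keep track of the permission sets used to build the new string. Focus on the new object cover_team. Note that this does not find a minimal cover, but it will come close if you add new elements to the front of prior instead of the end, since that makes the normal search run from longest to shortest.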

I took the "cheap" route for reporting and just print the permission sets. I'll leave the formatting to you.

prior = [] # list of prior input sets (only those that weren't redundant)

with open("permission.txt") as in_file:
  for in_line in in_file:
    master = set()
    cover = set()
    cover_team = []
    perm_line = set(int(s) for s in in_line.strip().split('P') if s)

    # Check each of the shorter (prior) permission strings
    for short in prior:
      # If the prior string has no superfluous permissions, add it to the master set.
      extra = short - perm_line
      if len(extra) == 0:
        master = master.union(short)
        # Does this string add anything new to the coverage?
        ### print("compare", short, cover)
        if len(short - cover) > 0:
          cover = cover.union(short)
          cover_team.append(short)
          ### print("Add to cover_team:", short)

    # Did the combination of prior strings cover all of the input line permissions?
    print(in_line, end="")
    if len(perm_line - master) == 0:
      print("\tPermissions included in prior strings", cover_team)
    else:
      print("\tFound New permission combinations; adding to reference list")
      prior.append(perm_line)

Output:

P1
    Found New permission combinations; adding to reference list
P2
    Found New permission combinations; adding to reference list
P4
    Found New permission combinations; adding to reference list
P23
    Found New permission combinations; adding to reference list
P0P1
    Found New permission combinations; adding to reference list
P3P5
    Found New permission combinations; adding to reference list
P8P13P45P67
    Found New permission combinations; adding to reference list
P0P2
    Found New permission combinations; adding to reference list
P0P1P3P5
    Permissions included in prior strings [{1}, {0, 1}, {3, 5}]
P0P1P2
    Permissions included in prior strings [{1}, {2}, {0, 1}]
P0P1P2P3
    Found New permission combinations; adding to reference list
P0P1P2P3P4P5
    Permissions included in prior strings [{1}, {2}, {4}, {0, 1}, {3, 5}]
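The update notes that cover_team is not guaranteed to be minimal. For small inputs, an exact minimum cover can be found by brute force, trying teams of one prior set, then two, and so on. This is a swapped-in technique, not part of the answer, and `minimal_cover` is a hypothetical name; its cost is exponential in the number of usable prior sets.

```python
from itertools import combinations


def minimal_cover(target, candidates):
    """Return a smallest list of candidate sets whose union equals `target`.

    Only candidates that are subsets of `target` (no extra permissions) are
    used. Returns None if no combination covers `target`.
    """
    usable = [c for c in candidates if c <= target]
    for size in range(1, len(usable) + 1):
        # combinations() yields teams in the order the candidates appear.
        for team in combinations(usable, size):
            if set().union(*team) == target:
                return list(team)
    return None


prior = [{1}, {2}, {4}, {23}, {0, 1}, {3, 5}, {8, 13, 45, 67}, {0, 2}, {0, 1, 2, 3}]
print(minimal_cover({0, 1, 2, 3, 4, 5}, prior))
```

With prior as it stands when P0P1P2P3P4P5 is read, this returns [{4}, {3, 5}, {0, 1, 2, 3}]: three sets instead of the five in cover_team.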