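I am trying to extract a string, together with every permutation of its characters, from a line of text.
For example, I need to extract the string t = 'ABC' and all of its permutations ('ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA') from the following string s:
    s = 'ABCXABCXXACXXBACXXBCA'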
The result should be: ABC, ABC, BAC, BCA
The string t can be of any length and may contain characters from [A-Z], [a-z] and [0-9].
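Is there a way to get this result with a regular expression in Python?
I know I could build a list of all the permutations and then search for each item separately, but I would like to know whether a regular expression can produce the result in a more compact and faster way.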
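Answer 0 (score: 1)
Let me sketch an algorithm for this problem. It is not a problem that is well suited to regular expressions.
This solution maintains a sliding window and checks the frequency of the characters in the window against the character frequencies of t.
Here is the pseudocode of the algorithm: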
    function searchPermutation(inpStr, t):
        // You may want to check t against the regex ^[A-Za-z0-9]+$ here

        // Count the frequency of each character in t
        // For example, t = 'aABBCCC'
        // Then freq = { 'A': 1, 'B': 2, 'C': 3, 'a': 1 }
        freq = frequency(t)

        // Create an empty dict (a missing key is treated as a count of 0)
        window = {}
        // Number of characters in the window
        count = 0
        // List of matches
        result = []

        for (i = 0; i < inpStr.length; i++):
            // If the current character is a character in t
            if inpStr[i] in freq:
                // Add the character at the current position
                window[inpStr[i]]++

                // If the window already held t.length characters
                if count == t.length:
                    // Remove the character that just left the window
                    window[inpStr[i - t.length]]--
                    // Drop the entry once it reaches 0, so that
                    // window == freq compares only non-zero counts
                    if window[inpStr[i - t.length]] == 0:
                        delete window[inpStr[i - t.length]]
                    // The count is kept the same here
                else: // Otherwise, increase the count
                    count++

                // If all frequencies in the window are the same as in freq
                if count == t.length and window == freq:
                    // Add to the result a match at (i - t.length + 1, i + 1)
                    // We can retrieve the string later with substring
                    result.append((i - t.length + 1, i + 1))

                    // Reset the window and count (prevents overlapping matches)
                    // Remove the 2 lines below if you want to include overlapping matches
                    window = {}
                    count = 0
            else: // If the current character is not in t
                // Reset the window and count
                window = {}
                count = 0

        return result
This solves the general problem for any t.
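Since the question asks about Python, the pseudocode above could be turned into something like the following. This is only a sketch of the same sliding-window idea: the function name `search_permutations` and the `allow_overlap` flag are my own choices for illustration, and `collections.Counter` stands in for the frequency dict.

```python
from collections import Counter


def search_permutations(inp_str, t, allow_overlap=False):
    """Return (start, end) index pairs of substrings that are permutations of t."""
    n = len(t)
    if n == 0:
        return []

    freq = Counter(t)    # target character frequencies
    window = Counter()   # frequencies of the characters currently in the window
    count = 0            # number of characters currently in the window
    result = []

    for i, ch in enumerate(inp_str):
        if ch in freq:
            window[ch] += 1
            if count == n:
                # The window was already full: drop the character that left it.
                old = inp_str[i - n]
                window[old] -= 1
                if window[old] == 0:
                    del window[old]  # keep only non-zero counts for the comparison
            else:
                count += 1

            if count == n and window == freq:
                result.append((i - n + 1, i + 1))
                if not allow_overlap:
                    # Start a fresh window so that matches cannot overlap.
                    window = Counter()
                    count = 0
        else:
            # A character outside t can never be part of a match.
            window = Counter()
            count = 0

    return result


s = 'ABCXABCXXACXXBACXXBCA'
matches = [s[start:end] for start, end in search_permutations(s, 'ABC')]
print(matches)  # ['ABC', 'ABC', 'BAC', 'BCA']
```

The scan visits each character of the input once, and each window comparison touches at most as many entries as there are distinct characters in t, so no list of permutations is ever built.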
Answer 1 (score: 0)
A regex solution:
([ABC])(?!\1)([ABC])(?!\1)(?!\2)[ABC]
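Note that this pattern is written for t = 'ABC' specifically, and the negative lookaheads only guarantee that the three characters differ, so the approach works only when t has no repeated characters. As a rough sketch of how the same pattern could be generated for another t with distinct characters, something like the following might do. The helper names `permutation_pattern` and `find_permutations_regex` are made up for this example, and named groups are used instead of `\1`, `\2` only to keep the generated string easy to build.

```python
import re


def permutation_pattern(t):
    """Build a regex matching any permutation of t (t must have distinct characters)."""
    if not re.fullmatch(r'[A-Za-z0-9]+', t):
        raise ValueError('t must be a non-empty string over [A-Za-z0-9]')
    if len(set(t)) != len(t):
        raise ValueError('this lookahead approach needs distinct characters in t')

    char_class = '[' + t + ']'
    parts = []
    for i in range(1, len(t) + 1):
        # The next character must differ from every character captured so far.
        for j in range(1, i):
            parts.append(f'(?!(?P=c{j}))')
        parts.append(f'(?P<c{i}>{char_class})')
    return ''.join(parts)


def find_permutations_regex(s, t):
    """Return the non-overlapping permutations of t found in s, left to right."""
    pattern = re.compile(permutation_pattern(t))
    # finditer + group(0) returns the whole match rather than the captured groups.
    return [m.group(0) for m in pattern.finditer(s)]


s = 'ABCXABCXXACXXBACXXBCA'
print(find_permutations_regex(s, 'ABC'))  # ['ABC', 'ABC', 'BAC', 'BCA']
```

The generated pattern grows quadratically with the length of t, so for long strings, or for a t with repeated characters, a frequency-counting scan is likely the more practical choice.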