Detecting whether all of several substrings occur in a Python sequence

Time: 2015-04-28 11:14:43

Tags: python substring

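The code below simulates Penney's game: the probability that one sequence of heads and tails appears before another. In particular, I'd like to know how efficient while not all(i in sequence for i in [pattern1, pattern2]): is, and more generally whether this is well-written Python. Is this a reasonable attempt, or is there a more efficient way? The more I learn about Python, the more convinced I am that there is always a better way!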

import random

pattern1 = 'TTT'
pattern2 = 'HTT'

pattern1Wins = 0
pattern2Wins = 0

trials = 1000

for _ in range(trials):

    sequence = ''

    while not all(i in sequence for i in [pattern1, pattern2]):
        sequence += random.choice(['H','T'])

    if sequence.endswith(pattern1):
        pattern2Wins += 1
    else:
        pattern1Wins += 1

print pattern1, 'wins =', pattern1Wins
print pattern2, 'wins =', pattern2Wins

print str((max([pattern1Wins, pattern2Wins]) / float(trials) * 100)) + '%'

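2 answers:

Answer 0 (score: 1)

Given that you only care whether the last three characters are one of the two substrings, I would go with something like the following: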

sequence = ''
while True:
    sequence += random.choice('HT')
    if sequence.endswith(pattern1):
        pattern1Wins += 1  # the loop stops at the first match, so the matched pattern wins
    elif sequence.endswith(pattern2):
        pattern2Wins += 1
    else:
        continue
    break

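endswith is more efficient than in, because it does not search the whole string for a match (including the earlier part, where you already know there won't be any).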
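To make the difference concrete, here is a rough sketch (Python 3) that times both checks with the standard timeit module. The helper names and the 10,000-character test string are made up for illustration and are not from the question. Note that the two checks are not equivalent in general; they only agree inside a loop that stops at the first match. Absolute numbers will vary by machine and interpreter.

```python
import timeit

# Hypothetical test data: a long run of heads that ends with the pattern.
pattern = 'HTT'
sequence = 'H' * 10000 + pattern


def found_anywhere(seq, pat):
    """Substring search: may have to scan the whole string."""
    return pat in seq


def found_at_end(seq, pat):
    """Suffix check: only compares the last len(pat) characters."""
    return seq.endswith(pat)


if __name__ == '__main__':
    for func in (found_anywhere, found_at_end):
        seconds = timeit.timeit(lambda: func(sequence, pattern), number=10000)
        print(func.__name__, seconds)
```

The longer the sequence grows before a pattern shows up, the more work the in check repeats on every flip, while the suffix check stays constant.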

Alternatively, you could refactor pattern1 and pattern2 into a dictionary {pattern: wins}:

patterns = {'TTT': 0, 'HTT': 0}

...

sequence = ''
while True:
    sequence += random.choice('HT')
    for pattern in patterns:
        if sequence.endswith(pattern):
            patterns[pattern] += 1
            break
    else:
        continue
    break

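Finally, string concatenation with + isn't very efficient; strings are immutable, so a new string is created each time. Instead, consider putting the results in a list and checking the last three items: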

sequence = []
while True:
    sequence.append(random.choice('HT'))
    if sequence[-3:] == ['H', 'T', 'T']:
        ...
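A related variation, not what the snippet above does: collections.deque with maxlen keeps only the most recent flips, so the window never grows. The sketch below is Python 3; the function name first_pattern and its rng parameter are hypothetical, and it assumes all patterns have the same length.

```python
import random
from collections import deque


def first_pattern(patterns, rng=random):
    """Flip coins until the most recent flips match one of `patterns`.

    Assumes `patterns` is non-empty and all patterns share one length.
    """
    targets = {tuple(p): p for p in patterns}
    width = len(next(iter(targets)))
    window = deque(maxlen=width)  # older flips fall off the left automatically
    while True:
        window.append(rng.choice('HT'))
        key = tuple(window)
        if key in targets:
            return targets[key]


if __name__ == '__main__':
    print(first_pattern(['TTT', 'HTT']))
```

Passing a seeded random.Random instance as rng makes a run repeatable without touching the global generator.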

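Answer 1 (score: 1)

Build your sequence from three initial calls to choice, then loop, keeping only the last two characters plus one new choice, until one of the patterns appears: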

pattern1 = 'TTT'
pattern2 = 'HTT'
trials = 1000
d = {pattern1: 0, pattern2: 0}

for _ in range(trials):
    sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
    while sequence not in {pattern1, pattern2}:
        sequence = sequence[-2:] + random.choice("HT")
    d[sequence] += 1

print pattern1, 'wins =', d[pattern1]
print pattern2, 'wins =', d[pattern2]
print str((max([d[pattern1], d[pattern2]]) / float(trials) * 100)) + '%'
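For readers on Python 3, the same rolling-window idea can be sketched as a function. The name simulate and its seed parameter are my own additions, and the window is hard-coded to three characters as in the code above. For this particular pair there is an easy sanity check: TTT can only come first if the first three flips are all tails, because otherwise the first TTT is preceded by an H that completes HTT one flip earlier. HTT should therefore win roughly 7/8 of the time.

```python
import random


def simulate(pattern1='TTT', pattern2='HTT', trials=1000, seed=None):
    """Return {pattern: wins} using a rolling three-character window.

    Assumes both patterns are exactly three characters long.
    """
    rng = random.Random(seed)  # private generator so runs are repeatable
    wins = {pattern1: 0, pattern2: 0}
    for _ in range(trials):
        sequence = ''.join(rng.choice('HT') for _ in range(3))
        while sequence not in wins:
            sequence = sequence[-2:] + rng.choice('HT')
        wins[sequence] += 1
    return wins


if __name__ == '__main__':
    trials = 20000
    wins = simulate(trials=trials, seed=0)
    for pattern, count in wins.items():
        print(pattern, 'wins =', count, '({:.1%})'.format(count / trials))
```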

Some timings, using random.seed:

In [19]: import random
In [20]: random.seed(0)
In [21]: %%timeit
   ....: pattern1 = 'TTT'
   ....: pattern2 = 'HTT'
   ....: trials = 1000
   ....: patterns = {'TTT': 0, 'HTT': 0}
   ....: for _ in range(trials):
   ....:     sequence = ''
   ....:     while True:
   ....:         sequence += random.choice('HT')
   ....:         for pattern in patterns:
   ....:             if sequence.endswith(pattern):
   ....:                 patterns[pattern] += 1
   ....:                 break
   ....:         else:
   ....:             continue
   ....:         break
   ....: 
100 loops, best of 3: 7.28 ms per loop

In [22]: %%timeit
   ....: pattern1 = 'TTT'
   ....: pattern2 = 'HTT'
   ....: trials = 1000
   ....: d = {pattern1: 0, pattern2: 0}
   ....: for _ in range(trials):
   ....:     sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
   ....:     while sequence not in {pattern1, pattern2}:
   ....:         sequence = sequence[-2:] + random.choice("HT")
   ....:     d[sequence] += 1
   ....: 
100 loops, best of 3: 4.95 ms per loop

In [23]: %%timeit
   ....: pattern1 = 'TTT'
   ....: pattern2 = 'HTT'
   ....: pattern1Wins = 0
   ....: pattern2Wins = 0
   ....: trials = 1000
   ....: for _ in range(trials):
   ....:     sequence = ''
   ....:     while True:
   ....:         sequence += random.choice('HT')
   ....:         if sequence.endswith(pattern1):
   ....:             pattern1Wins += 1
   ....:         elif sequence.endswith(pattern2):
   ....:             pattern2Wins += 1
   ....:         else:
   ....:             continue
   ....:         break
   ....: 
100 loops, best of 3: 6.65 ms per loop
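Outside IPython, a similar comparison can be run with the standard timeit module. This is only a sketch in Python 3: the function names are hypothetical, and growing_string passes a tuple to endswith instead of looping over the patterns, so it is not an exact copy of either snippet timed above. Expect different absolute numbers from the ones quoted.

```python
import random
import timeit

PATTERNS = ('TTT', 'HTT')


def growing_string(trials=1000):
    """Append to one string and test its tail with endswith."""
    wins = dict.fromkeys(PATTERNS, 0)
    for _ in range(trials):
        sequence = ''
        while True:
            sequence += random.choice('HT')
            if sequence.endswith(PATTERNS):  # endswith accepts a tuple of suffixes
                wins[sequence[-3:]] += 1
                break
    return wins


def rolling_window(trials=1000):
    """Keep only the last three flips."""
    wins = dict.fromkeys(PATTERNS, 0)
    for _ in range(trials):
        sequence = ''.join(random.choice('HT') for _ in range(3))
        while sequence not in wins:
            sequence = sequence[-2:] + random.choice('HT')
        wins[sequence] += 1
    return wins


if __name__ == '__main__':
    for func in (growing_string, rolling_window):
        random.seed(0)
        # Best of 3 repeats, 20 calls each, reported per call.
        best = min(timeit.repeat(func, number=20, repeat=3)) / 20
        print('{}: {:.2f} ms per call'.format(func.__name__, best * 1e3))
```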