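Please move this question to the Code Review area. It fits better there, because I know the code below is junk and I want critical feedback so that I can rewrite it. I am reinventing the wheel.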
# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?
import random
def matchIt(yourString, yourPattern):
    """find the number of times yourPattern occurs in yourString"""
    count = 0
    matchTimes = 0
    # How can you simplify the for-if structures?
    # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
    # please, read clarifications in [Update]
    for coin in yourString:
        # return to base
        if count == len(yourPattern):
            matchTimes = matchTimes + 1
            count = 0
        # special case to return to 2, there could be more conditions of this type
        # so this type of if-conditionals are screaming for a havoc
        if count == 2 and yourPattern[count] == 1:
            count = count - 1
        # the work horse
        # it could be simpler by breaking the initial string of length 'l'
        # to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
        if coin == yourPattern[count]:
            count = count + 1
    average = len(yourString)/matchTimes
    return [average, matchTimes]
# Generates the list
myString = []
for x in range(10000):
    myString = myString + [int(random.random()*2)]
pattern = [1,0,0]
result = matchIt(myString, pattern)
print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
      "So it took "+str(result[0])+" steps in average.\n" +
      "RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))
# Sample Output
#
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']
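[Update]
I will explain a bit of the theory here; perhaps the problem can be simplified that way. The code above tries to construct a Markov chain with the transition matrix A below. The pattern 100, which you can imagine as a sequence of coin flips, corresponds to it.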
>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')
>>> I=numpy.identity(3)
>>> I
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> Q
matrix([[ 0.5, 0.5, 0. ],
[ 0. , 0.5, 0.5],
[ 0. , 0.5, 0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5, 0.5, 0. , 0. ],
[ 0. , 0.5, 0.5, 0. ],
[ 0. , 0.5, 0. , 0.5],
[ 0. , 0. , 0. , 1. ]])
The average of 8 in the question is the sum of the values in the first row of the matrix N=(I-Q)^-1, where Q is the matrix above.
>>> (I-Q)**-1
matrix([[ 2., 4., 2.],
[ 0., 4., 2.],
[ 0., 2., 2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0
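Now you can probably see that this apparently pure pattern-matching problem turns into a Markov chain. I cannot see why the messy for-while-if conditions could not be replaced with something like a matrix or matrices. I do not know how to implement that, but iterators may be one way to go and are worth investigating, particularly when there are more states that need to be decomposed.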
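One way the if-conditions could be replaced by a table is sketched below. This is only an illustration under assumptions, not the code from the question: the helper names `build_transitions` and `waiting_times` are hypothetical, the state is taken to be the number of pattern items matched so far, and the counter restarts after every match. For the pattern 100 the table has the same structure as the matrix A above.

```python
import random

def build_transitions(pattern, alphabet=(0, 1)):
    """Return table[state][symbol] -> next state.

    A state is the number of pattern items matched so far.
    """
    n = len(pattern)
    table = []
    for state in range(n):
        row = {}
        for symbol in alphabet:
            seen = list(pattern[:state]) + [symbol]
            # Longest prefix of the pattern that is a suffix of what was seen.
            k = min(n, len(seen))
            while k > 0 and seen[len(seen) - k:] != list(pattern[:k]):
                k -= 1
            row[symbol] = k
        table.append(row)
    return table

def waiting_times(coins, pattern):
    """Return the number of flips needed for each successive match."""
    table = build_transitions(pattern)
    state, steps, times = 0, 0, []
    for coin in coins:
        steps += 1
        state = table[state][coin]
        if state == len(pattern):
            # Pattern completed: record the waiting time and start over.
            times.append(steps)
            state, steps = 0, 0
    return times

random.seed(0)
coins = [random.randint(0, 1) for _ in range(100000)]
times = waiting_times(coins, [1, 0, 0])
print(len(times), sum(times) / len(times))
```

With a fair coin the printed average should come out close to the theoretical value of 8; the exact figure depends on the random sample.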
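But a problem came up with Numpy: what are the -Inf and NaN entries doing there? Compare them with the values they should converge to, taken from the (I-Q)**-1 matrix above. N comes from the series N=I+Q+Q^2+Q^3+\dots, whose partial sums are \frac{I-Q^{n}}{I-Q}, that is (I-Q^{n})(I-Q)^{-1}.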
>>> (I-Q**99)/(I-Q)
matrix([[ 2.00000000e+00, 1.80853571e-09, -Inf],
[ NaN, 2.00000000e+00, 6.90799171e-10],
[ NaN, 6.90799171e-10, 1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688, 0.27929688, -Inf],
[ NaN, 1.82617188, 0.10742188],
[ NaN, 0.10742188, 0.96679688]])
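The -Inf and NaN entries are consistent with `/` dividing the two matrices element by element, so every position where I-Q holds a zero becomes a division by zero. A minimal sketch of the matrix-product form, assuming a NumPy version that provides `numpy.linalg.inv` and `numpy.linalg.matrix_power` and using plain arrays instead of `numpy.matrix`:

```python
import numpy as np

Q = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.5, 0.0]])
I = np.identity(3)

# Closed form: N = (I - Q)^-1
N = np.linalg.inv(I - Q)

# Partial sum I + Q + ... + Q^(n-1) = (I - Q^n)(I - Q)^-1,
# computed with a matrix product rather than elementwise division.
n = 99
partial = (I - np.linalg.matrix_power(Q, n)) @ N

print(N[0].sum())                   # expected number of flips from the start state
print(np.abs(partial - N).max())    # how far the partial sum is from N
```

With `numpy.matrix` objects as in the session above, the corresponding expression would be `(I - Q**n) * (I - Q)**-1`, since `*` is the matrix product for that type.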
Answer 0 (score: 2)
def matchIt(yourString, yourPattern):
    """find the number of times yourPattern occurs in yourString"""
Could you use something like the following?
yourString.count(yourPattern)
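In your case, you could create myString as a real string of 10000 characters and pattern as a string too, and then count the occurrences in a simple pythonic way.
EDIT
A one-liner that gives you the number of (overlapping) occurrences of pattern in text (either of which can be a string or a list) could look like this: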
nbOccurences = sum(1 for i in range(len(text)-len(pattern)+1) if text[i:i+len(pattern)] == pattern)
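The two suggestions do not count the same thing: `str.count` counts non-overlapping occurrences, while the slicing one-liner counts overlapping ones. The sketch below shows the difference; the function name `count_overlapping` is a hypothetical wrapper around the one-liner. The last valid start index is `len(text) - len(pattern)`, hence the `+ 1` in the range.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps (str or list)."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)

print("ababababab".count("abab"))                        # 2 (non-overlapping)
print(count_overlapping("ababababab", "abab"))           # 4 (overlapping)
print(count_overlapping([1, 0, 0, 1, 0, 0], [1, 0, 0]))  # 2
```

For the pattern 100 the distinction does not matter, because two occurrences of 100 cannot overlap.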
Answer 1 (score: 1)
OK, a standard(-ish) string search:
def matchIt(needle, haystack):
    """
    @param needle: string, text to seek
    @param haystack: string, text to search in
    Return number of times needle is found in haystack,
    allowing overlapping instances.
    Example: matchIt('abab','ababababab') -> 4
    """
    lastSeenAt = -1
    timesSeen = 0
    while True:
        nextSeen = haystack.find(needle, lastSeenAt+1)
        if nextSeen==-1:
            return timesSeen
        else:
            lastSeenAt = nextSeen
            timesSeen += 1
But you want to do this to a list of numbers? No problem; we just need to create a list class with a find() method, like so:
class FindableList(list):
    def find(self, sub, start=None, end=None):
        """
        @param sub: list, pattern to look for in self
        @param start: int, first possible start-of-list
            If not specified, start at first item
        @param end: int, last+1 possible start-of-list
            If not specified, end such that entire self is searched
        Returns:
            Starting offset if a match is found, else -1
        """
        if start is None or start < 0:
            start = 0
        # N.B. If end is allowed to be too high, the comparison
        # window would run past the end of the list, so clamp it.
        lastEnd = len(self) - len(sub) + 1
        if end is None or end > lastEnd:
            end = lastEnd
        sub = list(sub)
        width = len(sub)
        for pos in range(start, end):
            # compare exactly len(sub) items starting at pos
            if self[pos:pos + width] == sub:
                return pos
        # no match found
        return -1
Then the example looks like
matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4
and your code becomes:
import random

# Generate a list (a FindableList, so that matchIt can call find() on it)
randIn = lambda x: int(x*random.random())
myString = FindableList(randIn(2) for i in range(10000))
pattern = [1,0,0]
result = matchIt(pattern, myString)
print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString)))
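Answer 2 (score: 0)
Not ready yet.
There is a similar question that focuses mainly on graph libraries here, and a similar question in C# that may also be useful.
The files relevant to this question are ./networkx/generators/degree_seq.py (997 lines, about generating graphs with a given degree sequence) and ./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs).
Note also that its source code cites 92 references, so I am not sure whether you want to reinvent the wheel. For igraph, read the file convert.c around line 835 about weighted edges. You can get the source of Networkx here and the source of igraph here. Note that the former is under a BSD license and written in Python, while igraph is under the GNU (GPL) license and written in C.
To get started with Networkx, here is a helpful snippet about creating a weighted graph, taken from the unit-test file test_convert_scipy.py: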
def create_weighted(self, G):
    g = cycle_graph(4)
    e = g.edges()
    source = [u for u,v in e]
    dest = [v for u,v in e]
    weight = [s+10 for s in source]
    ex = zip(source, dest, weight)
    G.add_weighted_edges_from(ex)
    return G
So to create your Markov chain, see the help on directed weighted graphs here, perhaps something like this:
>>> DG=nx.DiGraph()
>>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)])
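As a quick sanity check, the weighted edge list can be turned back into a transition matrix and compared with the matrix A from the question. The sketch below uses plain Python lists rather than Networkx, and `edges_to_matrix` is a hypothetical helper, not a Networkx function.

```python
# (source, destination, weight) triples, copied from the edge list above
edges = [(0, 0, 0.5), (1, 1, 0.5), (3, 3, 1), (0, 1, 0.5),
         (1, 2, 0.5), (2, 3, 0.5), (2, 1, 0.5)]

def edges_to_matrix(edges, size):
    """Build a size x size transition matrix from weighted directed edges."""
    matrix = [[0.0] * size for _ in range(size)]
    for source, dest, weight in edges:
        matrix[source][dest] = float(weight)
    return matrix

A = edges_to_matrix(edges, 4)
for row in A:
    # Each row of a transition matrix should sum to 1.
    print(row, sum(row))
```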
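Or perhaps there are some ready-made tools for generating Markov chains, as there are for some other stochastic processes; more here. I could not find an algorithm for analysing a graph with exceptional values, or for running trials with different sets as in your example. Perhaps there is none, and you will have to stick with the other repliers' solutions.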