So I have a list that looks like this:
my_list = [0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,0,1,1,0,1 ... 0,1,0]
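It basically contains thousands of 0s and 1s. I'm looking for a way to find similar (repeated) combinations of its elements, specifically runs of 10 consecutive elements. So if, for example, the combination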
... 0,1,1,1,0,0,1,1,0,1 ...
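occurs more than once, I'd like to know where it is in the list (its indices) and how many times it is repeated.
I need to check every possible combination here, i.e. all 1024 possibilities...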
Answer 0 (score: 2)
Here is a solution using regular expressions:
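import random
import re
from itertools import product
# 1000 random bits as a string of '0'/'1' characters
testlist = [str(random.randint(0, 1)) for i in range(1000)]
testlist_str = "".join(testlist)
for i in ["".join(seq) for seq in product("01", repeat=10)]:
    # the zero-width lookahead (?=...) also counts overlapping occurrences,
    # which a plain re.findall(i, ...) would skip
    count = len(re.findall(f"(?={i})", testlist_str))
    print(f'pattern {i} has {count} matches')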
Output (one sample run; the input is random, so the counts change from run to run):
pattern 0000000000 has 0 matches
pattern 0000000001 has 0 matches
pattern 0000000010 has 1 matches
pattern 0000000011 has 2 matches
pattern 0000000100 has 2 matches
pattern 0000000101 has 2 matches
....
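The question also asks where each repeated pattern occurs, not just how often. Below is a minimal sketch that keeps the same string encoding as above. It uses re.finditer with the same zero-width lookahead, so overlapping occurrences are reported together with their start indices. The function name find_repeats and the fixed sample string are illustrative and not part of the original answer. Like the original, it still scans the string once per pattern (1024 scans).

```python
import re
from itertools import product


def find_repeats(bits, width=10):
    """Map every `width`-bit pattern that occurs more than once to its start indices."""
    repeats = {}
    for pattern in ("".join(p) for p in product("01", repeat=width)):
        # (?=...) matches without consuming characters, so overlapping hits are found
        starts = [m.start() for m in re.finditer(f"(?={pattern})", bits)]
        if len(starts) > 1:
            repeats[pattern] = starts
    return repeats


sample = "01110011010111001101"  # the pattern 0111001101 written twice
for pattern, starts in find_repeats(sample).items():
    print(f"pattern {pattern} appears {len(starts)} times at indices {starts}")
```

For this sample it prints `pattern 0111001101 appears 2 times at indices [0, 10]`.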
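Answer 1 (score: 2)
This looks like a homework question, so I don't want to give away the solution right away, just some hints.
Don't take the list literally. It consists of 0s and 1s, so you can treat them as binary digits.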
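Here is a sketch that makes this hint concrete (window_value is a hypothetical helper). Reading a window of ten 0/1 values as a binary number gives an integer from 0 to 1023. The 1024 possible patterns therefore map one-to-one onto those integers, which can index a plain list.

```python
def window_value(bits):
    """Read a sequence of 0/1 values as a binary number, most significant bit first."""
    value = 0
    for b in bits:
        value = 2 * value + b  # shift left one place, then add the new bit
    return value


# the window from the question: 0,1,1,1,0,0,1,1,0,1
print(window_value([0, 1, 1, 1, 0, 0, 1, 1, 0, 1]))  # 461
print(window_value([1] * 10))  # 1023, the largest 10-bit value
```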
Some hints:
Think about how you would do it then.
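More hints, more technical:
You can get the value of the pattern made of digits 1 to 10 from the value of the pattern made of digits 0 to 9. Drop the leftmost digit (%512), "shift" the rest to the left (*2), and add the next digit. I'll edit this answer later to give an example solution, but I need to take a short break first.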
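Written out as a formula, the update these hints describe looks like this. The symbols b_i and v_i are introduced here for illustration and do not come from the answer. Let b_0, b_1, ... be the list elements and v_i the value of the 10-digit window starting at index i:

$$
v_i = \sum_{k=0}^{9} b_{i+k}\, 2^{9-k}, \qquad
v_{i+1} = 2\,\bigl(v_i \bmod 2^{9}\bigr) + b_{i+10}.
$$

Taking the value mod 2^9 = 512 removes the term b_i 2^9, multiplying by 2 moves every remaining digit up one place, and b_{i+10} fills the lowest place. Each new window therefore costs a constant amount of work, and a single pass over the list yields the values of all windows.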
Edit:
With a customizable base and pattern length, defaulting to the values for your case:
def find_patterns(my_list, base=2, pattern_size=10):
    # value % modulo_value drops the oldest (leftmost) digit of a window
    modulo_value = base ** (pattern_size - 1)
    # results[v] collects the start indices of the pattern whose value is v
    results = [[] for _ in range(base ** pattern_size)]
    current_value = 0
    for index, elem in enumerate(my_list):
        if index < pattern_size:
            # still building the first window
            current_value = base * current_value + elem
            if index == pattern_size - 1:
                results[current_value].append(0)
        else:
            # drop the oldest digit, shift left, add the new digit
            current_value = base * (current_value % modulo_value) + elem
            results[current_value].append(index + 1 - pattern_size)  # index of the first element in the pattern
    return results
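find_patterns returns a list indexed by pattern value, so results[v] holds the start indices of the pattern whose digits encode v. To show which pattern an entry belongs to, that value has to be turned back into pattern_size digits. Here is a sketch of that inverse step; to_digits is a hypothetical helper, not part of the answer.

```python
def to_digits(value, base=2, pattern_size=10):
    """Turn an index of `results` back into its pattern_size digits, most significant first."""
    digits = []
    for _ in range(pattern_size):
        value, d = divmod(value, base)  # peel off the lowest digit
        digits.append(d)
    return digits[::-1]  # reverse so the order matches the original list


print(to_digits(461))  # [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
print(to_digits(5, base=3, pattern_size=3))  # [0, 1, 2]
```

Keep the entries of results that have more than one index. Each of these gives a repeated pattern (via to_digits), its count (len(indices)) and its positions, which is what the question asks for.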
Answer 2 (score: 1)
IIUC, you can do it like this:
my_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
w = 10
occurrences = {}
for i in range(len(my_list) - w + 1):
    key = tuple(my_list[i:i+w])
    occurrences.setdefault(key, []).append(i)
for pattern, indices in occurrences.items():
    print(pattern, indices)
Output:
(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) [0]
(1, 1, 1, 0, 0, 1, 0, 1, 0, 1) [1]
(1, 1, 0, 0, 1, 0, 1, 0, 1, 1) [2]
(1, 0, 0, 1, 0, 1, 0, 1, 1, 0) [3]
(0, 0, 1, 0, 1, 0, 1, 1, 0, 0) [4]
(0, 1, 0, 1, 0, 1, 1, 0, 0, 0) [5]
(1, 0, 1, 0, 1, 1, 0, 0, 0, 1) [6]
(0, 1, 0, 1, 1, 0, 0, 0, 1, 0) [7]
(1, 0, 1, 1, 0, 0, 0, 1, 0, 1) [8]
(0, 1, 1, 0, 0, 0, 1, 0, 1, 1) [9]
(1, 1, 0, 0, 0, 1, 0, 1, 1, 0) [10]
(1, 0, 0, 0, 1, 0, 1, 1, 0, 1) [11]
(0, 0, 0, 1, 0, 1, 1, 0, 1, 0) [12]
(0, 0, 1, 0, 1, 1, 0, 1, 0, 1) [13]
(0, 1, 0, 1, 1, 0, 1, 0, 1, 0) [14]
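The code above prints every window, including those that occur only once, while the question asks only for repeated patterns and their counts. The variation below keeps the same tuple keys but filters to patterns seen at least twice; repeated_windows is a hypothetical name. The 24-element sample has no repeated 10-element window (every index list in the output above has a single entry), so the example uses w=6.

```python
from collections import defaultdict


def repeated_windows(values, w=10):
    """Map each length-w window that occurs more than once to its start indices."""
    occurrences = defaultdict(list)
    for i in range(len(values) - w + 1):
        occurrences[tuple(values[i:i + w])].append(i)
    # keep only patterns seen at least twice
    return {p: idx for p, idx in occurrences.items() if len(idx) > 1}


my_list = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
for pattern, indices in repeated_windows(my_list, w=6).items():
    print(pattern, "appears", len(indices), "times at", indices)
```

For this sample it prints `(0, 1, 0, 1, 1, 0) appears 2 times at [7, 14]`. With the default w=10 the returned dict is empty.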
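Answer 3 (score: 1)
Treat the elements as bits that can be converted to an integer. The solution below converts each window of the input list to an integer, then finds how many times each integer occurs and at which indices.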
import collections
x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
as_int = []
# given the input above, there is no pattern longer than 6 that occurs more than once...
pattern_length = 6
# convert input to a list of integers, one per window
# can this be done in a nicer way, like skipping the string conversion?
for s in range(len(x) - pattern_length + 1):
    bitstring = ''.join(str(b) for b in x[s:s + pattern_length])
    as_int.append(int(bitstring, 2))
# create a dict with integer as key and occurrence count as value
count_dict = collections.Counter(as_int)
# empty dict to store the indexes of each integer
index_dict = {}
# find the indexes of each integer that occurs more than once
for key, count in count_dict.items():
    if count > 1:
        index_dict[key] = [i for i, v in enumerate(as_int) if v == key]
# print as binary together with its indexes
for key, value in index_dict.items():
    print(f'{key:0{pattern_length}b}', 'appears', count_dict[key], 'times, on index:', value)
Output:
101011 appears 2 times, on index: [6, 18]
010110 appears 2 times, on index: [7, 14]
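The comment in the code asks whether the string conversion can be skipped. One way, not part of the original answer, is to keep a running integer: shift it left by one bit, OR in the next element, and mask off everything above pattern_length bits. Collecting start indices in the same pass also avoids rescanning as_int once per repeated key. repeated_bit_patterns is a hypothetical name.

```python
from collections import defaultdict


def repeated_bit_patterns(bits, pattern_length):
    """Return {pattern value: [start indices]} for windows that occur more than once."""
    mask = (1 << pattern_length) - 1  # keeps only the lowest pattern_length bits
    positions = defaultdict(list)
    value = 0
    for i, b in enumerate(bits):
        value = ((value << 1) | b) & mask  # shift in the new bit, drop the oldest one
        if i >= pattern_length - 1:  # a full window ends at index i
            positions[value].append(i - pattern_length + 1)  # start of this window
    return {v: idx for v, idx in positions.items() if len(idx) > 1}


x = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
pattern_length = 6
for value, indexes in repeated_bit_patterns(x, pattern_length).items():
    print(f"{value:0{pattern_length}b} appears {len(indexes)} times, on index: {indexes}")
```

With this x and pattern_length = 6 it prints the same two lines as the output above.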