I have the following array:
a=[['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
I want to build a transition probability matrix from it, so that I get:
[[P_AA,P_AB,P_AC,P_AD],
[P_BA,P_BB,P_BC,P_BD],
[P_CA,P_CB,P_CC,P_CD],
[P_DA,P_DB,P_DC,P_DD]]
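(the above is just for illustration), where P_AA counts how many ["A","A"] pairs there are in array a, and so on, divided by P_AA+P_AB+P_AC+P_AD. I have started by using a Counter: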
from collections import Counter
# Lists are unhashable, so convert each [from, to] pair to a tuple before counting
Counter(tuple(x) for x in a)
which correctly counts the array elements as:
Counter({('A', 'B'): 2,
('B', 'B'): 1,
('B', 'C'): 1,
('C', 'B'): 1,
('B', 'A'): 2,
('A', 'D'): 2,
('D', 'D'): 1,
('D', 'A'): 1})
So the matrix should be:
[[0,   2/4, 0,   2/4],
 [2/4, 1/4, 1/4, 0  ],
 [0,   1,   0,   0  ],
 [1/2, 0,   0,   1/2]]
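Written out, the normalization the question seems to be after is the following, where n_{xy} (a symbol introduced here) is the number of times the pair [x, y] occurs in a, i.e. the value the Counter reports for ('x', 'y'). Each count is divided by the total number of pairs that start with the same state x:

$$P_{xy} = \frac{n_{xy}}{n_{xA} + n_{xB} + n_{xC} + n_{xD}}, \qquad P_{AB} = \frac{2}{0 + 2 + 0 + 2} = \frac{1}{2}$$

For the first row, n_{AB} = n_{AD} = 2 and the other two counts are 0, so every entry in that row is divided by 4.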
Answer 0 (score: 2)
A pandas-based solution:
import pandas as pd
from collections import Counter
# Create a raw transition matrix
matrix = pd.Series(Counter(map(tuple, a))).unstack().fillna(0)
# Normalize the rows
matrix.divide(matrix.sum(axis=1),axis=0)
# A B C D
#A 0.0 0.50 0.00 0.5
#B 0.5 0.25 0.25 0.0
#C 0.0 1.00 0.00 0.0
#D 0.5 0.00 0.00 0.5
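A possibly shorter pandas variant swaps in pd.crosstab (not used in the answer above), assuming a pandas version whose crosstab supports normalize='index'. It counts every (from, to) combination and divides each row by its row total in one call; the column names from and to are arbitrary choices for this sketch.

```python
import pandas as pd

a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]

# Put the (from, to) pairs into a two-column DataFrame
df = pd.DataFrame(a, columns=['from', 'to'])

# crosstab counts each (from, to) combination; normalize='index'
# divides every row by its row total, giving transition probabilities
matrix = pd.crosstab(df['from'], df['to'], normalize='index')
print(matrix)
```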
Answer 1 (score: 1)
If the number of distinct elements is small, simply looping over all of them should be fine:
import numpy as np
a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]
a = np.asarray(a)
elems = np.unique(a)
dim = len(elems)
P = np.zeros((dim, dim))
for j, x_in in enumerate(elems):
    for k, x_out in enumerate(elems):
        # Count the rows of a that are equal to [x_in, x_out]
        P[j,k] = (a == [x_in, x_out]).all(axis=1).sum()
    # Normalize the row, skipping states with no outgoing transitions
    if P[j,:].sum() > 0:
        P[j,:] /= P[j,:].sum()
Output:
array([[0. , 0.5 , 0. , 0.5 ],
[0.5 , 0.25, 0.25, 0. ],
[0. , 1. , 0. , 0. ],
[0.5 , 0. , 0. , 0.5 ]])
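But you could also use the Counter together with a preallocated transition matrix: map the elements to indices, assign the counts as values, and then normalize (as in the last two steps of my code).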
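A minimal sketch of that Counter-based variant, assuming NumPy is available. The names states and index are hypothetical, chosen for this example, and rows and columns are ordered alphabetically.

```python
from collections import Counter
import numpy as np

a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]

counts = Counter(map(tuple, a))

# Map each state to a row/column index (sorted for a stable order)
states = sorted({s for pair in a for s in pair})
index = {s: i for i, s in enumerate(states)}

# Preallocate the matrix and fill in the raw counts from the Counter
P = np.zeros((len(states), len(states)))
for (src, dst), n in counts.items():
    P[index[src], index[dst]] = n

# Normalize each row; rows with no outgoing transitions stay all zero
row_sums = P.sum(axis=1, keepdims=True)
P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
print(P)
```

Filling the matrix from counts.items() touches only the pairs that actually occur, instead of testing every (x_in, x_out) combination against the whole array.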
Answer 2 (score: 1)
from collections import Counter
a = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
counts = Counter(map(tuple, a))
letters = 'ABCD'
p = []
for letter in letters:
    d = sum(v for k, v in counts.items() if k[0] == letter)
    p.append([counts.get((letter, x), 0) / d for x in letters])
print(p)
Output:
[[0.0, 0.5, 0.0, 0.5],
[0.5, 0.25, 0.25, 0.0],
[0.0, 1.0, 0.0, 0.0],
[0.5, 0.0, 0.0, 0.5]]
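The code above hard-codes letters = 'ABCD' and would raise ZeroDivisionError for a letter that never appears as the first element of a pair. A sketch that derives the states from the data and leaves such rows as zeros is below; transition_matrix is a hypothetical helper name, and the small input is made up for illustration (C only ever appears as a destination).

```python
from collections import Counter

def transition_matrix(pairs):
    """Hypothetical helper: row-normalized transition probabilities as nested lists."""
    counts = Counter(map(tuple, pairs))
    # Collect every state seen in either position, in sorted order
    states = sorted({s for pair in pairs for s in pair})
    matrix = []
    for src in states:
        row = [counts.get((src, dst), 0) for dst in states]
        total = sum(row)
        # A state with no outgoing transitions gets a row of zeros instead of dividing by 0
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix

states, p = transition_matrix([['A', 'B'], ['B', 'A'], ['A', 'A'], ['B', 'C']])
print(states, p)
```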
Answer 3 (score: 0)
This is a problem that is well suited to itertools and Counter. Take a look at the following [1]:
l = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
from collections import Counter
from itertools import product, groupby
unique_elements = set(x for y in l for x in y) # -> {'B', 'C', 'A', 'D'}
appearances = Counter(tuple(x) for x in l)
# generating all possible combinations to get the probabilities
all_combinations = sorted(list(product(unique_elements, unique_elements)))
# calculating and arranging the probabilities
table = []
for i, g in groupby(all_combinations, key=lambda x: x[0]):
    g = list(g)
    local_sum = sum(appearances.get(y, 0) for y in g)
    table.append([appearances.get(x, 0) / local_sum for x in g])
# [[0.0, 0.5, 0.0, 0.5], [0.5, 0.25, 0.25, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.5]]
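[1] I assume you phrased part of the question incorrectly: "...where P_AA counts how many ["A","A"] are in array a, and so on, divided by P_AA + P_AB + P_AC + P_AD...". You meant to divide by something else, right?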