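How do I convert column B into a transition matrix in Python?
The matrix should be 19 × 19, where 19 is the number of unique values in column B. The dataset has 432 rows in total.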
time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
...
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088
The matrix should contain the number of transitions between the values.
B      ...  1088  1288  ...
...
1088   ...     8     2  ...    (number of transitions between them)
...
Answer 0 (score: 1)
I used your data to create a DataFrame with only column B, but the same approach should work for any column.
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row]  # B is the last whitespace-separated field
df = pd.DataFrame({'B': B})
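As an alternative to parsing the rows by hand, here is a minimal sketch that lets pandas read the text. It assumes the fields are separated by whitespace and that the date and time arrive as two separate fields; the names `sample`, `df_sample` and `b_values` are made up for this illustration and only the first four rows of the data are used.

```python
import io
import pandas as pd

sample = """time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
"""

# The header has three names but every row has four whitespace-separated
# fields (date and time are split), so skip it and name the columns by hand.
df_sample = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    skiprows=1,
    names=["date", "time", "A", "B"],
)

b_values = df_sample["B"].tolist()
print(b_values)
```

With this layout, column B no longer depends on fixed character positions in each row.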
I get the unique values in the column; they are used later to build the matrix.
numbers = sorted(df['B'].unique())
print(numbers)
[225, 275, 750, 816, 834, 998, 1088, 1285, 1288]
I create a shifted column C, so each row holds a value together with the one that follows it:
df['C'] = df['B'].shift(-1)
print(df)
B C
0 816 816.0
1 816 998.0
2 998 750.0
3 750 998.0
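To see what the shifted column represents, the same pairing can be sketched in plain Python. The short list `values` below is a hypothetical excerpt of column B, not the full data.

```python
values = [816, 816, 998, 750, 998]

# Pair every element with its successor. The last element has no successor,
# which is why the shifted pandas column ends in NaN.
pairs = list(zip(values, values[1:]))
print(pairs)  # [(816, 816), (816, 998), (998, 750), (750, 998)]
```

Each tuple is one observed transition, written as (current value, next value).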
I group by ['B', 'C'] so I can count the pairs:
groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)
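The output below still contains same-value pairs such as (816, 816), so it corresponds to the commented-out variant. Its counts are also twice what the sample above yields (225 → 1288 occurs only once there), so read it as illustrative:
{(225, 1288.0): 2, (275, 225.0): 2, (750, 998.0): 2, (816, 275.0): 2, (816, 816.0): 2, (816, 998.0): 2, (834, 1285.0): 2, (998, 750.0): 2, (998, 816.0): 2, (998, 834.0): 2, (998, 998.0): 12, (1088, 1088.0): 14, (1088, 1285.0): 2, (1285, 998.0): 2, (1285, 1088.0): 2, (1285, 1285.0): 6, (1285, 1288.0): 2, (1288, 1088.0): 2, (1288, 1285.0): 2}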
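The pair counting can also be sketched without groupby, using `collections.Counter` from the standard library. This is an alternative technique, not the code used in this answer, and `values` is again a hypothetical excerpt of column B.

```python
from collections import Counter

values = [816, 816, 998, 750, 998, 998, 998]

# Count every (current, next) pair in one pass.
pair_counts = Counter(zip(values, values[1:]))

# Optionally drop pairs where the value does not change, e.g. (816, 816).
no_self = {pair: n for pair, n in pair_counts.items() if pair[0] != pair[1]}

print(pair_counts[(998, 998)])  # 2
print(no_self)
```

Because the keys are plain integer tuples, there is no int/float mix as with the shifted pandas column.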
Now I can create the matrix. Using numbers and counts, I build each column as a Series (with the correct index) and add it to the matrix.
matrix = pd.DataFrame()
for x in numbers:
    matrix[x] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)
print(matrix)
Result (each column is a value of B, each row is the value that follows it):
225 275 750 816 834 998 1088 1285 1288
225 0 2 0 0 0 0 0 0 0
275 0 0 0 2 0 0 0 0 0
750 0 0 0 0 0 2 0 0 0
816 0 0 0 2 0 2 0 0 0
834 0 0 0 0 0 2 0 0 0
998 0 0 2 2 0 12 0 2 0
1088 0 0 0 0 0 0 14 2 2
1285 0 0 0 0 2 0 2 6 2
1288 2 0 0 0 0 0 0 2 0
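A more compact way to get a count matrix is `pd.crosstab`. The sketch below is an alternative to the loop above, not what this answer uses; it runs on a hypothetical excerpt of column B and puts the source value in the rows and the following value in the columns, which is the transpose of the layout printed above.

```python
import pandas as pd

values = [816, 816, 998, 750, 998, 998, 998]
s = pd.Series(values, name="B")

# Align each value with its successor by dropping the last / first element.
current = s.iloc[:-1].reset_index(drop=True).rename("from")
following = s.iloc[1:].reset_index(drop=True).rename("to")

table = pd.crosstab(current, following)

# Make sure every state appears on both axes, even if it never
# occurs as a source or as a destination.
states = sorted(s.unique())
table = table.reindex(index=states, columns=states, fill_value=0)
print(table)
```

`table.loc[a, b]` is then the number of times `a` is immediately followed by `b`.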
Full example
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row]  # B is the last whitespace-separated field
df = pd.DataFrame({'B': B})
numbers = sorted(df['B'].unique())
print(numbers)
df['C'] = df['B'].shift(-1)
print(df)
groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)
print(matrix)
EDIT:
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
written as a normal for loop:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        counts[pair] = len(group)
    else:
        counts[pair] = 0
Negating the count when it is greater than 10:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        count = len(group)
        if count > 10:
            counts[pair] = -count
        else:
            counts[pair] = count
    else:
        counts[pair] = 0
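The same sign flip can also be applied after the matrix has been built, using `DataFrame.mask`. This is only a sketch with made-up counts; `matrix_demo` and `flipped` are hypothetical names and not part of the code above.

```python
import pandas as pd

# Made-up 2 x 2 count matrix for illustration.
matrix_demo = pd.DataFrame(
    [[0, 12],
     [3, 0]],
    index=[998, 1088],
    columns=[998, 1088],
)

# Wherever a count exceeds 10, replace it with its negative;
# all other cells are left unchanged.
flipped = matrix_demo.mask(matrix_demo > 10, -matrix_demo)
print(flipped)
```

This keeps the counting step and the thresholding step separate, so the threshold can be changed without recounting.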
EDIT: counting both directions together, so that counts[(A,B)] == counts[(B,A)], and then negating totals greater than 10:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        #counts[(A,B)] = len((A,B)) + len((B,A))
        if pair not in counts:
            counts[pair] = len(group) # put first value
        else:
            counts[pair] += len(group) # add second value
        #counts[(B,A)] = len((A,B)) + len((B,A))
        if (pair[1],pair[0]) not in counts:
            counts[(pair[1],pair[0])] = len(group) # put first value
        else:
            counts[(pair[1],pair[0])] += len(group) # add second value
    else:
        counts[pair] = 0 # (816,816) gives 0
#counts[(A,B)] == counts[(B,A)]
counts_2 = {}
for pair, count in counts.items():
    if count > 10:
        counts_2[pair] = -count
    else:
        counts_2[pair] = count
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts_2.get((x,y), 0) for y in numbers], index=numbers)
print(matrix)
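If a directed count matrix is already available, the symmetric totals can also be sketched as the matrix plus its transpose, with the diagonal zeroed. The numbers in `directed` are made up for illustration, and this is an alternative to the dictionary loop above rather than a replacement taken from the answer.

```python
import numpy as np
import pandas as pd

states = [750, 816, 998]

# Hypothetical directed counts: rows are the source, columns the destination.
directed = pd.DataFrame(
    [[0, 0, 1],
     [0, 1, 1],
     [1, 0, 2]],
    index=states,
    columns=states,
)

arr = directed.to_numpy()
sym = arr + arr.T            # count(A -> B) + count(B -> A)
np.fill_diagonal(sym, 0)     # ignore transitions from a value to itself

symmetric = pd.DataFrame(sym, index=states, columns=states)
print(symmetric)
```

The result satisfies `symmetric.loc[a, b] == symmetric.loc[b, a]` for every pair of states.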
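Answer 1 (score: 0)
Another pandas-based approach. Note that I used shift(1), so C holds the previous row's value and each (B, C) pair is a transition from C to B: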
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row]  # B is the last whitespace-separated field
df = pd.DataFrame({'B': B})
# alternative approach
df['C'] = df['B'].shift(1) # shift down by one row, so C is the value that preceded B
df['counts'] = 1 # add an arbitrary counts column for the group by
# group together the combinations then unstack to get matrix
trans_matrix = df.groupby(['B', 'C']).count().unstack()
# make the columns a bit neater
trans_matrix.columns = trans_matrix.columns.droplevel()
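The result is:
I believe this is correct: for example, 225 is observed once and is followed by 1288. To turn the counts into a transition probability matrix, divide each count by the total number of transitions leaving its source value.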
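A minimal sketch of that normalization, assuming a count matrix whose rows are the source value and whose columns are the destination value (if your matrix is laid out the other way round, transpose it first). The counts in `counts_demo` are made up for illustration.

```python
import pandas as pd

states = [750, 816, 998]

# Hypothetical counts: rows are the source, columns the destination.
counts_demo = pd.DataFrame(
    [[0, 0, 1],
     [0, 1, 1],
     [1, 0, 2]],
    index=states,
    columns=states,
)

# Divide every row by its own total so that each row sums to 1.
row_totals = counts_demo.sum(axis=1)
probs = counts_demo.div(row_totals, axis=0)
print(probs)
```

A source value with no outgoing transitions has a row total of 0 and would give NaN in its row, so such rows need separate handling.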