How to print a numpy matrix nicely with text headers - python

Time: 2017-01-19 09:30:06

Tags: python-2.7 numpy

I have a question about Python:

How can I print a matrix nicely with headers, like this:

      T  C  G  C  A
  [0 -2 -4 -6 -8 -10]
T [-2  1 -1 -3 -5 -7]
C [-4 -1  2  0 -2 -4]
C [-6 -3  0  1  1 -1]
A [-8 -5 -2 -1  0  2]

I tried printing it with numpy.matrix(mat), but all I get is:

[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]

I also did not manage to add the headers.

Thanks!!!

Update

Thanks, everyone. I managed to install pandas, but I have two new problems. Here is my code:

import pandas as pd
col1 = [' ', 'T', 'C', 'G', 'C', 'A']
col2 = [' ', 'T', 'C', 'C', 'A']
df = pd.DataFrame(mat,index = col2, columns = col1)
print df

But I got this error:

    df = pd.DataFrame(mat,index = col2, columns = col1)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 163, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 224, in _init_ndarray
    return BlockManager([block], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 237, in __init__
    self._verify_integrity()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 313, in _verify_integrity
    union_items = _union_block_items(self.blocks)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 906, in _union_block_items
    raise Exception('item names overlap')
Exception: item names overlap

When I tried changing the letters, it worked:

       T   B   G   C   A  
   0   -2  -4  -6  -8  -10
T  -2  1   -1  -3  -5  -7 
C  -4  -1  2   0   -2  -4 
C  -6  -3  0   1   1   -1 
A  -8  -5  -2  -1  0   2  

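But as you can see, the layout of the matrix is not very good. How can I fix these problems?

3 Answers:

Answer 0: (score: 3)

Numpy does not provide this functionality out of the box.

(a) pandas

You can take a look at pandas. A printed pandas.DataFrame usually looks quite good.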

import numpy as np
import pandas as pd
cols = ["T", "C", "S", "W", "Q"]
a = np.random.randint(0,11,size=(5,5))
df = pd.DataFrame(a, columns=cols, index=cols)
print df

will produce

   T  C   S  W  Q
T  9  5  10  0  0
C  3  8   0  7  2
S  0  2   6  5  8
W  4  4  10  1  5
Q  3  8   7  1  4

(b) pure Python

If pandas is not available, you can use the following function, which needs only Python and numpy.

import numpy as np

def print_array(a, cols, rows):
    if (len(cols) != a.shape[1]) or (len(rows) != a.shape[0]):
        print "Shapes do not match"
        return
    s = a.__repr__()
    s = s.split("array(")[1]
    s = s.replace("      ", "")
    s = s.replace("[[", " [")
    s = s.replace("]])", "]")
    pos = [i for i, ltr in enumerate(s.splitlines()[0]) if ltr == ","]
    pos[-1] = pos[-1]-1
    empty = " " * len(s.splitlines()[0])
    s = s.replace("],", "]")
    s = s.replace(",", "")
    lines = []
    for i, l in enumerate(s.splitlines()):
        lines.append(rows[i] + l)
    s = "\n".join(lines)
    empty = list(empty)
    for i, p in enumerate(pos):
        empty[p-i] = cols[i]
    s = "".join(empty) + "\n" + s
    print s



c = [" ", "T", "C", "G", "C", "A"]
r = [" ", "T", "C", "C", "A" ]
a = np.random.randint(-4,15,size=(5,6))    
print_array(a, c, r)

gives you

       T  C  G  C  A      
  [ 2  5 -3  7  1  9]
T [-3 10  3 -4  8  3]
C [ 6 11 -2  2  5  1]
C [ 4  6 14 11 10  0]
A [11 -4 -3 -4 14 14]
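
As an alternative to parsing the repr string, here is a minimal sketch that builds each line directly and pads every cell to one shared width. The name format_labeled and its parameters are hypothetical, not part of the answer above. It assumes a non-empty matrix and label lists that already contain a blank entry for the unlabeled first row and column, like c and r above. It accepts nested lists as well as numpy arrays, and print is called as a function.

```python
def format_labeled(rows, col_labels, row_labels):
    """Return the labeled matrix as a list of text lines (hypothetical helper)."""
    rows = [list(r) for r in rows]
    if len(rows) != len(row_labels) or any(len(r) != len(col_labels) for r in rows):
        raise ValueError("labels do not match the matrix shape")
    # Convert every value to text once, then pick one width for all cells
    # and column labels so that the columns line up.
    cells = [[str(v) for v in r] for r in rows]
    width = max(len(s) for r in cells for s in r)
    width = max(width, max(len(c) for c in col_labels))
    label_width = max(len(l) for l in row_labels)
    # Header: blanks where the row label and " [" go, then right-aligned labels.
    header = " " * (label_width + 2) + " ".join(c.rjust(width) for c in col_labels)
    lines = [header]
    for label, r in zip(row_labels, cells):
        body = " ".join(s.rjust(width) for s in r)
        lines.append(label.ljust(label_width) + " [" + body + "]")
    return lines


# The matrix from the question, written as a plain nested list.
mat = [[0, -2, -4, -6, -8, -10],
       [-2, 1, -1, -3, -5, -7],
       [-4, -1, 2, 0, -2, -4],
       [-6, -3, 0, 1, 1, -1],
       [-8, -5, -2, -1, 0, 2]]
lines = format_labeled(mat, [" ", "T", "C", "G", "C", "A"], [" ", "T", "C", "C", "A"])
print("\n".join(lines))
```

Because the function returns the lines instead of printing them, the same result can be written to a file or post-processed. Repeated labels such as the two C entries are not a problem here, since the labels are only used as text.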

Answer 1: (score: 0)

Consider a sample array -

In [334]: arr = np.random.randint(0,25,(5,6))

In [335]: arr
Out[335]: 
array([[24,  8,  6, 10,  5, 11],
       [11,  5, 19,  6, 10,  5],
       [ 6,  2,  0, 12,  6, 17],
       [13, 20, 14, 10, 18,  9],
       [ 9,  4,  4, 24, 24,  8]])

We can use a pandas DataFrame, like so -

import pandas as pd

In [336]: print pd.DataFrame(arr,columns=list(' TCGCA'),index=list(' TCCA'))
        T   C   G   C   A
   24   8   6  10   5  11
T  11   5  19   6  10   5
C   6   2   0  12   6  17
C  13  20  14  10  18   9
A   9   4   4  24  24   8

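Note that a pandas DataFrame expects a header (column ID) for every column and an index label for every row. So, to leave the first row and the first column unlabeled, we used label strings whose first character is a space: ' TCGCA' and ' TCCA'.

Answer 2: (score: 0)

Here is a quick version that adds labels using plain Python and numpy.

Define a function that writes the rows. Here it just prints them, but it could be changed to write to a file, or to collect all the rows in a list and return them.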

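def pp(arr,lbl):
    # Header line: the labels, two spaces apart, shifted right to sit over the columns.
    print('  ','  '.join(lbl))
    # One line per row: its label, then numpy's own formatting of that row.
    for label, row in zip(lbl, arr):
        print('%s %s'%(label, row))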

In [65]: arr=np.arange(16).reshape(4,4)

Default display of a 2-D array

In [66]: print(arr)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

In [67]: lbl=list('ABCD')

In [68]: pp(arr,lbl)
   A  B  C  D
A [0 1 2 3]
B [4 5 6 7]
C [ 8  9 10 11]
D [12 13 14 15]

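The spacing is off because numpy formats each row separately, choosing the element width for each row on its own. But it's a start.

It looks better with a random sample: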

In [69]: arr = np.random.randint(0,25,(4,4))
In [70]: arr
Out[70]: 
array([[24, 12, 12,  6],
       [22, 16, 18,  6],
       [21, 16,  0, 23],
       [ 2,  2, 19,  6]])
In [71]: pp(arr,lbl)
   A  B  C  D
A [24 12 12  6]
B [22 16 18  6]
C [21 16  0 23]
D [ 2  2 19  6]
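
The uneven spacing seen with np.arange(16) can be avoided by letting numpy format the whole array in one call, so that every row shares the same element width, and then splitting the result into lines. This is a minimal sketch, assuming a small 2-D integer array (small enough that numpy neither wraps lines nor summarizes with "...") and single-character labels. The name pp_aligned is hypothetical, and it returns the lines rather than printing them.

```python
import numpy as np


def pp_aligned(arr, lbl):
    """Return labeled lines for a small 2-D integer array (hypothetical helper)."""
    arr = np.asarray(arr)
    # Formatting the whole array at once makes numpy use one width for all elements.
    body = np.array2string(arr).splitlines()
    # Drop the outer brackets: every line loses its first character,
    # and the last line also loses the final "]".
    body = [line[1:] for line in body]
    body[-1] = body[-1][:-1]
    # For integers, numpy pads each element to the longest printed element.
    width = max(len(str(v)) for v in arr.ravel())
    # Three blanks cover the row label, the space after it, and the "[".
    header = "   " + " ".join(l.rjust(width) for l in lbl)
    return [header] + ["%s %s" % (l, row) for l, row in zip(lbl, body)]


arr = np.arange(16).reshape(4, 4)
for line in pp_aligned(arr, list("ABCD")):
    print(line)
```

Returning a list keeps the function usable for the other cases mentioned above, such as writing the lines to a file.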