Question


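I have a dataset of sequences (long strings of one-letter codes, e.g. "ACDEF...."), and I have computed the average occurrence of each letter across several thousand sequences. I want to plot the average percentage composition of each letter, using the following code: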

import numpy as np
import matplotlib.pyplot as plt

codes = {'CYS': 'C', 'ASP': 'D', 'SER': 'S', 'GLN': 'Q', 'LYS': 'K',
       'ILE': 'I', 'PRO': 'P', 'THR': 'T', 'PHE': 'F', 'ASN': 'N',
       'GLY': 'G', 'HIS': 'H', 'LEU': 'L', 'ARG': 'R', 'TRP': 'W',
       'ALA': 'A', 'VAL':'V', 'GLU': 'E', 'TYR': 'Y', 'MET': 'M'}

res=[] 
freq=[]
for i in codes.values():
    res.append(i)
    # fraction_composition is the function that calculates percentage occurrence
    freq.append(fraction_composition(i))

res=np.array(res)
freq=np.array(freq)
freq*=100 

p1=plt.plot(res,freq,'r^--')

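codes.values() is used because each one-letter code is stored as the value of a key-value pair in the dictionary. I simply iterate over those values and call a function that computes the average frequency of each letter.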

I get the error message:

ValueError: could not convert string to float: C

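after calling the plot function. Since matplotlib apparently cannot plot an array of characters, is there a way around this? The x-axis should be my list of one-letter codes (the values in the codes dict), and the y-axis should be their average percentages.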

If I try

plt.plot(freq)

I get the plot I need (but the x-axis is obviously labelled with numbers). I want the x-axis to show the characters C, D, S, ...

Answer 1

In order to run your code, I defined fraction_composition as follows:

def fraction_composition(i):
    return np.random.rand()

When I run the code with this addition, I get the following error:

In [5]: run test.py
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/daniel/Downloads/test.py in <module>()
     23 freq*=100
     24 
---> 25 p1=plt.plot(res,freq,'r^--')

/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in plot(*args, **kwargs)
   3097         ax.hold(hold)
   3098     try:
-> 3099         ret = ax.plot(*args, **kwargs)
   3100         draw_if_interactive()
   3101     finally:

/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_axes.pyc in plot(self, *args, **kwargs)
   1372 
   1373         for line in self._get_lines(*args, **kwargs):
-> 1374             self.add_line(line)
   1375             lines.append(line)
   1376 

/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in add_line(self, line)
   1502             line.set_clip_path(self.patch)
   1503 
-> 1504         self._update_line_limits(line)
   1505         if not line.get_label():
   1506             line.set_label('_line%d' % len(self.lines))

/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in _update_line_limits(self, line)
   1513         Figures out the data limit of the given line, updating self.dataLim.
   1514         """
-> 1515         path = line.get_path()
   1516         if path.vertices.size == 0:
   1517             return

/usr/local/lib/python2.7/dist-packages/matplotlib/lines.pyc in get_path(self)
    872         """
    873         if self._invalidy or self._invalidx:
--> 874             self.recache()
    875         return self._path
    876 

/usr/local/lib/python2.7/dist-packages/matplotlib/lines.pyc in recache(self, always)
    573                 x = ma.asarray(xconv, np.float_)
    574             else:
--> 575                 x = np.asarray(xconv, np.float_)
    576             x = x.ravel()
    577         else:

/usr/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    458 
    459     """
--> 460     return array(a, dtype, copy=False, order=order)
    461 
    462 def asanyarray(a, dtype=None, order=None):

ValueError: could not convert string to float: C

As you can see, the line that calls plot causes the error.

So what is being plotted?

In [6]: res
Out[6]: 
array(['C', 'D', 'S', 'Q', 'K', 'P', 'T', 'F', 'A', 'H', 'G', 'I', 'E',
       'L', 'R', 'W', 'V', 'N', 'Y', 'M'], 
      dtype='|S1')

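Aha, res is an array of strings. As the traceback shows, plot in this Matplotlib version tries to convert the x data to floats, which fails for letters, so string data is not supported.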
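The failing step in the traceback is the conversion of the x data to floats with np.asarray. Here is a minimal sketch of that behaviour, independent of Matplotlib. The helper name can_convert_to_float is made up for illustration and is not part of NumPy or Matplotlib:

```python
import numpy as np


def can_convert_to_float(values):
    """Return True if NumPy can turn every entry of `values` into a float.

    Hypothetical helper, only meant to mirror the np.asarray(..., float)
    call that appears at the bottom of the traceback.
    """
    try:
        np.asarray(values, dtype=float)
    except ValueError:
        # Raised for entries such as 'C' that do not parse as numbers.
        return False
    return True


letters = np.array(['C', 'D', 'S'])
print(can_convert_to_float(letters))         # letters are not numbers
print(can_convert_to_float(['1.5', '2']))    # numeric strings do parse

# Integer positions are what plot can actually place on the x-axis.
positions = np.arange(len(letters))
print(positions)
```

The letters themselves carry no numeric position, so any fix has to supply one (for example the index of each letter) and show the letters only as labels.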

So what can you do?

One option is to convert the letters in res to integers and assign the letters as the x-tick labels:

import numpy as np
import matplotlib.pyplot as plt

def fraction_composition(i):
    return np.random.rand()

codes = {'CYS': 'C', 'ASP': 'D', 'SER': 'S', 'GLN': 'Q', 'LYS': 'K',
       'ILE': 'I', 'PRO': 'P', 'THR': 'T', 'PHE': 'F', 'ASN': 'N',
       'GLY': 'G', 'HIS': 'H', 'LEU': 'L', 'ARG': 'R', 'TRP': 'W',
       'ALA': 'A', 'VAL':'V', 'GLU': 'E', 'TYR': 'Y', 'MET': 'M'}

def letter_to_number(i):
    poss_letters = sorted(codes.values())
    return poss_letters.index(i)

res=[]
res_labels = []
freq=[]
for i in codes.values():
    res_labels.append(i)
    res.append(letter_to_number(i))
    # fraction_composition is the function that calculates percentage occurrence
    freq.append(fraction_composition(i))

res=np.array(res)
freq=np.array(freq)
sort_i = [i[0] for i in sorted(enumerate(res_labels), key=lambda x:x[1])]
res_labels = sorted(res_labels)
res = res[sort_i]
freq = freq[sort_i]

freq*=100 

p1=plt.plot(res,freq,'r^--')
plt.xticks(range(len(res_labels)), res_labels)
plt.show()
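The same idea can be written more compactly. This is only a sketch under a few assumptions: the function name plot_composition and the sample numbers are invented for illustration, and the percentages would come from your own fraction_composition in practice. It sorts the letters once with np.argsort, plots against integer positions, and then relabels the ticks:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_composition(letters, percentages):
    """Plot percentages against letters, sorted alphabetically.

    Hypothetical helper: `letters` is a sequence of one-letter codes and
    `percentages` the matching values (already multiplied by 100).
    """
    order = np.argsort(letters)                  # alphabetical order
    labels = np.asarray(letters)[order]
    values = np.asarray(percentages, dtype=float)[order]

    positions = np.arange(len(labels))           # numeric x positions
    fig, ax = plt.subplots()
    ax.plot(positions, values, 'r^--')
    ax.set_xticks(positions)                     # one tick per letter
    ax.set_xticklabels(labels)                   # show letters, not numbers
    return ax


# Made-up example values, not real composition data.
ax = plot_composition(['C', 'A', 'D'], [10.0, 20.0, 30.0])
plt.show()
```

As far as I know, Matplotlib 2.1 and later also accept strings directly as categorical x values, so plt.plot(res, freq, 'r^--') may work unchanged there. The explicit tick approach above does not depend on that.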

Plotting strings vs. floats (occurrence of letter codes vs. percentage composition)
