MemoryError when plotting large oscilloscope files on Ubuntu

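Time: 2019-04-22 06:59:16

Tags: python-3.x pandas matplotlib

I am trying to read large oscilloscope .trc files and plot them. Plotting a single file works, but as soon as I put the script in a loop and try to plot all files (one file per iteration), I get a MemoryError.

Code: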

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import readTrc #external file, same location as script

foldername = 'trc_folder'
folder = os.listdir(foldername)
path = os.path.dirname(os.path.realpath(__file__))

for filenumber, i in enumerate(folder):
    trc = path + '/' + foldername + '/' + i

    print('reading trc file ' + str(filenumber))

    datX, datY, m = readTrc.readTrc(trc)
    srx, sry = pd.Series(datX * 1000), pd.Series(datY * 1000)
    df_oszi = pd.concat([srx, sry], axis = 1)
    df_oszi.set_index(0, inplace = True)    

    #ERROR APPEARS with xticks argument
    #removing xticks does not help, because then errorpath changes to
    #/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py
    df_oszi.plot(grid = 1,
                 color = 'blue',
                 linewidth = 0.5,
                 figsize = (9,5),
                 legend = False,
                 xticks = np.arange(df_oszi.index[0], df_oszi.index[-1], 1))

    print('plotting file ' + str(filenumber))
    plt.savefig('Plot_' + str(filenumber) + '.png', dpi = 300)

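The problem seems to lie in the external module readTrc. It took me quite a while to figure this out, because Python raised the error inside Matplotlib and pandas rather than inside readTrc, which appears to be an unofficial script for reading .trc files. I found it online while looking for a way to read .trc files in Python. If you know a better way to read oscilloscope files, please let me know.

I zipped everything needed to run the script into the following folder: folder

(It is very large, 582 MB, because each .trc file is about 200 MB.) In it you will find the script, a folder containing the .trc files, and the external Python file (module) readTrc, which is needed to read the .trc files. Running the script should plot the first file, but at least on my Ubuntu machine a MemoryError is raised while the second file is being plotted. What confuses me is that I only get this MemoryError on Ubuntu (18.04), not on Windows 10.

I would greatly appreciate your help so that I can continue with my project. If you need any further information, please let me know.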

Edit:

Single download of readTrc.py

Single download of Script.py

print(type(datX)) returns:

<class 'numpy.ndarray'>

Printing datX returns an object with 50 million values:

[-0.005 -0.005 -0.005 ...  0.005  0.005  0.005]

Printed individually with print(), the values are valid:

-0.004999999906663635
-0.004999999806663634
-0.004999999706663633
-0.004999999606663631
-0.00499999950666363

Edit 2:

To run the code with the new version of readTrc, make the following changes:

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import readTrc

foldername = 'trc_folder'
folder = os.listdir(foldername)
path = os.path.dirname(os.path.realpath(__file__))

for filenumber, i in enumerate(folder):
    trc = path + '/' + foldername + '/' + i

    print('reading trc file ' + str(filenumber))

    datX, datY, d = readTrc.Trc().open(trc)
    srx, sry = pd.Series(datX * 1000), pd.Series(datY * 1000)
    df_oszi = pd.concat([srx, sry], axis = 1)
    df_oszi.set_index(0, inplace = True)    

    df_oszi.plot(grid = 1,
                 color = 'blue',
                 linewidth = 0.5,
                 figsize = (9,5),
                 legend = False,
                 xticks = np.arange(df_oszi.index[0], df_oszi.index[-1], 1))

    print('plotting file ' + str(filenumber))
    plt.savefig('Plot_' + str(filenumber) + '.png', dpi = 300)

The MemoryError:

Traceback (most recent call last):
  File "/home/artur/Desktop/zip_original/Script.py", line 27, in <module>
    xticks = np.arange(df_oszi.index[0], df_oszi.index[-1], 1))
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 2941, in __call__
    sort_columns=sort_columns, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 1977, in plot_frame
    **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 1804, in _plot
    plot_obj.generate()
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 260, in generate
    self._make_plot()
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 985, in _make_plot
    **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 1001, in _plot
    lines = MPLPlot._plot(ax, x, y_values, style=style, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py", line 615, in _plot
    return ax.plot(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/__init__.py", line 1805, in inner
    return func(ax, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_axes.py", line 1604, in plot
    self.add_line(line)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_base.py", line 1891, in add_line
    self._update_line_limits(line)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_base.py", line 1913, in _update_line_limits
    path = line.get_path()
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/lines.py", line 945, in get_path
    self.recache()
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/lines.py", line 649, in recache
    self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float)
MemoryError
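The last frame of the traceback builds an (N, 2) floating-point array from the x and y data of the line. The following is only a rough back-of-the-envelope sketch of how large that single array is for the 50 million values reported for datX; the helper name `xy_cache_bytes` is made up for illustration and is not part of Matplotlib.

```python
import numpy as np


def xy_cache_bytes(n_points, dtype=np.float64):
    """Bytes needed for one (n_points, 2) array of the given dtype."""
    return n_points * 2 * np.dtype(dtype).itemsize


# Number of values reported for datX in the question.
n_points = 50_000_000

# astype(float) yields float64, i.e. 8 bytes per value.
size = xy_cache_bytes(n_points)
print(f"{size / 1e6:.0f} MB for one (N, 2) float64 array")
```

This counts one array only. The `.astype(float)` call returns a new copy by default, and the scaled Series and the DataFrame hold their own copies of the data, so the real peak is likely several times higher.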

Edit 3:

Sampling the data set reduces the number of data values. These are plots of the same data set with sampling = 1, sampling = 10, and sampling = 100, produced with:

srx, sry = pd.Series(datX[::sampling] * 1000), pd.Series(datY[::sampling] * 1000)

[Three plots of the same data set: sampling = 1, sampling = 10, sampling = 100]

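The reason is that ultra-high-frequency (UHF) waves have extremely short pulse periods. Each pulse may consist of only a few data values, so reducing the number of values considered loses a lot of data. Although this solution makes the code work, it also greatly reduces the data values.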
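One possible way around the loss described above is to keep the minimum and maximum of every block of samples instead of every n-th sample. This is a different technique from the plain `[::sampling]` stride used in the question, and the function `minmax_decimate` below is a hypothetical helper written for illustration only, with synthetic data standing in for datX and datY.

```python
import numpy as np


def minmax_decimate(x, y, bucket):
    """Keep the minimum and maximum y of every `bucket` consecutive samples."""
    n = (len(y) // bucket) * bucket           # drop the incomplete tail bucket
    yb = y[:n].reshape(-1, bucket)
    xb = x[:n].reshape(-1, bucket)
    lo = yb.argmin(axis=1)
    hi = yb.argmax(axis=1)
    # Order the two picks by position so the line is drawn left to right.
    first = np.minimum(lo, hi)
    second = np.maximum(lo, hi)
    rows = np.arange(yb.shape[0])
    x_out = np.column_stack([xb[rows, first], xb[rows, second]]).ravel()
    y_out = np.column_stack([yb[rows, first], yb[rows, second]]).ravel()
    return x_out, y_out


# Synthetic stand-in: a flat signal with one single-sample pulse.
x = np.arange(1000.0)
y = np.zeros(1000)
y[123] = 5.0

xd, yd = minmax_decimate(x, y, bucket=100)
print(len(xd), yd.max(), y[::100].max())
```

In this toy case the plain stride misses the pulse entirely, while the min/max version keeps it at two points per bucket. Whether that is enough for real UHF pulses depends on the bucket size and would need to be checked against the actual traces.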

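2 answers:

Answer 0: (score: 2)

Oh wow, I couldn't see the wood for the trees. You are trying to plot far too many data points (i.e. 100000002, which I think would be about 4 km long if printed on paper at 600 dpi). This can be solved by sampling: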

sampling=100
srx, sry = pd.Series(datX[::sampling] * 1000), pd.Series(datY[::sampling] * 1000)

Or by selectively looking at a specific range:

srx, sry = pd.Series(datX[0:50000] * 1000), pd.Series(datY[0:50000] * 1000)

Or a combination of both.
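The combination of both can be written as a single slice with a start, a stop, and a step. The sketch below assumes synthetic arrays in place of datX and datY; `window_and_stride` is a made-up helper name, not something from readTrc or pandas.

```python
import numpy as np
import pandas as pd


def window_and_stride(dat_x, dat_y, start, stop, sampling):
    """Select dat[start:stop:sampling] and scale both axes by 1000."""
    sel = slice(start, stop, sampling)
    return pd.Series(dat_y[sel] * 1000, index=dat_x[sel] * 1000)


# Synthetic stand-in for the arrays returned by readTrc.
dat_x = np.linspace(-0.005, 0.005, 1_000_000)
dat_y = np.sin(2 * np.pi * 1e3 * dat_x)

# First 50000 samples, keeping every 100th one.
s = window_and_stride(dat_x, dat_y, 0, 50_000, 100)
print(len(s))
```

Slicing before multiplying means the `* 1000` copy is made only for the selected samples rather than for the full array.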

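Answer 1: (score: 0)

It took a lot of time, but I managed to get the MemoryError under control. I had to put not only gc.collect() but also plt.close() at the end of each loop iteration. Only then did the error stop. Sorry for the confusion. I learned a lot from this.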
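pyplot keeps every figure it creates alive until it is closed, so a loop that plots one file per iteration keeps accumulating figures and their cached line data. A minimal sketch of the loop structure described in this answer is shown below. It uses small synthetic arrays instead of readTrc output, a temporary output folder, and the Agg backend, all of which are assumptions made so that the example runs on its own.

```python
import gc
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering straight to PNG files
import matplotlib.pyplot as plt
import numpy as np

out_dir = tempfile.mkdtemp()  # hypothetical output folder for this demo
saved = []

for filenumber in range(3):
    # Synthetic stand-in for the arrays returned by readTrc.
    x = np.linspace(0.0, 1.0, 10_000)
    y = np.sin(2 * np.pi * (filenumber + 1) * x)

    fig, ax = plt.subplots(figsize=(9, 5))
    ax.plot(x, y, color="blue", linewidth=0.5)
    ax.grid(True)

    target = os.path.join(out_dir, f"Plot_{filenumber}.png")
    fig.savefig(target, dpi=100)
    saved.append(target)

    plt.close(fig)   # release the figure and the line data cached in it
    del x, y         # drop references to the arrays from this iteration
    gc.collect()     # as in the answer: force a collection every iteration
```

With `df_oszi.plot(...)` as in the question, calling `plt.close()` without arguments closes the current figure, which is the one pandas just created.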