Adding multiple trend lines for multiple data sets (4) on a single scatter plot with matplotlib

Asked: 2017-02-20 08:24:41

Tags: python plot dataset data-visualization graphing

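I started with more than 12,000 lines of data across 48 files. After creating 4 modules and 16 Python classes, the program has so far reduced that data to 3 .csv files with 16 rows each (48 rows in total). I have to create 3 scatter plots from them.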

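I have plotted the 16 rows of one file as 4 data sets. I need to add a trend line for each data set to the scatter plot: 4 trend lines in total, each fitted through 4 data points. Below is a copy of one of the 3 scatter plots; on this plot I need four trend lines, one each for the first, second, third, and fourth data sets. Below is an example of the class that does the work: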

Edited and corrected:

import matplotlib.pyplot as plt
import numpy as np
# Created as part of the DineroIV Simulation reporting module
# author:GeoWade

class SplitFormatDc1():

    def openDc1Csv(self):
        lstdc1csv = []
        # open the file; the with-block closes it even if reading fails
        with open('Dc1din-Dcaches-totalsXY.csv', 'r') as filadc1csv:
            for vcs1dc in filadc1csv:
                # drop the newline and keep the last 6 characters (the third column)
                vcs1dcOne = vcs1dc.strip('\n')
                vcs1dcTwo = vcs1dcOne[-6:]
                lstdc1csv.append(vcs1dcTwo)
        return lstdc1csv

    def split_Dc1Csv(self):
        # create an object to assign the function that was abstracted
        lstdc1split = self.openDc1Csv()
        # convert every data point into a float
        four8k = [float(x) for x in lstdc1split]
        # declare 4 lists for the soon to be data sets
        fourk = []
        eightk = []
        sixtnk = []
        thtwok = []

        lineLst = [32.00, 64.00, 128.00, 256.00]
        numb = 0

        # create 4 separate data sets, one for each cache size,
        # with one point per line size
        for t in four8k:
            if (numb < 4):
                tOne = 1.00 - t
                fourk.append(tOne)
            elif (numb >= 4 and numb < 8):
                tTwo = 1.00 - t
                eightk.append(tTwo)
            elif (numb >= 8 and numb < 12):
                tThree = 1.00 - t
                sixtnk.append(tThree)
            else:
                tFour = 1.00 - t
                thtwok.append(tFour)
            numb += 1


        fig = plt.figure()
        ax1 = fig.add_subplot(111)

        ax1.scatter(lineLst,fourk, s=10, c='b', marker="s", label='4Kb Cache')
        # create the polyfit
        z1 = np.polyfit(lineLst, fourk, 1)

        # create the poly1d
        p1 = np.poly1d(z1)

        # plot the line fit
        plt.plot(lineLst,p1(lineLst),"b--")

        # plot the scatter for the 8kb cache
        ax1.scatter(lineLst, eightk, s=10, c='r', marker="o", label='8Kb Cache')
        # create the polyfit
        z2 = np.polyfit(lineLst, eightk, 1)

        # create the poly1d
        p2 = np.poly1d(z2)

        # plot the line fit
        plt.plot(lineLst,p2(lineLst),"r--")

        # plot the scatter for the 16kb cache
        ax1.scatter(lineLst, sixtnk, s=10, c='g', marker="s", label='16Kb Cache')
        # create the polyfit
        z3 = np.polyfit(lineLst, sixtnk, 1)
        # create the poly1d
        p3 = np.poly1d(z3)
        # plot the line fit
        plt.plot(lineLst,p3(lineLst),"g--")

        # plot the scatter for the 32kb cache
        ax1.scatter(lineLst, thtwok, s=10, c='y', marker="o", label='32Kb Cache')
        # create the polyfit
        z4 = np.polyfit(lineLst, thtwok, 1)
        # create the poly1d
        p4 = np.poly1d(z4)
        # plot the line fit
        plt.plot(lineLst,p4(lineLst),"y--")
        # add a legend
        plt.legend(loc='lower left')
        fig1 = plt.gcf()
        # save the image
        fig1.savefig('cc1din1.png', dpi=75)
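
The four scatter/polyfit/plot blocks in the class differ only in data, colour, marker and label, so they could be folded into one loop. The following is a minimal sketch of that idea, not the original module: the function name `plot_with_trendlines`, the `miss_rates` dictionary and the `styles` list are hypothetical names introduced for illustration, and the values are copied from the CSV shown in the question. It assumes a degree-1 `np.polyfit` is the trend line wanted and uses the non-interactive `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed, the figure is only saved
import matplotlib.pyplot as plt
import numpy as np

# Line sizes (x values) and miss rates per cache size, copied from the CSV
# in the question. The dictionary layout is an assumption for this sketch.
line_sizes = [32.0, 64.0, 128.0, 256.0]
miss_rates = {
    "4Kb Cache": [0.0740, 0.0816, 0.1078, 0.1391],
    "8Kb Cache": [0.0454, 0.0496, 0.0615, 0.0795],
    "16Kb Cache": [0.0252, 0.0249, 0.0276, 0.0369],
    "32Kb Cache": [0.0138, 0.0115, 0.0118, 0.0154],
}
# Same colour/marker pairs as the original class, in the same order.
styles = [("b", "s"), ("r", "o"), ("g", "s"), ("y", "o")]


def plot_with_trendlines(x, series, styles, outfile):
    """Scatter each series as 1 - miss rate and overlay a straight-line fit."""
    fig, ax = plt.subplots()
    fits = {}
    for (label, misses), (colour, marker) in zip(series.items(), styles):
        hits = [1.0 - m for m in misses]  # same transform as the class
        ax.scatter(x, hits, s=10, c=colour, marker=marker, label=label)
        # Degree-1 least-squares fit: returns [slope, intercept].
        slope, intercept = np.polyfit(x, hits, 1)
        ax.plot(x, np.polyval([slope, intercept], x), colour + "--")
        fits[label] = (slope, intercept)
    ax.legend(loc="lower left")
    fig.savefig(outfile, dpi=75)
    plt.close(fig)
    return fits


fits = plot_with_trendlines(line_sizes, miss_rates, styles, "cc1din1.png")
```

Returning the fitted coefficients alongside the saved image makes it possible to inspect each slope and intercept without re-reading the plot.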

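This is the data read into the first function from Dc1din-Dcaches-totalsXY.csv. The first column is obsolete: I designed the class so that the 16 rows are split into 4 lists by position during iteration, and the first column only identifies which list the value in the third column belongs to.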

4k,32,0.0740
4k,64,0.0816
4k,128,0.1078
4k,256,0.1391
8k,32,0.0454
8k,64,0.0496
8k,128,0.0615
8k,256,0.0795
16k,32,0.0252
16k,64,0.0249
16k,128,0.0276
16k,256,0.0369
32k,32,0.0138
32k,64,0.0115
32k,128,0.0118
32k,256,0.0154
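
The class above splits the rows by position, which is why the first column goes unused. A possible alternative is to group rows by that first column instead, so the split no longer depends on the row order or count. The sketch below assumes the three-column layout shown above; `group_by_cache_size` and the in-memory `sample` are hypothetical names used for illustration, and only the standard-library `csv` module is needed.

```python
import csv
import io

# A few rows in the same three-column layout as the file above
# (cache size, line size, miss rate). Held in memory for this sketch.
sample = """4k,32,0.0740
4k,64,0.0816
8k,32,0.0454
8k,64,0.0496
"""


def group_by_cache_size(handle):
    """Return {cache size: ([line sizes], [1 - miss rate])} in file order."""
    groups = {}
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        cache, line_size, miss_rate = row
        xs, ys = groups.setdefault(cache, ([], []))
        xs.append(float(line_size))
        ys.append(1.0 - float(miss_rate))  # same transform as the class
    return groups


groups = group_by_cache_size(io.StringIO(sample))
```

With the real file, the same function could be called as `group_by_cache_size(open('Dc1din-Dcaches-totalsXY.csv', newline=''))`, and each `(xs, ys)` pair passed to `np.polyfit(xs, ys, 1)`.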

0 answers:

No answers yet.