numpy: divide the current row by the previous row

Time: 2019-02-18 21:55:31

Tags: python python-3.x csv numpy dataframe

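For my experiment, I have three time series datasets with different characteristics, all in the following format, where the first column is the timestamp and the second column is the value.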

0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27

For reproducibility, I have shared the three time series datasets I am using here.

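Working on column 2, I want to read the current row and compare it with the value in the previous row. If it is larger, I keep comparing. If the current value is smaller than the previous row's value, I want to divide the current (smaller) value by the previous (larger) value. To make this concrete: in the sample records above, the value in the seventh row (28) is smaller than the value in the sixth row (40), so the result would be 28/40 = 0.7.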
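The rule described above can be sketched in plain Python on the sample values. This is only an illustration of the intended behaviour, not part of the original code; the names `values` and `quotients` are made up for the example.

```python
# Second-column values from the sample records above.
values = [10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27]

# Pair each value with its predecessor and keep a quotient only where the
# current value is strictly smaller than the previous one.
quotients = [
    (i, cur / prev)
    for i, (prev, cur) in enumerate(zip(values, values[1:]), start=1)
    if cur < prev
]

for index, q in quotients:
    # index is the 0-based position of the current row, so add 1 for the row number.
    print(f"row {index + 1}: {q:.4f}")
```

Equal neighbours (28 followed by 28) are not treated as a decrease here, which matches the "smaller than" wording of the description.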

Here is my sample code.

import numpy as np
import pandas as pd
import csv
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from statsmodels.graphics.tsaplots import plot_acf, acf


protocols = {}


types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time = []  
    col_window = [] 
    with open(fname, mode='r', encoding='utf-8-sig') as f:
        reader = csv.reader(f, delimiter=",")
        for i in reader:
            col_time.append(float(i[0]))
            col_window.append(int(i[1]))
    col_time, col_window = np.array(col_time), np.array(col_window)
    diff_time = np.diff(col_time)
    diff_window = np.diff(col_window)
    diff_time = diff_time[diff_window > 0] 
    diff_window = diff_window[diff_window > 0] # To keep only the increased values
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "diff_time": diff_time,
        "diff_window": diff_window,
    }


# Plot the quotient values
rt = np.exp(np.diff(np.log(col_window)))

for protname, fname in types.items():
    col_time, col_window = protocols[protname]["col_time"], protocols[protname]["col_window"]
    rt = np.exp(np.diff(np.log(col_window)))
    plt.plot(np.diff(col_time), rt, ".", markersize=4, label=protname, alpha=0.1)
    plt.ylim(0, 1.0001)
    plt.xlim(0, 0.003)
    plt.title(protname)
    plt.xlabel("time")
    plt.ylabel("difference")
    plt.legend()
    plt.show()

This gives me the following plots:

[Figures: one scatter plot per dataset]

However, when I do

rt = np.exp(np.diff(np.log(col_window)))

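it divides every row by the previous row, which is not what I want. As I explained in the example above, I only want to divide the current row's value in column 2 by the previous value in column 2 when the current value is smaller than the previous one. Finally, I want to plot the quotient against the timestamp difference (computed from col_time in my code above). How can I solve this?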
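As a side note on why that line produces a ratio for every row: since exp(log(b) - log(a)) equals b / a, the expression is just the quotient of each consecutive pair. A small check on the sample values, offered as a sketch (the name `ratios` is made up for the example):

```python
import numpy as np

# Second-column values from the sample records.
col_window = np.array([10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27])

# The expression from the question ...
rt = np.exp(np.diff(np.log(col_window)))
# ... and the plain ratio of each value to its predecessor.
ratios = col_window[1:] / col_window[:-1]

print(np.allclose(rt, ratios))
# Count how many of the pairs are increases, ties, and decreases.
print(int((ratios > 1).sum()), int((ratios == 1).sum()), int((ratios < 1).sum()))
```

Only the pairs with a ratio below 1 are the ones the question is after.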

1 Answer:

Answer 0: (score: 2)

Unless you specifically need the csv module, I would recommend using the numpy method loadtxt to load the file, i.e.

col_time,col_window = np.loadtxt(fname,delimiter=',').T

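This one line replaces the first 8 lines of the for loop. Note that the transpose (.T) is needed to turn the original data shape (N rows by 2 columns) into a 2-row by N-column shape that can be unpacked into col_time and col_window. Also note that loadtxt loads the data directly into numpy.array objects.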
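To see the shapes involved, here is a minimal sketch that feeds loadtxt an in-memory stand-in for one of the CSV files (the `sample` buffer is an assumption for illustration; the real code passes a file name):

```python
import io
import numpy as np

# In-memory stand-in for a CSV file: the first three sample rows.
sample = io.StringIO("0.086206438,10\n0.086425551,12\n0.089227066,20\n")

data = np.loadtxt(sample, delimiter=",")
print(data.shape)  # N rows by 2 columns

# Transposing gives 2 rows by N columns, which unpack into two arrays.
col_time, col_window = data.T
print(col_time.shape, col_window.dtype)
```

Passing `unpack=True` to loadtxt has the same effect as the explicit `.T`. Note that both columns come back as floats by default, whereas the question's code converted the second column with int().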

As for your actual question, I would use slicing and masking:

trailing_window = col_window[:-1] # "past" values at a given index
leading_window  = col_window[1:]  # "current" values at a given index
decreasing_mask = leading_window < trailing_window
quotient = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
# The mask has N-1 entries, so apply it to the N-1 timestamp differences
# (what the question plots against) rather than to col_time itself
quotient_times = np.diff(col_time)[decreasing_mask]

You can then plot quotient against quotient_times.
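Applied to the sample records from the question, the slicing-and-masking idea can be sketched as follows. One assumption to flag: the x-values here are the timestamp differences, `np.diff(col_time)`, because the question asks for the quotient against the timestamp difference and because a mask built from N-1 row pairs does not line up with the N entries of `col_time`. The name `time_gaps` is made up for the example.

```python
import numpy as np

# Sample records from the question.
col_time = np.array([0.086206438, 0.086425551, 0.089227066, 0.089262508,
                     0.089744425, 0.090036815, 0.090054172, 0.090377569,
                     0.090514071, 0.090762872, 0.090912691])
col_window = np.array([10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27])

trailing_window = col_window[:-1]  # previous value of each pair
leading_window = col_window[1:]    # current value of each pair
decreasing_mask = leading_window < trailing_window

# Quotient only where the current value dropped below the previous one.
quotient = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
# Assumption: pair each quotient with the time gap between the two rows.
time_gaps = np.diff(col_time)[decreasing_mask]

print(quotient)
print(time_gaps)
```

Two of the ten pairs are decreases (40 to 28 and 28 to 27), so both result arrays have two entries.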

An alternative is to use the numpy method where to get the indices at which the mask is True:

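trailing_window = col_window[:-1] # "past" values at a given index
leading_window  = col_window[1:]  # "current" values at a given index
decreasing_inds = np.where(leading_window < trailing_window)[0]
quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
# Index the timestamp differences (what the question plots against);
# col_time[decreasing_inds] would give the previous row's raw timestamp instead
quotient_times = np.diff(col_time)[decreasing_inds]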
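The two variants select the same elements; the index form simply makes the positions explicit. A short check on the sample values, as a sketch (the names `by_mask` and `by_inds` are made up for the example):

```python
import numpy as np

col_window = np.array([10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27])
trailing_window = col_window[:-1]
leading_window = col_window[1:]

decreasing_mask = leading_window < trailing_window
# np.where on a 1-D boolean array returns a 1-tuple of index arrays.
decreasing_inds = np.where(decreasing_mask)[0]
print(decreasing_inds)

by_mask = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
by_inds = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
print(np.array_equal(by_mask, by_inds))
```

The indices refer to positions in the N-1 pair arrays, so index k corresponds to rows k and k+1 (0-based) of the original column.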

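Keep in mind that all of the code above still happens inside the first for loop, but what was rt is now computed inside the loop as quotient. So, after computing quotient_times, do the plotting (also inside the first loop):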

# Next line opens a new figure window and then clears it
plt.figure(); plt.clf()
# Updated plotting call with the syntax from the answer
plt.plot(quotient_times,quotient,'.',ms=4,label=protname,alpha=0.1)
plt.ylim(0, 1.0001)
plt.xlim(0, 0.003)
plt.title(protname)
plt.xlabel("time")
plt.ylabel("quotient")
plt.legend()
# You may not need this `plt.show()` line 
plt.show()
# To save the figure, one option would be the following:
# plt.savefig(protname+'.png')    

Note that you may want to take the plt.show() line out of the loop.

Putting it all together for you,

import numpy as np
import matplotlib.pyplot as plt

protocols = {}

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    # Timestamp differences (as the question asks), aligned with the N-1 row pairs
    quotient_times = np.diff(col_time)[decreasing_inds]
    # Still save the values in case computation needs to happen later 
    # in the script    
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }
    # Next line opens a new figure window and then clears it
    plt.figure(); plt.clf()
    plt.plot(quotient_times,quotient, ".", markersize=4, label=protname, alpha=0.1)
    plt.ylim(0, 1.0001)
    plt.xlim(0, 0.003)
    plt.title(protname)
    plt.xlabel("time")
    plt.ylabel("quotient")
    plt.legend()
    # To save the figure, one option would be the following:
    # plt.savefig(protname+'.png')
# This may still be unnecessary, especially if called as a script
# (just save the plots to `png`).
plt.show()