Question

I am working with two datasets, and for reproducibility I am sharing them here.

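To make clear what I am doing: starting from column 2, I read the current row and compare it with the value in the previous row. If it is larger, I keep comparing. If the current value is smaller than the previous row's value, I divide the current (smaller) value by the previous (larger) value. Hence the following code: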

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

protocols = {}

types = {"data_v": "data_v.csv", "data_r": "data_r.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }
    plt.figure(); plt.clf()
    diff=quotient_times
    plt.plot(diff, quotient, ".", label=protname, color="blue")  # was beta_value, which is never defined
    plt.ylim(0, 1.0001)
    plt.title(protname)
    plt.xlabel("quotient_times")
    plt.ylabel("quotient")
    plt.legend()
    plt.show()
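
To make the shifted-window comparison concrete, here is a minimal sketch that runs the same steps on a tiny made-up series. The five values are hypothetical and are not taken from data_v.csv or data_r.csv; only the variable names follow the code above.

```python
import numpy as np

# Hypothetical toy series (NOT the real datasets), chosen so two drops occur.
col_time = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
col_window = np.array([10.0, 20.0, 16.0, 18.0, 9.0])

trailing_window = col_window[:-1]  # "past" value for each comparison
leading_window = col_window[1:]    # "current" value for each comparison

# Positions where the current value fell below the previous one.
decreasing_inds = np.where(leading_window < trailing_window)[0]

# Smaller (current) value divided by the larger (previous) value.
quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]

# As in the code above, each quotient is stamped with the time of the "past" row.
quotient_times = col_time[decreasing_inds]

print(decreasing_inds)  # [1 3]
print(quotient)         # [0.8 0.5]
print(quotient_times)   # [1. 3.]
```

Note that indexing col_time with decreasing_inds picks the timestamp of the previous row of each pair; using col_time[1:][decreasing_inds] instead would pick the timestamp of the current row.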

The plotting code above produces the following plots.

[Figure: one plot per dataset, quotient versus quotient_times]

Data-V has a quotient of 0.8 when quotient_times is less than 3.
Data-R has a constant quotient of 0.5, regardless of quotient_times.

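Given this, how can we calculate a likelihood from the previous quotient_times and the current quotient_times that distinguishes Data-V from Data-R? The only place they differ is where quotient_times is <= 3.01; both also have values at quotient_times greater than 3. To simplify the problem: what is the likelihood of Data-V, given Data-R, being > 0.5?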

Answer 1

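If you are simply asking for the probability that Data-V > 0.5, I would compute it as an independent probability. As I commented, I don't think there is a correlation between the two datasets, because Data-V spikes while Data-R doesn't react at all.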

import numpy as np

types = {"data_v": "data_v.csv"} # only considering data-V

for protname, fname in types.items(): # your code to load the data
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    # Now we'll go through the data for quotients and find how many meet the > 0.5 criteria
    count = 0
    for quot in quotient:
        if quot > 0.5:
            count += 1

    probability = float(count) / len(quotient) # Calculate a float of occurrences / chances to occur
    print(probability)

Output

0.0625

So, when Data-V is considered independently of Data-R, it has a 6.25% chance of being above 0.5.
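
The counting loop above can also be written as the mean of a boolean mask. This is a minimal sketch of that idea; the helper name fraction_above and the sample values are hypothetical, not part of the original answer or the real Data-V file.

```python
import numpy as np

def fraction_above(quotient, threshold=0.5):
    """Return the fraction of quotients strictly above `threshold`.

    Hypothetical helper for illustration; returns NaN for empty input
    instead of raising a division-by-zero error.
    """
    quotient = np.asarray(quotient, dtype=float)
    if quotient.size == 0:
        return float("nan")
    # The mean of a boolean array is the share of True entries.
    return float(np.mean(quotient > threshold))

# Made-up quotients (NOT the real Data-V values): one of four exceeds 0.5.
example_quotient = [0.8, 0.5, 0.5, 0.5]
print(fraction_above(example_quotient))  # 0.25
```

Because the comparison is strict, quotients exactly equal to 0.5 are not counted, which matches the `quot > 0.5` test in the loop.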

Update

If you only want to consider quotient_times less than 3:

import numpy as np

types = {"data_v": "data_v.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    occurrence_count = 0
    possibility_count = 0
    for index in range(len(quotient)):
        if quotient_times[index] < 3:
            possibility_count += 1
            if quotient[index] > 0.5:
                occurrence_count += 1

    probability = float(occurrence_count) / possibility_count
    print(probability)

Output

1.0

So 100% of the data points with quotient_times less than 3 also have a quotient greater than 0.5. Again, this only considers Data-V.
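
The same conditional fraction can be sketched with a boolean mask instead of an explicit loop. The function name conditional_fraction, its default arguments, and the sample arrays are assumptions made for illustration; they are not from the original answer or the real data.

```python
import numpy as np

def conditional_fraction(quotient, quotient_times, time_limit=3.0, threshold=0.5):
    """Fraction of quotients above `threshold` among rows with time < `time_limit`.

    Hypothetical helper for illustration; returns NaN when no row falls
    inside the time window, where the loop version would divide by zero.
    """
    quotient = np.asarray(quotient, dtype=float)
    quotient_times = np.asarray(quotient_times, dtype=float)
    in_window = quotient_times < time_limit  # rows that count as "possibilities"
    if not in_window.any():
        return float("nan")
    return float(np.mean(quotient[in_window] > threshold))

# Made-up values (NOT the real Data-V file): high quotients only before t = 3.
example_times = [1.0, 2.0, 3.5, 4.0]
example_quotient = [0.8, 0.8, 0.5, 0.5]
print(conditional_fraction(example_quotient, example_times))  # 1.0
```

Running the same function on both datasets and comparing the results for the window below 3 is one simple way to see where they differ, though it is an empirical fraction rather than a fitted likelihood model.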

