How can we calculate a likelihood in Python?

Time: 2019-03-08 18:52:16

Tags: python python-3.x scikit-learn scipy statistics

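I am working with two datasets and, for reproducibility, I am sharing the datasets here.

To be clear about what I am doing: working on the second column, I read the current row and compare it with the value in the previous row. If it is greater, I keep comparing. If the current value is smaller than the previous row's value, I want to divide the current (smaller) value by the previous (larger) value. Hence the following code: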

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

protocols = {}

types = {"data_v": "data_v.csv", "data_r": "data_r.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }
    plt.figure(); plt.clf()
    # Plot each quotient against the time at which it was observed
    plt.plot(quotient_times, quotient, ".", label=protname, color="blue")
    plt.ylim(0, 1.0001)
    plt.title(protname)
    plt.xlabel("quotient_times")
    plt.ylabel("quotient")
    plt.legend()
    plt.show()

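This gives the following plots.

  • Data-V has a quotient of 0.8 when quotient_times is less than 3; when quotient_times is greater than 3 its quotient is the same as Data-R's.

  • Data-R has a constant quotient of 0.5, irrespective of the value of quotient_times.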

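Based on this, how can we calculate, from the previous quotient_times and the current quotient_times, a likelihood that distinguishes Data-V from Data-R? The only place they differ is quotient_times <= 3.01; for quotient_times greater than 3 both have the same values. To simplify the problem: the likelihood of Data-V given Data-R > 0.5.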

1 answer:

Answer 0: (score: 1)

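If you are only asking for the probability that Data-V > 0.5, I would find its independent probability. As I commented, I don't think there is a correlation between the datasets, since Data-V spikes while Data-R does not react at all.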

import numpy as np

types = {"data_v": "data_v.csv"} # only considering data-V

for protname, fname in types.items(): # your code to load the data
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    # Now we'll go through the data for quotients and find how many meet the > 0.5 criteria
    count = 0
    for quot in quotient:
        if quot > 0.5:
            count += 1

    probability = float(count) / len(quotient) # Calculate a float of occurrences / chances to occur
    print(probability)

Output

0.0625

So, when Data-V is considered independently of Data-R, there is a 6.25% chance that its quotient is above 0.5.
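The counting loop can also be written as a vectorized NumPy expression. This is only a minimal sketch, not the code that produced the output above: the helper names `decreasing_quotients` and `fraction_above` are hypothetical, and the sample values are made up for illustration rather than read from data_v.csv.

```python
import numpy as np


def decreasing_quotients(values):
    """Return current/previous for every position where the value dropped."""
    values = np.asarray(values, dtype=float)
    previous = values[:-1]          # "past" values
    current = values[1:]            # "current" values
    dropped = current < previous    # boolean mask of decreases
    return current[dropped] / previous[dropped]


def fraction_above(quotient, threshold=0.5):
    """Empirical fraction of quotients strictly above the threshold."""
    quotient = np.asarray(quotient, dtype=float)
    if quotient.size == 0:
        return float("nan")         # no decreases, so the fraction is undefined
    # The mean of a boolean array is the share of True entries
    return float(np.mean(quotient > threshold))


# Made-up window values, for illustration only (not from data_v.csv)
sample_window = [10.0, 8.0, 9.0, 4.5, 5.0, 2.5]
sample_quotient = decreasing_quotients(sample_window)
print(sample_quotient)
print(fraction_above(sample_quotient))
```

Taking the mean of the boolean mask gives the same ratio as `count / len(quotient)`, and the explicit empty check avoids dividing by zero when the series never decreases.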

Update

If you only want to consider quotient_times less than 3:

import numpy as np

types = {"data_v": "data_v.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    occurrence_count = 0
    possibility_count = 0
    for index in range(len(quotient)):
        if quotient_times[index] < 3:
            possibility_count += 1
            if quotient[index] > 0.5:
                occurrence_count += 1

    probability = float(occurrence_count) / possibility_count
    print(probability)

Output

1.0

So 100% of the data points with quotient_times less than 3 also have a quotient value greater than 0.5. Again, this only considers Data-V.
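The same conditional count can be sketched with a boolean mask, which also guards against the case where no quotient_times fall below the limit (the loop version would divide by zero there). The function name `conditional_fraction` and the sample arrays are assumptions made up for illustration; they are not taken from the datasets.

```python
import numpy as np


def conditional_fraction(quotient, quotient_times, time_limit=3.0, threshold=0.5):
    """Fraction of quotients above threshold among points with time < time_limit."""
    quotient = np.asarray(quotient, dtype=float)
    quotient_times = np.asarray(quotient_times, dtype=float)
    early = quotient_times < time_limit   # the "possibility" cases
    if not early.any():
        return float("nan")               # nothing to condition on
    # Share of the early points whose quotient exceeds the threshold
    return float(np.mean(quotient[early] > threshold))


# Made-up values, for illustration only
sample_times = [1.0, 2.0, 3.5, 4.0]
sample_quotient = [0.8, 0.8, 0.5, 0.5]
print(conditional_fraction(sample_quotient, sample_times))
```

Like the loop, this is an empirical frequency for a single dataset; it does not by itself compare Data-V against Data-R.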