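I am working with two datasets, and for reproducibility I am sharing the datasets here.
To make clear what I am doing: working on the second column, I read the current row and compare it with the value in the previous row. If it is greater, I keep comparing. If the current value is smaller than the previous row's value, I want to divide the current (smaller) value by the previous (larger) value. Hence the following code: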
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

protocols = {}
types = {"data_v": "data_v.csv", "data_r": "data_r.csv"}
for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }
    plt.figure(); plt.clf()
    diff = quotient_times
    # plot each quotient against the time at which the decrease occurred
    plt.plot(diff, quotient, ".", label=protname, color="blue")
    plt.ylim(0, 1.0001)
    plt.title(protname)
    plt.xlabel("quotient_times")
    plt.ylabel("quotient")
    plt.legend()
plt.show()
This produces the following plots.
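Data-V has a quotient of 0.8 when quotient_times is less than 3, and the quotient remains 0.5 if quotient_times is greater than 3.
Data-R has a constant quotient of 0.5, irrespective of quotient_times.
Based on this requirement, how can we calculate a probability from the previous quotient_times and the current quotient_times in order to distinguish Data-V from Data-R? The only place where they differ is where quotient_times is <= 3.01, while both have values at quotient_times greater than 3. To simplify the problem: what is the probability that Data-V is > 0.5, given Data-R?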
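Answer 0: (score: 1)
If you are only asking for the probability that Data-V is > 0.5, I would compute it as an independent probability. As I commented, I don't think the two datasets are correlated, because Data-V spikes while Data-R does not react at all.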
import numpy as np

types = {"data_v": "data_v.csv"}  # only considering Data-V
for protname, fname in types.items():  # your code to load the data
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]
    # Now go through the quotients and count how many meet the > 0.5 criterion
    count = 0
    for quot in quotient:
        if quot > 0.5:
            count += 1
    probability = float(count) / len(quotient)  # occurrences / chances to occur
    print(probability)
Output
0.0625
So, when Data-V is considered independently of Data-R, there is a 6.25% chance that its quotient is above 0.5.
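The counting loop above can also be written as a single vectorized NumPy expression. The sketch below is only an illustration: it uses a small made-up array in place of data_v.csv (the real file is not reproduced here), and `decreasing_quotients` and `fraction_above` are hypothetical helper names, not part of the original code.

```python
import numpy as np

def decreasing_quotients(col_time, col_window):
    """Return (times, quotients) for every step where the value decreases."""
    trailing = col_window[:-1]          # "past" values
    leading = col_window[1:]            # "current" values
    decreasing = leading < trailing     # boolean mask of decreasing steps
    # Same time convention as the code above: the time of the "past" sample.
    return col_time[:-1][decreasing], leading[decreasing] / trailing[decreasing]

def fraction_above(quotient, threshold=0.5):
    """Fraction of quotient values strictly greater than threshold."""
    quotient = np.asarray(quotient, dtype=float)
    if quotient.size == 0:
        return float("nan")             # no decreasing steps, nothing to estimate
    return float(np.mean(quotient > threshold))

# Made-up values for illustration only (assumption, not the shared dataset).
col_time = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
col_window = np.array([10.0, 8.0, 12.0, 6.0, 7.0, 3.5])

times, quotient = decreasing_quotients(col_time, col_window)
print(fraction_above(quotient))
```

Taking the mean of a boolean mask counts the True entries and divides by the length in one step, so it should give the same number as the explicit loop.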
Update
If you only want to consider quotient_times less than 3:
import numpy as np

types = {"data_v": "data_v.csv"}
for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]
    occurrence_count = 0   # quotients > 0.5 among those with quotient_times < 3
    possibility_count = 0  # all quotients with quotient_times < 3
    for index in range(len(quotient)):
        if quotient_times[index] < 3:
            possibility_count += 1
            if quotient[index] > 0.5:
                occurrence_count += 1
    probability = float(occurrence_count) / possibility_count
    print(probability)
Output
1.0
So, among the entries with quotient_times less than 3, 100% of the quotient values are also greater than 0.5. Again, this only considers Data-V.
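The restricted count can likewise be expressed with a boolean mask, which also makes it easy to guard against the case where no entry falls below the time cutoff (the loop version would divide by zero there). This is a minimal sketch on made-up numbers; `conditional_fraction` and its parameters are hypothetical names introduced for illustration.

```python
import numpy as np

def conditional_fraction(quotient_times, quotient, time_cutoff=3.0, threshold=0.5):
    """Estimate P(quotient > threshold | quotient_times < time_cutoff)."""
    quotient_times = np.asarray(quotient_times, dtype=float)
    quotient = np.asarray(quotient, dtype=float)
    in_window = quotient_times < time_cutoff   # entries that can count at all
    if not in_window.any():
        return float("nan")                    # nothing below the cutoff
    return float(np.mean(quotient[in_window] > threshold))

# Made-up values for illustration only (assumption, not the shared dataset).
quotient_times = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
quotient = np.array([0.8, 0.8, 0.4, 0.5, 0.5])

print(conditional_fraction(quotient_times, quotient))
```

Returning NaN for an empty selection is a design choice of this sketch; raising an error would be an equally reasonable alternative.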