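I need to generate a normal score transformation. quantile_transformer can do this, but I am using Stan (PyStan) in this project, so I have to export the lookup table. I can't find any way to extract the lookup table from quantile_transformer.
Does anyone know how I can extract it, or how to generate it with another package?
For now I am using my own basic implementation. If what I mean is unclear, see it below.
Any input would be greatly appreciated!
My own implementation; note the two outliers in the third figure: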
# Imports
import numpy as np
import matplotlib.pyplot as plt

# Helper function: plot a histogram with a title
def histogram(data, title):
    plt.figure()
    plt.hist(data, bins=50, edgecolor='black')
    plt.grid()
    plt.title(title)

# Synthetic data distribution
data = np.random.laplace(0, 1, (1000, 1))
# Normally distributed data used to build the table
data_n = np.random.normal(0, 1, (1000000, 1))
# Determine quantiles of both data sets
quan = np.linspace(0, 100, 10000)
val = np.percentile(data, q=quan)
val_n = np.percentile(data_n, q=quan)
# Plot distributions
histogram(data, 'Synthetic data')
histogram(data_n, 'Normal distribution used for look-up table')
# Transform from data to normal distribution (nearest table entry)
synth2norm = np.zeros(len(data))
for i in range(len(data)):
    idx = np.argmin(np.abs(val - data[i]))
    synth2norm[i] = val_n[idx]
histogram(synth2norm, 'Transform synthetic to normal')
# Transform sampled normal to synthetic data distribution
sample = np.random.normal(0, 1, (500, 1))
sample2synth = np.zeros(len(sample))
for i in range(len(sample)):
    idx = np.argmin(np.abs(val_n - sample[i]))
    sample2synth[i] = val[idx]
histogram(sample2synth, 'Sample from STAN (normal) to synthetic data distribution')
# Display the figures when run as a script
plt.show()
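
A possible variant of the same idea, as a rough sketch rather than a definitive answer: the two outliers most likely come from the 0th and 100th percentiles, which map the smallest and largest data points onto the extreme values of the million-sample normal draw. The sketch below swaps the sampled normal for analytic normal quantiles (`statistics.NormalDist.inv_cdf` from the standard library), keeps the probabilities strictly inside (0, 1), and replaces the nearest-entry loops with `np.interp`. The names `build_lookup_table`, `to_normal`, `from_normal`, the `n_points` parameter, and the `0.5 / n` margin are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def build_lookup_table(samples, n_points=1001):
    """Return (data_q, normal_q): matching quantiles of the data and of N(0, 1)."""
    flat = np.ravel(samples)
    # Assumed margin: keep probabilities strictly inside (0, 1)
    # so that the normal quantiles stay finite.
    eps = 0.5 / flat.size
    probs = np.linspace(eps, 1.0 - eps, n_points)
    data_q = np.quantile(flat, probs)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    normal_q = np.array([std_normal.inv_cdf(p) for p in probs])
    return data_q, normal_q


def to_normal(x, data_q, normal_q):
    # Linear interpolation between table entries; values outside
    # the table are clamped to its end points by np.interp.
    return np.interp(np.ravel(x), data_q, normal_q)


def from_normal(z, data_q, normal_q):
    # Inverse direction: normal scores back to the data scale.
    return np.interp(np.ravel(z), normal_q, data_q)


rng = np.random.default_rng(0)
data = rng.laplace(0, 1, size=1000)

data_q, normal_q = build_lookup_table(data)
scores = to_normal(data, data_q, normal_q)
recovered = from_normal(scores, data_q, normal_q)
```

The two arrays `data_q` and `normal_q` are the lookup table, so they could be passed to Stan as plain data vectors. On the scikit-learn side, a fitted `QuantileTransformer` exposes `quantiles_` and `references_` attributes, which may be a starting point for exporting its table, though I have not verified that they reproduce its normal output exactly.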