Question

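I am working on a Wilson's disease model. The dataset contains gene expression profiles for Mus musculus. It has 6 expression-count samples from GSE5348, with 2 genotype sets per sample, and the expression level of each gene is normalized. The samples are "GSM121554", "GSM121555", "GSM121556", "GSM121547", "GSM121550" and "GSM121552". The goal is to predict the time at which the probability of disease onset is highest. I want to build a dynamic Bayesian network (DBN) for my dataset, but I cannot decide how to choose the parameters and the time steps. Can anyone give me an overall view of a DBN structure that would suit my project, or any other suggestions?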

This is the original dataset.

What I have done so far:

Data preprocessing

visualisation of gene clusters and sub clusters

Computed the similarity of each gene sample with the centroid (seed) and split the dataset into two halves:

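a) Class 1: genes with similarity greater than or equal to 95% (lower risk of disease)
b) Class 2: genes with similarity below 95% (higher risk of disease)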

# similarity_percent is a column holding the similarity of each gene with the centroid
class1=df[df['similarity_percent']<95]
class2=df[df['similarity_percent']>=95]
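For reference, here is a minimal sketch of how a similarity_percent column like the one used above could be produced. It assumes cosine similarity between each gene's expression row and the mean row (the centroid), scaled to 0-100; the actual similarity measure may differ. The function name add_similarity_percent and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def add_similarity_percent(expr: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of expr with a similarity_percent column.

    Assumption: similarity is the cosine similarity between each gene's
    row and the centroid (mean of all rows), expressed as a percentage.
    """
    values = expr.to_numpy(dtype=float)
    centroid = values.mean(axis=0)
    # Product of the row norm and the centroid norm for each gene
    norms = np.linalg.norm(values, axis=1) * np.linalg.norm(centroid)
    # Rows with zero norm get NaN instead of a division error
    cosine = (values @ centroid) / np.where(norms == 0, np.nan, norms)
    out = expr.copy()
    out["similarity_percent"] = 100.0 * cosine
    return out


# Toy data with made-up values; the columns are the six GSM sample IDs
samples = ["GSM121554", "GSM121555", "GSM121556",
           "GSM121547", "GSM121550", "GSM121552"]
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((5, 6)), columns=samples,
                   index=[f"gene_{i}" for i in range(5)])
df = add_similarity_percent(toy)
```

With non-negative expression values the cosine similarity lies between 0 and 1, so similarity_percent lies between 0 and 100.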

Using the similarity criterion, the classes are further divided into no risk, moderate risk and adequate risk.

#classifying under class1:
adequate_risk_class1=class1[(class1['similarity_percent']>=73) & (class1['similarity_percent']<81)]
moderate_risk_class1=class1[(class1['similarity_percent']>=81) & (class1['similarity_percent']<88)]
no_risk_class1=class1[(class1['similarity_percent']>=88) & (class1['similarity_percent']<95)]
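The three filters above could also be written as a single binning step. This is only a sketch using pandas.cut with the same thresholds; the helper name label_risk, the risk column and the toy values are hypothetical.

```python
import pandas as pd

# Bin edges and labels taken from the three filters above
bins = [73, 81, 88, 95]
labels = ["adequate_risk", "moderate_risk", "no_risk"]


def label_risk(class1: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of class1 with a categorical risk column."""
    out = class1.copy()
    # right=False gives half-open intervals [73, 81), [81, 88), [88, 95)
    out["risk"] = pd.cut(out["similarity_percent"], bins=bins,
                         labels=labels, right=False)
    return out


# Toy values; 60.0 falls outside every bin and is left as NaN
toy = pd.DataFrame(
    {"similarity_percent": [73.0, 80.9, 81.0, 87.5, 88.0, 94.9, 60.0]}
)
labeled = label_risk(toy)
```

Keeping one risk column instead of three separate DataFrames makes it easier to group or count genes per risk level later, for example with labeled["risk"].value_counts().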

I am sorry if this is not explained well; it is not yet clear in my own head, which is why I cannot make progress. How should I build a DBN for this?

Construction of a dynamic Bayesian network for time-series gene expression data

0 answers: