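I am characterizing biological samples by infrared spectroscopy (mid-IR region) and feeding the resulting data into a predictive model for disease prediction. I am using supervised learning on the spectral data. The first step was data preparation (smoothing, peak finding, peak filtering, etc.).

I now have a 93x1 vector of labels (the dependent variable: disease/non-disease), where 93 is the number of samples, and a 93x210 matrix, where 210 is the number of wavelengths (strictly, wavenumbers in cm⁻¹, running from 1700.105 to 1500.49) at which the previously detected absorption peaks are located. From these 210 wavelengths I need to extract the features (absorption peaks) to feed into the model. To do this, I am computing a Pearson correlation matrix in Python, with the 210 wavelengths x_i as the column headers. I want the correlation between the absorption peak at each wavelength x_i and the samples' disease/non-disease label. The problem is that the resulting matrix shows 1 everywhere.

Disclaimer: I am new to Python.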
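Put as a formula, the quantity being asked for is one coefficient per wavenumber column, not a wavenumber-by-wavenumber matrix. For column $j$ with absorbance values $x_{1j}, \dots, x_{nj}$ and labels $y_1, \dots, y_n$ (here $n = 93$ and, assuming the label is coded 0/1, $y_i \in \{0, 1\}$), the Pearson coefficient is:

$$
r_j = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad j = 1, \dots, p
$$

With a binary $y$ this is also known as the point-biserial correlation. The result is $p$ numbers, one per wavenumber, which can then be ranked by $|r_j|$ to shortlist features.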
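```python
import io

import numpy as np
import pandas as pd
from google.colab import files  # Colab-only helper for uploading local files

uploaded = files.upload()
# read_excel has no `Index` keyword; `index_col=None` (the default) is presumably what was meant
df2 = pd.read_excel(io.BytesIO(uploaded['20191201-Peaks.xlsx']), header=None, index_col=None)
df2.columns = ['Label_0_1', 'Sample', 1700.105, ..., 1500.49]  # remaining wavenumber headers elided in the post
print(df2.dtypes)
```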
```
Label_0_1     object
Sample        object
1700.105     float64
1699.141     float64
1698.177     float64
...
1504.35      float64
1503.38      float64
1502.42      float64
1501.45      float64
1500.49      float64
Length: 210, dtype: object
```
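The dtypes output shows `Label_0_1` (and `Sample`) stored as `object`, not as numbers. `DataFrame.corr` works on numeric data: depending on the pandas version, object columns are either silently left out (older versions) or can make the call fail (pandas 2.0 changed the `numeric_only` default to `False`), so the label may never appear in the matrix at all. A minimal sketch of converting it first, assuming the label column holds 0/1 values (as the name `Label_0_1` suggests); the toy frame `df` is hypothetical, not the real data:

```python
import pandas as pd

# Hypothetical toy frame with the same layout as df2: label read as text
# (object dtype), sample IDs as text, wavenumber columns as float64.
df = pd.DataFrame({
    "Label_0_1": ["0", "1", "1", "0"],
    "Sample": ["s1", "s2", "s3", "s4"],
    1700.105: [0.10, 0.35, 0.40, 0.12],
    1699.141: [0.11, 0.33, 0.41, 0.10],
})

# Convert the label to a numeric dtype; errors="raise" (the default) makes any
# non-numeric entry fail loudly instead of being silently turned into NaN.
df["Label_0_1"] = pd.to_numeric(df["Label_0_1"], errors="raise")

# Sample IDs are identifiers, not measurements, so leave them out of any correlation.
numeric = df.drop(columns=["Sample"])
print(numeric.dtypes)
print(numeric.corr(method="pearson").round(2))
```

If the labels are words (e.g. "disease"/"healthy") rather than digits, `pd.to_numeric` will raise, and an explicit mapping such as `df["Label_0_1"].map({...})` with the actual label values would be needed instead.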
```python
df2.shape
# (93, 210)
```
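Since the 210 columns in `df2.shape` include `Label_0_1` and `Sample` (both appear in the dtypes listing), the features are the remaining wavenumber columns. A minimal sketch of getting one Pearson r per wavenumber against the label with `DataFrame.corrwith`, then ranking by |r|; the data are random and hypothetical (only one column is built to depend on the label), and the label is assumed to be numeric already:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for df2: 93 samples, a numeric 0/1 label, sample IDs
# and a handful of wavenumber columns (the real file has many more).
n = 93
label = rng.integers(0, 2, size=n)
df2 = pd.DataFrame({"Label_0_1": label, "Sample": [f"s{i}" for i in range(n)]})
wavenumbers = [1700.105, 1699.141, 1698.177, 1501.45, 1500.49]
for k, wn in enumerate(wavenumbers):
    # Only the first column is built to depend on the label.
    shift = 0.5 * label if k == 0 else 0.0
    df2[wn] = shift + rng.normal(0.0, 0.2, size=n)

# Features: every column except the label and the sample ID.
X = df2.drop(columns=["Label_0_1", "Sample"])
y = df2["Label_0_1"].astype(float)

# One Pearson r per wavenumber (feature vs. label): a Series, not a square matrix.
r = X.corrwith(y, method="pearson")

# Rank wavenumbers by the strength of their linear association with the label.
ranking = r.abs().sort_values(ascending=False)
print(ranking.head(3))
```

Note that |r| only measures linear, one-column-at-a-time association. Neighbouring wavenumbers inside the same band tend to be strongly correlated with each other, so the top of such a ranking can be dominated by a single band.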
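```python
import chart_studio.plotly as py
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(150, 150))  # opens an empty 150 x 150 inch figure; nothing is plotted on it here
# Pairwise Pearson r between every pair of columns (column x column), shown as a square matrix
cor = df2.corr(method='pearson')
cor
```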
[Image: screenshot of the resulting correlation matrix]
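One plausible reason for a matrix full of 1s: `df2.corr()` correlates every wavenumber column with every other one, and points sampled about 1 cm⁻¹ apart inside the same absorption band rise and fall together from sample to sample, so their pairwise r is at or very near 1 (and values such as 0.9996 look like 1 when rounded for display). A minimal sketch under a strong simplifying assumption: hypothetical, noise-free spectra made of a single Gaussian band whose height varies per sample. Every column is then a positive multiple of the same amplitude vector, so every entry of the matrix is 1 up to floating-point error:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical spectra: one Gaussian absorption band centred at 1650 cm^-1,
# sampled at ~0.964 cm^-1 spacing, whose height varies from sample to sample.
wavenumbers = np.arange(1700.105, 1600.0, -0.964)
band_shape = np.exp(-((wavenumbers - 1650.0) / 15.0) ** 2)
amplitudes = rng.uniform(0.2, 1.0, size=93)

# Each row is one sample: amplitude_i * band_shape (no noise added).
spectra = pd.DataFrame(np.outer(amplitudes, band_shape), columns=wavenumbers)

# Column x column Pearson matrix, the same kind of call as df2.corr() above.
cor = spectra.corr(method="pearson")

# Every column is a scaled copy of the amplitude vector, so every pairwise r is 1.
print(cor.round(3).iloc[:3, :3])
```

Real spectra contain noise and overlapping bands, so measured values fall somewhat below 1, but adjacent wavenumbers typically remain very strongly correlated. This is why a column-by-column matrix says little about which wavenumbers relate to the disease label.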