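The first column is the employee code. The second through tenth columns are the employee's attribute values.
I want to calculate the correlation coefficient between every pair of columns and print the values. Then I want to group the employees based on the second through tenth columns.
For example, employeecd 3 and employeecd 26 are the first group, employeecd 9 and employeecd 36 are the second group, and employeecd 51 is the third group.
How can I do this?
CSV data: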
employeecd PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
3 -0.019370327 -0.066859115 -0.015559383 -0.016847672 -0.00450929 -0.015184005 0.001838603 0.009000022 0.005215408
9 0.016084648 -0.04588007 0.044825961 0.177887125 -0.064608769 -0.000602429 0.233581738 0.083210659 0.063923094
26 0.126099564 0.25791386 0.089866638 0.092998633 -0.10598045 0.153763875 0.128325487 0.084271646 -0.022443242
36 0.005507832 0.592918906 0.005249758 0.038787543 -0.019534589 0.029642274 0.035333581 0.007186858 0.022406557
51 0.012471334 -0.32436518 0.015579629 0.026908357 -0.004372528 -0.016313703 0.033948063 -0.007658299 -0.007111237
Solved. This code prints the pairwise correlation coefficients:
import pandas as pd
import csv

# rs = pd.DataFrame.from_csv(r'D:/Clustering_TOP.csv',encoding='utf-8')
rs = pd.read_csv(r'D:/Clustering_TOP.csv', encoding='utf-8')

# Read the header row to get the column names
with open('D:/Clustering_TOP.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]

csv_title = rows[0]
csv_title = csv_title[1:]  # drop the first column (employeecd)
len_csv_title = len(csv_title)

# Print the correlation coefficient for each pair of columns, one row per left-hand column
for i in range(len_csv_title):
    for j in range(i + 1, len_csv_title):
        print(str(csv_title[i]) + "_" + str(csv_title[j]) + " = " + str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
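
The code above covers the pairwise column correlations but not the grouping step. Below is a minimal sketch of one possible approach, not a definitive answer. It assumes the sample rows from the post are whitespace-separated, uses `DataFrame.corr()` to get the whole correlation matrix in one call, and then groups employees with a simple greedy rule based on the correlation between their rows. The helper `group_by_correlation` and the `threshold=0.8` value are hypothetical choices made for illustration, and the result is not guaranteed to match the example grouping in the question.

```python
import io
import pandas as pd

# Sample rows copied from the post (assumed whitespace-separated here).
DATA = """employeecd PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
3 -0.019370327 -0.066859115 -0.015559383 -0.016847672 -0.00450929 -0.015184005 0.001838603 0.009000022 0.005215408
9 0.016084648 -0.04588007 0.044825961 0.177887125 -0.064608769 -0.000602429 0.233581738 0.083210659 0.063923094
26 0.126099564 0.25791386 0.089866638 0.092998633 -0.10598045 0.153763875 0.128325487 0.084271646 -0.022443242
36 0.005507832 0.592918906 0.005249758 0.038787543 -0.019534589 0.029642274 0.035333581 0.007186858 0.022406557
51 0.012471334 -0.32436518 0.015579629 0.026908357 -0.004372528 -0.016313703 0.033948063 -0.007658299 -0.007111237
"""

# Use employeecd as the index so only PC1..PC9 remain as data columns.
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+", index_col="employeecd")

# Pearson correlation between every pair of PC columns (9 x 9 matrix).
col_corr = df.corr()

# Correlation between employees: transpose so each employee becomes a column.
emp_corr = df.T.corr()


def group_by_correlation(corr, threshold):
    """Hypothetical greedy grouping: an employee joins the first existing
    group whose members all correlate with it at or above the threshold;
    otherwise it starts a new group."""
    groups = []
    for emp in corr.index:
        for group in groups:
            if all(corr.loc[emp, other] >= threshold for other in group):
                group.append(int(emp))
                break
        else:
            groups.append([int(emp)])
    return groups


# threshold=0.8 is an arbitrary illustrative value, not taken from the post.
groups = group_by_correlation(emp_corr, threshold=0.8)

print(col_corr.round(3))
print(groups)
```

The greedy rule depends on row order and on the chosen threshold, so for real data a proper clustering method may be a better fit; this sketch only shows how the correlation matrix can drive a grouping.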