I'm still learning Python and have a quick question about the efficiency and readability of my code.
Currently I have this:
import pandas as pd
from sklearn.linear_model import Lasso
import numpy as np
df=pd.read_csv('Data\\cmd.csv')
df=df[['A71: ','A120: ',\
'A70: ','A84: ','A81: ','A89: ',\
'A101: ','A102: ','A105: ','CR']]
X=np.array(df[['A71: ','A120: ',\
'A70: ','A84: ','A81: ','A89: ',\
'A101: ','A102: ','A105: ']])
y=np.array(df[['CR']])
clf=Lasso()
clf.fit(X,y)
print('A71: ', clf.coef_[0])
print('A120: ', clf.coef_[1])
print('A70: ', clf.coef_[2])
print('A84: ', clf.coef_[3])
print('A81: ', clf.coef_[4])
print('A89: ', clf.coef_[5])
print('A101: ', clf.coef_[6])
print('A102: ', clf.coef_[7])
print('A105: ', clf.coef_[8])
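Hopefully you can see that I want to index my X feature names and coefficients so that I can refer to each coefficient specifically. I feel there is definitely a simpler way to get this result than what I have now. Thanks!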
Answer 0 (score: 3)
I would create a list of the column numbers, i.e.
col_numbers = [71, 120, 70, 84, 81, 89, 101, 102, 105]
then build a list of column names from it,
col_names = ['A{}: '.format(num) for num in col_numbers]
select those specific columns from the DataFrame,
df = df[col_names]
and print them with a for loop,
for i in range(len(col_names)):
    print(col_names[i], clf.coef_[i])
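Putting the steps of this answer together, a minimal sketch might look like the following. The coefficient values are placeholders standing in for clf.coef_ (assumed, not real results), and describe_coefficients is a hypothetical helper name. Note that the question's script reads y from the 'CR' column, so y would need to be extracted before df = df[col_names] drops that column.

```python
# Column numbers taken from the question.
col_numbers = [71, 120, 70, 84, 81, 89, 101, 102, 105]
col_names = ['A{}: '.format(num) for num in col_numbers]

# Placeholder values standing in for clf.coef_ (assumption, not real results).
placeholder_coefs = [0.0, 0.5, 0.0, -1.2, 0.0, 0.3, 0.0, 0.0, 2.1]


def describe_coefficients(names, coefs):
    """Hypothetical helper: return one 'name coefficient' line per feature."""
    return ['{}{}'.format(name, coef) for name, coef in zip(names, coefs)]


lines = describe_coefficients(col_names, placeholder_coefs)
for line in lines:
    print(line)
```

Iterating over the paired names and values directly avoids the manual index bookkeeping of range(len(...)).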
Answer 1 (score: 0)
Provided the DataFrame's columns are in the same order as the coefficients, you can do this efficiently with a simple zip:
for x in zip(df.columns, clf_coeff):
    print(x[0], x[1])
import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['A71: ', 'A120: ', 'A70: ', 'A84: ', 'A81: ',
'A89: ', 'A101: ', 'A102: ', 'A105: ', 'CR'])
clf_coeff = np.arange(0, 9, 1)
for x in zip(df.columns, clf_coeff):
    print(x[0], x[1])
Output:
A71: 0
A120: 1
A70: 2
A84: 3
A81: 4
A89: 5
A101: 6
A102: 7
A105: 8
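The key here is that although df.columns is longer than clf_coeff, zip stops as soon as it reaches the end of the shorter of the two, which means the trailing 'CR' does not matter.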
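Relying on zip's truncation works here only because 'CR' happens to be the last column. As a hedged alternative, the sketch below drops the target column explicitly and builds a dictionary so each coefficient can be looked up by feature name. The lists are stand-ins for df.columns and clf.coef_ (assumed values, for illustration only), and coef_by_name is a hypothetical variable name.

```python
# Stand-ins for df.columns and clf.coef_ (assumed values, for illustration only).
columns = ['A71: ', 'A120: ', 'A70: ', 'A84: ', 'A81: ',
           'A89: ', 'A101: ', 'A102: ', 'A105: ', 'CR']
coefs = list(range(9))

# Drop the target column explicitly instead of relying on zip truncation.
feature_names = [name for name in columns if name != 'CR']
if len(feature_names) != len(coefs):
    raise ValueError('feature names and coefficients differ in length')

# Map each feature name to its coefficient for lookup by name.
coef_by_name = dict(zip(feature_names, coefs))
print(coef_by_name['A84: '])
```

The explicit length check turns a silent mismatch (for example, a target column that is not last) into an error rather than a quietly wrong pairing.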