Python: how to index and print more efficiently

Time: 2018-08-09 17:56:13

Tags: python pandas numpy scikit-learn

I'm still learning Python and have a quick question about the efficiency and readability of my code.

This is what I have at the moment:

import pandas as pd
from sklearn.linear_model import Lasso
import numpy as np

df=pd.read_csv('Data\\cmd.csv')


df=df[['A71: ','A120: ',\
'A70: ','A84: ','A81: ','A89: ',\
'A101: ','A102: ','A105: ','CR']]

X=np.array(df[['A71: ','A120: ',\
'A70: ','A84: ','A81: ','A89: ',\
'A101: ','A102: ','A105: ']])

y=np.array(df[['CR']])

clf=Lasso()
clf.fit(X,y)


print('A71: ', clf.coef_[0])
print('A120: ', clf.coef_[1])    
print('A70: ', clf.coef_[2])
print('A84: ', clf.coef_[3])
print('A81: ', clf.coef_[4])
print('A89: ', clf.coef_[5])
print('A101: ', clf.coef_[6])
print('A102: ', clf.coef_[7])
print('A105: ', clf.coef_[8])

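Hopefully you can see that I want to pair each X feature name with its coefficient, so that I can refer to each coefficient specifically. I feel there must be a simpler way to get this result than what I have now. Thanks!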

2 answers:

Answer 0 (score: 3)

I would create a list of the column numbers, i.e.

col_numbers = [71, 120, 70, 84, 81, 89, 101, 102, 105]

then build the list of column names from it,

col_names = ['A{}: '.format(num) for num in col_numbers]

select those specific columns from the DataFrame (note that this drops 'CR', so extract y beforehand),

df = df[col_names]

and print with a for loop,

for i in range(len(col_names)):
    print(col_names[i], clf.coef_[i])
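
As a rough sketch of how these steps could fit together, the following keeps the feature selection in a separate variable so that 'CR' stays available for y, and labels the coefficients with a pandas Series so each one can be looked up by name. The random data standing in for cmd.csv, the seed, and the alpha value are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

col_numbers = [71, 120, 70, 84, 81, 89, 101, 102, 105]
col_names = ['A{}: '.format(num) for num in col_numbers]

# Synthetic stand-in for Data\cmd.csv (an assumption, not the real data).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(50, len(col_names)), columns=col_names)
df['CR'] = 3.0 * df['A71: '] - 2.0 * df['A105: '] + 0.01 * rng.rand(50)

X = df[col_names]  # feature columns only; df itself still contains 'CR'
y = df['CR']       # 1-D target

clf = Lasso(alpha=0.001)  # alpha picked arbitrarily for this sketch
clf.fit(X, y)

# Attach the column names to the coefficients for lookup by name.
coefs = pd.Series(clf.coef_, index=col_names)
print(coefs)
print(coefs['A71: '])
```

With the coefficients in a Series, `coefs['A71: ']` replaces having to remember that this feature sits at position 0.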

Answer 1 (score: 0)

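Provided the DataFrame's columns are in the same order as the features used to fit the model, you can do this concisely with a simple zip: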

# Pair each column name with the fitted coefficient at the same position
for name, coef in zip(df.columns, clf.coef_):
    print(name, coef)

Example (here clf_coeff stands in for clf.coef_):

import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['A71: ', 'A120: ', 'A70: ', 'A84: ', 'A81: ', 
                             'A89: ', 'A101: ', 'A102: ', 'A105: ', 'CR'])
clf_coeff = np.arange(0, 9, 1)

for x in zip(df.columns, clf_coeff):
    print(x[0], x[1])

Output:

A71:  0
A120:  1
A70:  2
A84:  3
A81:  4
A89:  5
A101:  6
A102:  7
A105:  8

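The key point here is that although df.columns is longer than clf_coeff, zip stops as soon as the shorter of the two is exhausted, so the trailing 'CR' column is simply ignored.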
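
The truncation behaviour can be checked in isolation with plain lists. This is a minimal sketch using made-up names and values. It also shows one way to make the pairing explicit, since the same silent truncation would hide a genuine length mismatch between names and coefficients.

```python
names = ['A71: ', 'A120: ', 'A70: ', 'CR']  # four labels, like df.columns
coeffs = [0.5, -1.2, 0.0]                   # three made-up coefficients

# zip stops after the third pair, so 'CR' never appears.
pairs = list(zip(names, coeffs))
print(pairs)

# Being explicit: drop the target label and check the lengths match
# before pairing, so a real mismatch fails loudly instead of silently.
feature_names = names[:-1]
assert len(feature_names) == len(coeffs)
for name, coef in zip(feature_names, coeffs):
    print(name, coef)
```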