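I have just started learning machine learning and scikit-learn. I have been following a tutorial in which the instructor uses Quandl to fetch Google stock price data. As far as I can tell, quandl.get returns a pandas DataFrame. What confuses me is that one line of code appears to add columns in the second dimension of the DataFrame, while on another line the instructor accesses the same column through the first dimension. How is that possible? What is happening to this DataFrame?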
df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]
df['HCL_PCT'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] # how is df['Adj. Open'] working?? Wasn't 'Adj. Open' added in the second dimension of the dataframe in the second line of the code above??
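My goal is to learn TensorFlow, and I want to pick up some machine learning jargon and concepts before diving into it.
Answer 0: (score: 0)
I added df.head() and printed the output to show the data: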
#read data
df = quandl.get('WIKI/GOOGL')
print (df.head())
Open High Low Close Volume Ex-Dividend \
Date
2004-08-19 100.01 104.06 95.96 100.335 44659000.0 0.0
2004-08-20 101.01 109.08 100.50 108.310 22834300.0 0.0
2004-08-23 110.76 113.48 109.05 109.400 18256100.0 0.0
2004-08-24 111.24 111.60 103.57 104.870 15247300.0 0.0
2004-08-25 104.76 108.00 103.88 106.000 9188600.0 0.0
Split Ratio Adj. Open Adj. High Adj. Low Adj. Close \
Date
2004-08-19 1.0 50.159839 52.191109 48.128568 50.322842
2004-08-20 1.0 50.661387 54.708881 50.405597 54.322689
2004-08-23 1.0 55.551482 56.915693 54.693835 54.869377
2004-08-24 1.0 55.792225 55.972783 51.945350 52.597363
2004-08-25 1.0 52.542193 54.167209 52.100830 53.164113
Adj. Volume
Date
2004-08-19 44659000.0
2004-08-20 22834300.0
2004-08-23 18256100.0
2004-08-24 15247300.0
2004-08-25 9188600.0
#select data by columns (filter) and set order of columns
df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]
print (df.head())
Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
Date
2004-08-19 50.159839 52.191109 48.128568 50.322842 44659000.0
2004-08-20 50.661387 54.708881 50.405597 54.322689 22834300.0
2004-08-23 55.551482 56.915693 54.693835 54.869377 18256100.0
2004-08-24 55.792225 55.972783 51.945350 52.597363 15247300.0
2004-08-25 52.542193 54.167209 52.100830 53.164113 9188600.0
#compute the new column from existing columns (selected by label)
df['HCL_PCT'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open']
print (df.head())
Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume HCL_PCT
Date
2004-08-19 50.159839 52.191109 48.128568 50.322842 44659000.0 0.003250
2004-08-20 50.661387 54.708881 50.405597 54.322689 22834300.0 0.072270
2004-08-23 55.551482 56.915693 54.693835 54.869377 18256100.0 -0.012279
2004-08-24 55.792225 55.972783 51.945350 52.597363 15247300.0 -0.057264
2004-08-25 52.542193 54.167209 52.100830 53.164113 9188600.0 0.011837
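The calculation above can be reproduced without downloading anything from Quandl. The following is a minimal sketch that assumes a hand-built DataFrame holding only the first two rows of Adj. Open and Adj. Close from the printed output; the variable name `small` is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame; values copied from the head() output above.
small = pd.DataFrame(
    {'Adj. Open': [50.159839, 50.661387],
     'Adj. Close': [50.322842, 54.322689]},
    index=pd.to_datetime(['2004-08-19', '2004-08-20']),
)

# small['Adj. Open'] looks a column up by its label and returns a Series.
# The arithmetic runs element-wise, row by row, and the assignment
# stores the result as a new column named 'HCL_PCT'.
small['HCL_PCT'] = (small['Adj. Close'] - small['Adj. Open']) / small['Adj. Open']

print(small['HCL_PCT'].round(6).tolist())
```

Both the read (`small['Adj. Open']`) and the write (`small['HCL_PCT'] = ...`) address columns by label, so no second dimension is involved in either step.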
Selecting the column Adj. Close:
print (df['Adj. Close'])
Date
2004-08-19 50.322842
2004-08-20 54.322689
2004-08-23 54.869377
2004-08-24 52.597363
2004-08-25 53.164113
2004-08-26 54.122070
2004-08-27 53.239345
2004-08-30 51.162935
2004-08-31 51.343492
2004-09-01 50.280210
2004-09-02 50.912161
2004-09-03 50.159839
2004-09-07 50.947269
2004-09-08 51.308384
2004-09-09 51.313400
2004-09-10 52.828075
2004-09-13 53.916435
2004-09-14 55.917612
2004-09-15 56.173402
2004-09-16 57.161452
2004-09-17 58.926902
2004-09-20 59.864797
2004-09-21 59.102444
2004-09-22 59.373280
2004-09-23 60.597057
2004-09-24 60.100525
2004-09-27 59.313094
2004-09-28 63.626409
2004-09-29 65.742942
2004-09-30 65.000651
...
2017-04-13 840.180000
2017-04-17 855.130000
2017-04-18 853.990000
2017-04-19 856.510000
2017-04-20 860.080000
2017-04-21 858.950000
2017-04-24 878.930000
2017-04-25 888.840000
2017-04-26 889.140000
2017-04-27 891.440000
2017-04-28 924.520000
2017-05-01 932.820000
2017-05-02 937.090000
2017-05-03 948.450000
2017-05-04 954.720000
2017-05-05 950.280000
2017-05-08 958.690000
2017-05-09 956.710000
2017-05-10 954.840000
2017-05-11 955.890000
2017-05-12 955.140000
2017-05-15 959.220000
2017-05-16 964.610000
2017-05-17 942.170000
2017-05-18 950.500000
2017-05-19 954.650000
2017-05-22 964.070000
2017-05-23 970.550000
2017-05-24 977.610000
2017-05-25 991.860000
Name: Adj. Close, Length: 3215, dtype: float64
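The output above is a one-dimensional Series, which is what a single label inside the brackets returns. As a small sketch on toy data (the column names `A` and `B` are placeholders, not the Quandl columns), the difference between a single label and a list of labels can be seen like this:

```python
import pandas as pd

# Toy data; the column names are placeholders, not the Quandl columns.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [7, 8, 9]})

one_column = df['A']     # single label   -> Series (1-D)
one_frame = df[['A']]    # list of labels -> DataFrame (2-D), even with one label

print(type(one_column).__name__, one_column.shape)
print(type(one_frame).__name__, one_frame.shape)
```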
Edit:
df = pd.DataFrame({'A':[1,2,3],
'D':[4,5,6],
'B':[7,8,9],
'F':[1,3,5],
'C':[5,3,6]})
print (df)
A B C D F
0 1 7 5 4 1
1 2 8 3 5 3
2 3 9 6 6 5
#select only columns A,B,C and return new dataframe in new order of columns
df1 = df[['A','B','C']]
print (df1)
A B C
0 1 7 5
1 2 8 3
2 3 9 6
#select the same columns as C,A,B and return a new dataframe in that column order
df2 = df[['C','A','B']]
print (df2)
C A B
0 5 1 7
1 3 2 8
2 6 3 9
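The double brackets in these examples are not a second dimension: the outer pair is the indexing operator and the inner pair is an ordinary Python list of column labels. A short sketch, assuming the same toy frame as above (the variable name `wanted` is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'D': [4, 5, 6], 'B': [7, 8, 9],
                   'F': [1, 3, 5], 'C': [5, 3, 6]})

# The inner brackets are just a Python list literal, so the list can be
# stored in a variable first and passed to the indexing operator.
wanted = ['C', 'A', 'B']
df2 = df[wanted]

# Same result as writing the list inline.
print(df2.equals(df[['C', 'A', 'B']]))
print(list(df2.columns))
```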
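Answer 1: (score: 0)
Index: an Index or array-like object.
In a DataFrame, indexing with a single label returns that column, and indexing with an array or list of labels returns several columns. The list form is equivalent to df.loc[:, [...]] (all rows selected, columns picked by label).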
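The equivalence described in this answer can be checked directly. pandas spells the explicit "all rows, these columns" form with the `.loc` indexer, so the sketch below compares the bracket shorthand against `df.loc[:, [...]]` on toy data (the frame and the names `by_brackets` and `by_loc` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [7, 8, 9], 'C': [5, 3, 6]})

by_brackets = df[['C', 'A']]      # shorthand column selection
by_loc = df.loc[:, ['C', 'A']]    # explicit: all rows, these column labels

same = by_brackets.equals(by_loc)
print(same)
```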