Jupyter Notebook: working with data for machine learning

Time: 2018-09-17 17:09:52

Tags: python jupyter-notebook

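I'm new to Jupyter Notebook. Overall I like it, but I sometimes get strange errors that show up on some runs and not on others. For example, I have a dataset that looks like this (showing .head()):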

[image: output of data.head()]

Now, if I set volume = data["avg_volume"] and then call volume.head(), I get:

[image: output of volume.head()]

But if I delete that line and put it somewhere else, I sometimes get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-9c1c4c11ebf0> in <module>()
----> 1 volume = data["avg_volume"]
      2 volume.head()

TypeError: 'float' object is not subscriptable

I noticed that running the following before that line:

pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)

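causes the problem. I just don't understand why it happens. It makes no sense to me, and it leads me to believe that Jupyter Notebook is garbage. Here is the code: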

# coding: utf-8

# In[1]:


# import modules
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
import tensorflow as tf


# In[2]:


# import dataset
data = pd.read_csv('output.csv')
data.head()


# In[3]:


# Goal with data set: The goal is to maximize the PNL column, secondary goals are to minimize MAE (Maximum Adverse Excursion)
# and maximize MFE (Maximum Favorable Excursion). Once a predictable model is established the next step is to work on adding
# alpha by optimizing the stop/take profit logic.
# Assumptions: The thesis is that an earning stock (a stock that has published an earnings report in the past 24 hours) 
# that gaps on open, continues in the direction of the gap.


# In[4]:


# Get statistical information
data.describe()


# In[5]:


# See how correlated each variable is to MTM_pnl
data.corr(method='pearson', min_periods=1)


# In[6]:


# create some histograms
data[data.dtypes[(data.dtypes=="float64")|(data.dtypes=="int64")]
                        .index.values].hist(figsize=[11,11])


# In[7]:


# def maximize_profit(data):
#     LIR = data["LIR"]
#     volume = data["avg_volume"]
#     earnings = data["earning_time"]
volume = data["avg_volume"]
volume.head()


# In[8]:


pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)


# In[9]:


volume = data["avg_volume"]
volume.head()

The dataset can be found here. The GitHub repository itself isn't relevant; it was just the first way I thought of to make the dataset accessible.

1 answer:

Answer 0 (score: 2)

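In the line for data in pnl, you rebind the variable data to each element of pnl in turn. After the loop, data is a float rather than the DataFrame, so it can no longer be indexed by column name.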
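One possible fix, sketched under some assumptions: the small DataFrame below is a made-up stand-in for output.csv, and the loop is assumed to be meant to total the positive MTM_pnl values (the original np.sum(data) on a single float only keeps the last positive value). The point is simply that the loop variable gets its own name, here value, so data keeps referring to the DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for output.csv; these values are invented for illustration.
data = pd.DataFrame({
    "avg_volume": [1000.0, 2500.0, 1800.0],
    "MTM_pnl": [12.5, -3.0, 7.5],
})

pnl = data["MTM_pnl"]

# The loop variable is called `value`, so the name `data` is left alone.
profit = 0.0
for value in pnl:
    if value > 0:
        profit += value

# Same total without an explicit loop: a boolean mask keeps the positive entries.
profit_vectorized = pnl[pnl > 0].sum()

# `data` is still the DataFrame, so selecting a column by name keeps working.
volume = data["avg_volume"]
print(profit, profit_vectorized)
print(volume.head())
```

In a notebook, a rebound name stays rebound for every cell that runs afterwards, until the cell that loads the DataFrame is executed again. That would explain why the error shows up only for some execution orders.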

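By the way, many errors like this one surface as soon as you try to produce a minimal, complete, verifiable example. You would have noticed that the error goes away once the for loop is removed.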
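A reduced example might look something like the sketch below. It assumes a plain dict with invented values in place of the DataFrame, which is enough here because the error comes from rebinding the name, not from pandas or Jupyter.

```python
# Plain-Python stand-in for the DataFrame (hypothetical values).
data = {"avg_volume": [1000.0, 2500.0], "MTM_pnl": [12.5, -3.0]}

pnl = data["MTM_pnl"]
for data in pnl:  # rebinds `data` to each float in turn
    pass

# After the loop, `data` refers to the last element of pnl, not to the dict.
print(type(data).__name__)

try:
    volume = data["avg_volume"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Deleting the two loop lines makes the lookup succeed again, which points directly at the loop as the cause.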