Question

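I'm new to Jupyter Notebook. I like it overall, but I sometimes get strange errors that appear on some runs and not on others. For example, I have a dataset that looks like this (output of .head()):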

Now, if I set volume = data["avg_volume"] and then call volume.head(), I get:

But if I delete that line and put it somewhere else, I sometimes get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-9c1c4c11ebf0> in <module>()
----> 1 volume = data["avg_volume"]
      2 volume.head()

TypeError: 'float' object is not subscriptable

I noticed that running the following after that line:

pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)

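causes the problem. I just don't understand why this happens; it makes no sense to me and is leading me to believe that Jupyter notebooks are garbage. Here is the code: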

# coding: utf-8

# In[1]:


# import modules
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
import tensorflow as tf


# In[2]:


# import dataset
data = pd.read_csv('output.csv')
data.head()


# In[3]:


# Goal with data set: The goal is to maximize the PNL column, secondary goals are to minimize MAE (Maximum Adverse Excursion)
# and maximize MFE (Maximum Favorable Excursion). Once a predictable model is established the next step is to work on adding
# alpha by optimizing the stop/take profit logic.
# Assumptions: The thesis is that an earning stock (a stock that has published an earnings report in the past 24 hours) 
# that gaps on open, continues in the direction of the gap.


# In[4]:


# Get statistical information
data.describe()


# In[5]:


# See how correlated each variable is to MTM_pnl
data.corr(method='pearson', min_periods=1)


# In[6]:


# create some histograms
data[data.dtypes[(data.dtypes=="float64")|(data.dtypes=="int64")]
                        .index.values].hist(figsize=[11,11])


# In[7]:


# def maximize_profit(data):
#     LIR = data["LIR"]
#     volume = data["avg_volume"]
#     earnings = data["earning_time"]
volume = data["avg_volume"]
volume.head()


# In[8]:


pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)


# In[9]:


volume = data["avg_volume"]
volume.head()

The dataset can be found here. The GitHub repository itself is irrelevant; it was just the first way I thought of to share the dataset.

Answer 1

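In the code for data in pnl, you redefine the variable data, so it is no longer a DataFrame and cannot be indexed by column name. After the loop finishes, data holds the last element of pnl, a float, which is why data["avg_volume"] raises TypeError: 'float' object is not subscriptable. Because all notebook cells share one kernel namespace, the error appears whenever the loop cell has run before the cell that indexes data, wherever those cells sit in the notebook; re-running the read_csv cell rebinds data to the DataFrame.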
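A minimal sketch of the fix: give the loop variable a name that does not shadow the DataFrame. The two-column DataFrame below is a hypothetical stand-in for output.csv with made-up values. It also assumes the goal was the total of all positive MTM_pnl values. The original profit = np.sum(data) reassigns profit on every positive row, so it prints only the last positive value. The vectorized pnl[pnl > 0].sum() is shown as an alternative to the loop.

```python
import pandas as pd

# Hypothetical stand-in for output.csv: only the two columns used here,
# with made-up values (the real dataset is not reproduced).
data = pd.DataFrame({
    "avg_volume": [1200.0, 850.0, 430.0],
    "MTM_pnl": [15.5, -4.0, 7.25],
})

pnl = data["MTM_pnl"]

# Loop variable `value` does not shadow the DataFrame `data`.
total_profit = 0.0
for value in pnl:
    if value > 0:
        total_profit += value  # accumulate instead of overwriting

# Equivalent vectorized form: keep only positive rows, then sum them.
vectorized_profit = pnl[pnl > 0].sum()
print(total_profit, vectorized_profit)

# `data` is still the DataFrame, so column indexing keeps working.
volume = data["avg_volume"]
print(volume.head())
```

With the loop variable renamed, the cells can be run in any order without breaking later uses of data.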

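By the way, you would find many errors like this yourself when you try to produce a minimal, complete, and verifiable example: here, removing the for loop makes the error disappear.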
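As an illustration of such a minimal example, the sketch below reproduces the error without the CSV file. It uses a hypothetical two-row DataFrame with illustrative values. The loop rebinds the name data to each element of pnl, so afterwards indexing data by a column name fails with the same TypeError as in the question.

```python
import pandas as pd

# Hypothetical two-row DataFrame; values are illustrative only.
data = pd.DataFrame({"avg_volume": [100.0, 200.0], "MTM_pnl": [1.5, -2.0]})

pnl = data["MTM_pnl"]
for data in pnl:  # rebinds the name `data` on every iteration
    pass

# `data` is now the last element of pnl (a float), not the DataFrame.
print(type(data), data)

message = None
try:
    data["avg_volume"]  # same line that fails in the question
except TypeError as exc:
    message = str(exc)
print(message)
```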
