Python Pandas: appending DataFrames from a multiprocessing Pool to an existing DataFrame

Time: 2018-01-19 14:08:13

Tags: python pandas for-loop multiprocessing pool

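I have a DataFrame called df3 with 5 columns.

I am using a multiprocessing Pool to parse tables from bittrex.com into a DataFrame called df2.

I reduced the processes to 2 just to keep my test code simple.

Here is my code: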

import pandas as pd
import json
import urllib.request
import os
from urllib import parse
import csv
import datetime
from multiprocessing import Process, Pool
import time

df3 = pd.DataFrame(columns=['tickers', 'RSIS', 'CCIS', 'ICH', 'SMAS'])
tickers = ["BTC-1ST", "BTC-ADA"]

def http_get(url):
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read()}
    return result

urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ]

pool = Pool(processes=200)

results = pool.map(http_get, urls)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])

    df2.rename(columns={'BV': 'BaseVolume', 'C': 'Close', 'H': 'High', 'L': 'Low', 'O': 'Open', 'T': 'TimeStamp',
                        'V': 'Volume'}, inplace=True)

    # Tenkan-sen (Conversion Line): (9-period high + 9-period low) / 2
    nine_period_high = df2['High'].rolling(window=50).max()
    nine_period_low = df2['Low'].rolling(window=50).min()
    df2['tenkan_sen'] = (nine_period_high + nine_period_low) / 2

    # Kijun-sen (Base Line): (26-period high + 26-period low) / 2
    period26_high = df2['High'].rolling(window=250).max()
    period26_low = df2['Low'].rolling(window=250).min()
    df2['kijun_sen'] = (period26_high + period26_low) / 2

    TEN30L = df2.loc[df2.index[-1], 'tenkan_sen']
    TEN30LL = df2.loc[df2.index[-2], 'tenkan_sen']
    KIJ30L = df2.loc[df2.index[-1], 'kijun_sen']
    KIJ30LL = df2.loc[df2.index[-2], 'kijun_sen']

    if (TEN30LL < KIJ30LL) and (TEN30L > KIJ30L):
        df3.at[ticker, 'ICH'] = 'BUY'
    elif (TEN30LL > KIJ30LL) and (TEN30L < KIJ30L):
        df3.at[ticker, 'ICH'] = 'SELL'
    else:
        df3.at[ticker, 'ICH'] = 'NO'

    pool.close()
    pool.join()
    print(df2)

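My problem is that I always get the error NameError: name 'ticker' is not defined, and it is driving me crazy. Why do I get this error even though I already defined ticker in the for loop on the line urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ], where Python used it successfully?

I have been googling for three days and tried several solutions without success.

Any ideas, please?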

1 answer:

Answer 0: (score: 1)

I don't think you are looking at the right line; when I run your code, I get:

NameError                                 Traceback (most recent call last)
<ipython-input-1-fd766f4a9b8e> in <module>()
     49         df3.at[ticker, 'ICH'] = 'SELL'
     50     else:
---> 51         df3.at[ticker, 'ICH'] = 'NO'
     52 
     53     pool.close()

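So the error is on line 51, not on the line where you create the urls list. That makes sense, because ticker is not defined outside the list comprehension on that line. The problem is not your use of multiprocessing or pandas, but Python's scoping rules: the temporary variable of a list comprehension cannot be used outside it. It is hard to see how it could be, since it has iterated over several values, unless you were only interested in its last value, which is not what you want here.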
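To illustrate the scoping rule, here is a minimal sketch for Python 3 that makes no network calls. The shortened URL strings and the names leaked, t and last_seen are made up for the demonstration and are not part of the original code.

```python
tickers = ["BTC-1ST", "BTC-ADA"]

# `ticker` is bound only inside the comprehension's own scope.
urls = ["marketName=" + ticker for ticker in tickers]

try:
    ticker  # looking the name up outside the comprehension fails
    leaked = True
except NameError:
    leaked = False

# A plain for loop, by contrast, leaves its variable bound to the last value.
for t in tickers:
    pass
last_seen = t
```

After this runs, leaked is False, while last_seen holds "BTC-ADA", which shows why even a leaked variable would only ever give you the last ticker.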

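You probably need to keep track of the ticker throughout the fetch, so that you can associate each result with the correct ticker at the end, for example: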

def http_get(ticker):
    url = "https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin"
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read(), "ticker": ticker}
    return result

pool = Pool(processes=200)

results = pool.map(http_get, tickers)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])
    ticker = result['ticker']
    ...
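
As a possible alternative, pool.map returns its results in the same order as its inputs, so the tickers can also be paired with the results using zip. The sketch below is only an illustration under a few assumptions: fake_fetch is a hypothetical stand-in for the real HTTP request, the signals dict stands in for df3, and ThreadPool is swapped in for Pool so the snippet runs as a plain script without an `if __name__ == "__main__":` guard.

```python
from multiprocessing.pool import ThreadPool

tickers = ["BTC-1ST", "BTC-ADA"]

def fake_fetch(ticker):
    # Hypothetical stand-in for http_get: returns a fake payload, no network access.
    return {"echo": ticker.lower()}

with ThreadPool(processes=2) as pool:
    # map() preserves input order, so results[i] belongs to tickers[i].
    results = pool.map(fake_fetch, tickers)

signals = {}
for ticker, payload in zip(tickers, results):
    # `ticker` is an ordinary loop variable here, so it is defined in the loop body.
    signals[ticker] = payload["echo"]
```

In the real code, the body of this loop would compute the indicator from payload and write it with df3.at[ticker, 'ICH'].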