Appending to a dataframe in a while loop (pandas)

Asked: 2021-01-26 16:38:26

Tags: python pandas

So I'm having some trouble sorting a dataframe. My code pulls data from an API that only allows 1000 rows at a time, and the API then returns a continuation URL that my script follows in a while loop. The problem is that on every pass I write and append the results to a CSV. That works fine, but now I need to sort the whole dataframe, which is a problem.

How can I build up a dataframe on each pass and then write that dataframe to a CSV at the end? Should I append to one dataframe inside the loop, or create a new dataframe on each pass and combine them all at the end? I'm not sure how to go about this; I barely got it working as it is, so any help is appreciated.
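For reference, this is the rough shape of the "combine them all at the end" option I have in mind. The fetch_one_page function here is just a made-up stand-in for the request code in my real script, with fake data so the snippet runs on its own:

import pandas as pd

# Stand-in for one API page; in the real script this comes from
# requests.get(...).json()['data'] and the continuation_token loop.
def fetch_one_page(page_number):
    base = 1610582400000 + page_number * 3 * 60000
    return [{'timestamp': base + i * 60000, 'high': '100.5', 'low': '99.5'} for i in range(3)]

frames = []
for page_number in range(2):                 # the real loop runs until there is no continuation_token
    frames.append(pd.json_normalize(fetch_one_page(page_number)))

df = pd.concat(frames, ignore_index=True)    # combine every page once, at the end
df['range'] = df['high'].astype(float) - df['low'].astype(float)
df = df.sort_values(by='range')              # sort the full dataframe before writing
df.to_csv('example.csv', index=False, encoding='utf-8')

My current code is below: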

import requests
import json
import pandas as pd
import time
import os
from itertools import product

#what I need to loop through
instrument = ('btc-usd')
exchange = ('cbse')  
interval = ('1m','3m')  
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'


for (interval) in product(interval):
    page_size = '1000'
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    #params = {'interval': interval, 'page_size': page_size, 'start_time': start_time, 'end_time': end_time }
    params = {'interval': interval, 'page_size': page_size }
    KEY = 'xxx'
    headers = {
        "X-Api-Key": KEY,
        "Accept": "application/json",
        "Accept-Encoding": "gzip"
    }

    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    c_token = True

    while(c_token):
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        c_token = j_data.get('continuation_token')
        today = time.strftime("%Y-%m-%d")
        params = {'continuation_token': c_token}

        if c_token:   
            url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/cbse/spot/btc-usd/aggregations/count_ohlcv_vwap?continuation_token={c_token}'        

        # create dataframe
        df = pd.DataFrame.from_dict(pd.json_normalize(parse_data), orient='columns')
        df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms'))          
        df['range'] = df['high'].astype(float) - df['low'].astype(float)
        df.range = df.range.astype(float)

        #sort
        df = df.sort_values(by='range')
        
        #that means file already exists need to append
        if(csv_file in os.listdir()): 
            csv_string = df.to_csv(index=False, encoding='utf-8', header=False)
            with open(csv_file, 'a') as f:
                f.write(csv_string)
        #that means writing file for the first time        
        else: 
            csv_string = df.to_csv(index=False, encoding='utf-8')
            with open(csv_file, 'w') as f:
                f.write(csv_string)

2 Answers:

Answer 0 (score: 1)

Perhaps the cleanest and most efficient way is to create an empty dataframe and then append to it.

import requests
import json
import pandas as pd
import time
import os
from itertools import product

#what I need to loop through
instruments = ('btc-usd',)
exchanges = ('cbse',)
intervals = ('1m','3m')  
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'
params = {'page_size': 1000}
KEY = 'xxx'
    
headers = {
        "X-Api-Key": KEY,
        "Accept": "application/json",
        "Accept-Encoding": "gzip"
    }

for instrument, exchange, interval  in product(instruments, exchanges, intervals):
    params['interval'] = interval
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    df = pd.DataFrame()   # start with empty dataframe

    while True:
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        df = df.append(pd.DataFrame.from_dict(pd.json_normalize(parse_data), orient='columns'))  # append to the dataframe
        if 'continuation_token' in j_data:
            params['continuation_token'] = j_data['continuation_token']
        else:
            break
        
    # These parts can be done outside of the while loop, once all the data has been compiled
    df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms'))          
    df['range'] = df['high'].astype(float) - df['low'].astype(float)
    df.range = df.range.astype(float)
    df = df.sort_values(by='range')
    df.to_csv(csv_file, index=False, encoding='utf-8')  # write the whole CSV at once

If the combined dataframe is too large to fit in memory, you can instead read one page at a time and append it to the CSV, as long as the column headers are the same on every page. (You may still need to make sure pandas writes the columns in the same order each time.)
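As a minimal sketch of that page-at-a-time variant (append_page_to_csv is a hypothetical helper name, and the final sorting would then have to happen on the finished file rather than on one big in-memory dataframe):

import os
import pandas as pd

def append_page_to_csv(page_records, csv_file):
    # Append one page of API records to csv_file, writing the header only for the first page.
    page_df = pd.json_normalize(page_records)
    write_header = not os.path.exists(csv_file)   # header only when the file does not exist yet
    page_df.to_csv(csv_file, mode='a', header=write_header, index=False, encoding='utf-8')

Inside the while loop you would then call append_page_to_csv(j_data['data'], csv_file) instead of building up df.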

Answer 1 (score: 0)

You can use df.loc together with len and assign a list of values.

win_results_df = pd.DataFrame(columns=['GameId', 'Team', 'TeamOpponent',
                                       'HomeScore', 'VisitorScore', 'Target'])

df_length = len(win_results_df)
win_results_df.loc[df_length] = [teamOpponent['gameId'],
                                 key, teamOpponent['visitorDisplayName'],
                                 teamOpponent['HomeScore'], teamOpponent['VisitorScore'], True]
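A self-contained version of the same idiom, with made-up values so it runs on its own, looks like this:

import pandas as pd

df = pd.DataFrame(columns=['GameId', 'Team', 'HomeScore', 'VisitorScore'])

# With a default integer index, len(df) is one past the last used label,
# so .loc[len(df)] = [...] appends a new row.
for row in [(1, 'A', 21, 14), (2, 'B', 7, 28)]:
    df.loc[len(df)] = list(row)

print(df)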