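I'm having trouble sorting a dataframe. My code fetches data from an API that returns at most 1000 rows per request, along with a continuation URL that my script follows in a while loop. The problem is that on each pass I write the page to a CSV file, appending to it. That works fine, but now I need to sort the entire dataframe, and writing page by page makes that impossible.
How can I write to a dataframe on each pass and then write the dataframe to CSV at the end? Should I append to the dataframe in each loop iteration, or create a new dataframe on each pass and combine them all at the end? I'm not sure how to do this; I barely got the script working as it is, so any help is appreciated.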
import requests
import json
import pandas as pd
import time
import os
from itertools import product
#what I need to loop through
instrument = ('btc-usd')
exchange = ('cbse')
interval = ('1m','3m')
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'
for (interval) in product(interval):
    page_size = '1000'
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    #params = {'interval': interval, 'page_size': page_size, 'start_time': start_time, 'end_time': end_time }
    params = {'interval': interval, 'page_size': page_size }
    KEY = 'xxx'
    headers = {
        "X-Api-Key": KEY,
        "Accept": "application/json",
        "Accept-Encoding": "gzip"
    }
    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    c_token = True
    while(c_token):
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        c_token = j_data.get('continuation_token')
        today = time.strftime("%Y-%m-%d")
        params = {'continuation_token': c_token}
        if c_token:
            url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/cbse/spot/btc-usd/aggregations/count_ohlcv_vwap?continuation_token={c_token}'
        # create dataframe
        df = pd.DataFrame.from_dict(pd.json_normalize(parse_data), orient='columns')
        df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms'))
        df['range'] = df['high'].astype(float) - df['low'].astype(float)
        df.range = df.range.astype(float)
        #sort
        df = df.sort_values(by='range')
        #that means file already exists need to append
        if(csv_file in os.listdir()):
            csv_string = df.to_csv(index=False, encoding='utf-8', header=False)
            with open(csv_file, 'a') as f:
                f.write(csv_string)
        #that means writing file for the first time
        else:
            csv_string = df.to_csv(index=False, encoding='utf-8')
            with open(csv_file, 'w') as f:
                f.write(csv_string)
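Answer 0 (score: 1)
Perhaps the cleanest and most efficient approach is to start with an empty dataframe and add each page to it, then sort and write the CSV once at the end.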
import requests
import pandas as pd
from itertools import product
# what I need to loop through
instruments = ('btc-usd',)
exchanges = ('cbse',)
intervals = ('1m', '3m')
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'
KEY = 'xxx'
headers = {
    "X-Api-Key": KEY,
    "Accept": "application/json",
    "Accept-Encoding": "gzip"
}
for instrument, exchange, interval in product(instruments, exchanges, intervals):
    # build fresh params for each combination so no continuation token carries over
    params = {'page_size': 1000, 'interval': interval}
    # f-string, so that {exchange} and {instrument} are actually substituted
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    frames = []  # one dataframe per page
    while True:
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        frames.append(pd.json_normalize(parse_data))  # keep this page
        c_token = j_data.get('continuation_token')
        if not c_token:
            break  # no more pages
        params['continuation_token'] = c_token
    # These parts can be done outside of the while loop, once all the data has been compiled
    # (DataFrame.append was removed in pandas 2.0, so the pages are combined with pd.concat)
    df = pd.concat(frames, ignore_index=True)
    df.insert(1, 'time', pd.to_datetime(df['timestamp'].astype('int64'), unit='ms'))
    df['range'] = df['high'].astype(float) - df['low'].astype(float)
    df = df.sort_values(by='range')
    df.to_csv(csv_file, index=False, encoding='utf-8')  # write the whole CSV at once
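The accumulate-then-sort logic can be tried without network access. The following is only a minimal sketch: `fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for `requests.get(...).json()`, and the candle values are made up. It collects each page in a list and calls `pd.concat` once, because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for the paginated API: two pages of made-up candles,
# keyed by the continuation token that requests them.
FAKE_PAGES = {
    None: {
        "data": [
            {"timestamp": 1610582400000, "high": "105", "low": "100"},
            {"timestamp": 1610582460000, "high": "103", "low": "102"},
        ],
        "continuation_token": "page2",
    },
    "page2": {
        "data": [{"timestamp": 1610582520000, "high": "110", "low": "101"}],
    },
}


def fetch_page(token=None):
    """Return the fake JSON payload for the given continuation token."""
    return FAKE_PAGES[token]


def collect_all_pages():
    """Follow continuation tokens, then build and sort a single DataFrame."""
    frames = []
    token = None
    while True:
        payload = fetch_page(token)
        frames.append(pd.json_normalize(payload["data"]))
        token = payload.get("continuation_token")
        if not token:
            break
    # Combine every page once, then derive columns and sort the whole set.
    df = pd.concat(frames, ignore_index=True)
    df.insert(1, "time", pd.to_datetime(df["timestamp"].astype("int64"), unit="ms"))
    df["range"] = df["high"].astype(float) - df["low"].astype(float)
    return df.sort_values(by="range", ignore_index=True)


result = collect_all_pages()
```

Because the sort runs after the last page has arrived, rows from different pages end up interleaved correctly, which writing page by page cannot give you.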
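If the combined dataframe is too large to fit in memory, you can instead read one page at a time and append it to the CSV, provided the column headers are the same on every page. (You may still need to make sure that pandas writes the columns in the same order each time.)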
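One way to keep the column order stable when appending page by page is to reindex every page to a fixed column list before writing. This is a rough sketch under assumptions: `append_page`, `COLUMNS`, and the sample pages are hypothetical names and made-up data, not part of the original script.

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed column order; every page is written in this order.
COLUMNS = ["timestamp", "high", "low"]


def append_page(page_df, csv_path, columns=COLUMNS):
    """Append one page to csv_path, writing the header only for a new file."""
    is_new_file = not os.path.exists(csv_path)
    page_df.reindex(columns=columns).to_csv(
        csv_path, mode="a", header=is_new_file, index=False, encoding="utf-8"
    )


# Made-up pages whose columns arrive in different orders.
page1 = pd.DataFrame({"timestamp": [1, 2], "high": [105.0, 103.0], "low": [100.0, 102.0]})
page2 = pd.DataFrame({"low": [101.0], "timestamp": [3], "high": [110.0]})

csv_path = os.path.join(tempfile.mkdtemp(), "pages.csv")
for page in (page1, page2):
    append_page(page, csv_path)

combined = pd.read_csv(csv_path)
```

A file built this way stays in arrival order, so sorting the whole data set still needs a separate pass over the file afterwards.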
Answer 1 (score: 0)
You can use df.loc together with len to add a list of values as a new row.
win_results_df = pd.DataFrame(columns=['GameId', 'Team', 'TeamOpponent',
                                       'HomeScore', 'VisitorScore', 'Target'])
df_length = len(win_results_df)
win_results_df.loc[df_length] = [teamOpponent['gameId'],
                                 key, teamOpponent['visitorDisplayName'],
                                 teamOpponent['HomeScore'], teamOpponent['VisitorScore'], True]
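The snippet above relies on `key` and `teamOpponent` being defined elsewhere. A self-contained sketch of the same `df.loc[len(df)]` technique, assuming a hypothetical `games` dict with made-up values, might look like this:

```python
import pandas as pd

columns = ["GameId", "Team", "TeamOpponent", "HomeScore", "VisitorScore", "Target"]
win_results_df = pd.DataFrame(columns=columns)

# Made-up input: team name -> details of one game (hypothetical data).
games = {
    "Lions": {"gameId": 1, "visitorDisplayName": "Tigers", "HomeScore": 3, "VisitorScore": 1},
    "Bears": {"gameId": 2, "visitorDisplayName": "Wolves", "HomeScore": 2, "VisitorScore": 0},
}

for key, teamOpponent in games.items():
    # len(df) is the next unused integer label, so this adds a new row.
    win_results_df.loc[len(win_results_df)] = [
        teamOpponent["gameId"],
        key,
        teamOpponent["visitorDisplayName"],
        teamOpponent["HomeScore"],
        teamOpponent["VisitorScore"],
        True,
    ]
```

Each `.loc` assignment enlarges the dataframe one row at a time, so for thousands of rows it is usually faster to collect the rows in a list and build the dataframe once.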