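I have a problem merging several DataFrames together. I downloaded some historical trading data and saved it to CSV files. Now I want to read the data from the CSV files into several DataFrames and extract the close prices.
I created a function called read_dataset that reads the data into a DataFrame and returns that DataFrame.
Combined with a for loop, I store all the DataFrames in a dict. The dict keys are the currency abbreviations (see the coin_list DataFrame).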
import numpy as np
import pandas as pd
# List of available coins, saved in a DataFrame called coin_list
coins = { 'Bitcoin': 'BTC', 'Ethereum': 'ETH', 'Ripple': 'XRP', 'BitcoinCash': 'BCH', 'Litecoin':'LTC', 'EOS': 'EOS',
'Tronix': 'TRX', 'Stellar' : 'XLM', 'Neo' : 'NEO', 'Cardano': 'ADA', 'IOTA' : 'IOT', 'Monero': 'XMR'}
# Create a coin list as Dataframe of the dictionary above
coin_list = pd.DataFrame(list(coins.items()), index = np.arange(0,12), columns=('Currency', 'Abbreviation'), dtype=str)
# Read data into DataFrames
def read_dataset(filename):
    print('Reading data from %s' % filename)
    file = pd.read_csv(filename)
    file = file.drop('Unnamed: 0', axis=1)
    return file
# Read all cryptocurrency data into a dictionary of dataframes.
currency_data = {}
df = pd.DataFrame()
for currency in coin_list['Abbreviation']:
    df = read_dataset(currency + '_historical_data_daily_updated')
    df = df.set_index('Timestamp')
    currency_data[currency] = df
currency_data
Out:
{'ADA':              close    high     low    open   volumefrom     volumeto
 Timestamp
 2017-12-30  0.5900  0.6941  0.4200  0.4955  24118261.70  14016860.69
 2017-12-31  0.7100  0.7400  0.5900  0.5900  13107255.34   8971147.70
 2018-01-01  0.7022  0.7150  0.6320  0.7100  13805601.70   9403559.91
 2018-01-02  0.7620  0.8000  0.6750  0.7022   8440669.40   6292466.84
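After creating the dict currency_data, I want to access the individual DataFrames it contains. Specifically, I want to write a for loop that merges the close prices from all of the DataFrames into a single DataFrame.
Does anyone know how to achieve this?
I can do it for two DataFrames with the following code, but I can't turn it into a for loop.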
a = pd.DataFrame()
a['ADA closeprice'] = currency_data['ADA']['close']
b = pd.DataFrame()
b['BTC closeprice'] = currency_data['BTC']['close']
c = pd.merge(a, b, left_index=True, right_index=True)
c = c.drop_duplicates()  # drop_duplicates() returns a new DataFrame, so assign the result
c.head()
            ADA closeprice  BTC closeprice
Timestamp
2017-12-30          0.5900        12531.52
2017-12-31          0.7100        13850.40
2018-01-01          0.7022        13444.88
2018-01-02          0.7620        14754.13
2018-01-03          1.1000        15156.62
Or is there a better way to create the separate DataFrames from the CSV files and store them in a dict?
Thanks for your help!
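Answer 0 (score: 1)
You don't need an explicit for loop.
You can use a dictionary comprehension to extract each series and rename it, then concatenate the series along axis=1 with pd.concat.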
import pandas as pd
# dataframe dict
d = {'a': pd.DataFrame({'close': [1, 2, 3, 4, 5]}),
'b': pd.DataFrame({'close': [6, 7, 8, 9, 10]})}
# series dict with renaming
s = {k: v['close'].rename(k+'_close') for k, v in d.items()}
# concatenate series along axis=1
res = pd.concat(list(s.values()), axis=1)
print(res)
# a_close b_close
# 0 1 6
# 1 2 7
# 2 3 8
# 3 4 9
# 4 5 10
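Note that the concatenation aligns the index of each pd.Series. Here the indices are trivial (integers), but in your case they will be the dates in your Timestamp index (pd.Timestamp objects if you parse them as dates).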
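To illustrate the alignment on a date index, here is a minimal sketch with made-up prices. The frames, dates, and values are hypothetical and are not the asker's data. Dates that appear in only one series come out as NaN in the other column, because pd.concat uses an outer join on the index by default.

```python
import pandas as pd

# Hypothetical data: two coins whose histories start on different days
d = {
    'ADA': pd.DataFrame(
        {'close': [0.59, 0.71, 0.70]},
        index=pd.to_datetime(['2017-12-30', '2017-12-31', '2018-01-01'])),
    'BTC': pd.DataFrame(
        {'close': [13850.40, 13444.88, 14754.13]},
        index=pd.to_datetime(['2017-12-31', '2018-01-01', '2018-01-02'])),
}

# Extract and rename each close series, then concatenate column-wise
s = {k: v['close'].rename(k + '_close') for k, v in d.items()}
res = pd.concat(list(s.values()), axis=1)  # outer join on the index by default

# Keep only the dates that every coin has a price for
common = res.dropna()
print(res)
print(common)
```

If you only ever want the shared dates, passing join='inner' to pd.concat should give the same rows as the dropna() step here.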
Answer 1 (score: 0)
Consider building one larger master DataFrame from your dictionary of DataFrames, then running DataFrame.filter on a column name such as close:
master_df = pd.concat(currency_data, axis=1)
# FLATTEN THE TWO-LEVEL COLUMNS INTO NAMES SUCH AS "BTC_close"
# (joining each column tuple directly keeps the names in the actual column order)
master_df.columns = ['_'.join(col) for col in master_df.columns]
df_close = master_df.filter(regex='_close')
Data (randomly generated with a seed for reproducibility)
import numpy as np
import pandas as pd
from itertools import product
coins = { 'Bitcoin': 'BTC', 'Ethereum': 'ETH', 'Ripple': 'XRP', 'BitcoinCash': 'BCH', 'Litecoin':'LTC', 'EOS': 'EOS',
'Tronix': 'TRX', 'Stellar' : 'XLM', 'Neo' : 'NEO', 'Cardano': 'ADA', 'IOTA' : 'IOT', 'Monero': 'XMR'}
currency_data = {}
np.random.seed(788)
for k, v in coins.items():
    currency_data[v] = pd.DataFrame({'open': abs(np.random.randn(50)),
                                     'close': abs(np.random.randn(50)),
                                     'high': abs(np.random.randn(50)),
                                     'low': abs(np.random.randn(50)),
                                     'volumefrom': abs(np.random.randn(50)) * 50,
                                     'volumeto': abs(np.random.randn(50)) * 100},
                                    index = pd.date_range("2018-01-01", "2018-02-19", freq="D"),
                                    columns = ['open','close','low','high','volumefrom', 'volumeto'])
Output
print(df_close.head(10))
# ADA_close BCH_close BTC_close EOS_close ETH_close IOT_close LTC_close NEO_close TRX_close XLM_close XMR_close XRP_close
# 2018-01-01 0.650955 1.547163 0.796460 0.526820 0.191777 1.310333 0.322086 0.216098 1.231339 1.008557 1.452984 1.674484
# 2018-01-02 0.115062 0.912895 0.163012 0.962510 0.486295 0.314905 0.345002 0.148462 0.487662 0.052015 0.461620 1.673353
# 2018-01-03 1.001747 0.181435 0.439193 2.419863 0.856715 0.374709 0.277737 1.115768 0.068189 0.217582 0.501237 0.287705
# 2018-01-04 0.850843 0.194079 0.187193 0.662573 0.480762 0.488702 0.039885 0.603018 0.555557 1.136274 0.804600 0.147496
# 2018-01-05 1.195504 0.839676 0.997530 0.393851 0.606223 0.754789 1.723055 3.001308 1.601807 1.239889 0.384320 1.712975
# 2018-01-06 0.694929 0.598245 0.412835 0.694578 1.416549 0.895094 1.266500 0.168239 1.133783 0.616416 0.836242 0.654971
# 2018-01-07 0.274282 0.274834 0.760970 0.647609 2.189674 0.898377 0.932951 0.439612 1.252156 0.815973 0.051374 1.984519
# 2018-01-08 0.294268 0.786343 0.548222 2.548036 1.313609 0.348784 0.091552 0.441314 0.908229 1.175537 1.213839 1.375724
# 2018-01-09 1.383939 0.129143 0.650033 1.251369 1.064297 0.619202 1.275862 0.323824 0.083908 0.677591 0.774429 1.435533
# 2018-01-10 0.426915 1.723191 0.008422 0.650916 1.431050 0.218723 0.292402 0.030168 1.169357 0.833438 1.048405 0.270780
Answer 2 (score: 0)
I solved the problem myself. Here is my approach:
# Read all cryptocurrency data into a dictionary of dataframes.
currency_data = {}
df = pd.DataFrame()
for currency in coin_list['Abbreviation']:
    df = read_dataset(currency + '_historical_data_daily_updated')
    df = df.set_index('Timestamp')
    currency_data[currency] = df
# We store all info in a dataframe with 2-level columns:
# the first level contains the coin names, the second one, the OHLC prices.
cryptocurrency_dataset = pd.concat(currency_data.values(), axis=1, keys=currency_data.keys())
'''At first we want to do some correlation analysis between cryptocurrencies'''
# Therefore we have to extract the close prices of each cryptocurrency
dataframe = {}
a = pd.DataFrame()
for i in coin_list['Abbreviation']:
    a = cryptocurrency_dataset[i]['close']
    dataframe[i] = a
close_prices = pd.concat(dataframe.values(), axis=1, keys=dataframe.keys())
close_prices = close_prices.dropna()
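As a possible shortcut for the extraction loop above, the close column can be taken from the two-level columns in a single step with DataFrame.xs. The sketch below is only an illustration: it uses a small hypothetical stand-in for currency_data (two coins, made-up values) rather than the CSV files from the question.

```python
import pandas as pd

# Hypothetical stand-in for currency_data: two small frames with price columns
idx = pd.date_range('2018-01-01', periods=3, freq='D')
currency_data = {
    'ADA': pd.DataFrame({'close': [0.70, 0.76, 1.10],
                         'open': [0.71, 0.70, 0.76]}, index=idx),
    'BTC': pd.DataFrame({'close': [13444.88, 14754.13, 15156.62],
                         'open': [13850.40, 13444.88, 14754.13]}, index=idx),
}

# Passing the dict directly uses its keys as the first column level
cryptocurrency_dataset = pd.concat(currency_data, axis=1)

# Cross-section: select 'close' on the second column level for every coin
close_prices = cryptocurrency_dataset.xs('close', axis=1, level=1).dropna()
print(close_prices)
```

The result has one column per coin, so close_prices.corr() can then be used for the correlation analysis mentioned in the comments.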