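Python - Creating DataFrames from CSV files and merging them together

Time: 2018-04-09 10:47:16

Tags: python pandas dataframe merge

I'm having trouble merging several DataFrames. I downloaded some historical trading data and saved it to CSV files. Now I want to read the data from the CSV files into several DataFrames and extract the close prices.

I wrote a function called read_dataset that reads the data into a DataFrame and returns it.

Using a for loop, I store all the DataFrames in a dict. The dict keys are the currency abbreviations (see the coin_list DataFrame).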

import numpy as np
import pandas as pd

# List of available coins, saved in a DataFrame called coin_list
coins = { 'Bitcoin': 'BTC', 'Ethereum': 'ETH', 'Ripple': 'XRP', 'BitcoinCash': 'BCH', 'Litecoin':'LTC', 'EOS': 'EOS',
          'Tronix': 'TRX', 'Stellar' : 'XLM', 'Neo' : 'NEO', 'Cardano': 'ADA', 'IOTA' : 'IOT', 'Monero': 'XMR'}

# Create a coin list as Dataframe of the dictionary above
coin_list = pd.DataFrame(list(coins.items()), index = np.arange(0,12), columns=('Currency', 'Abbreviation'), dtype=str)

# Read data into DataFrames
def read_dataset (filename):
    print('Reading data from %s' % filename)
    file = pd.read_csv(filename)
    file = file.drop('Unnamed: 0', axis=1)
    return file

# Read all cryptocurrency data into a dictionary of dataframes.
currency_data = {}
df = pd.DataFrame()
for currency in coin_list['Abbreviation']:
    df = read_dataset(currency + '_historical_data_daily_updated')
    df = df.set_index('Timestamp')
    currency_data[currency] = df

currency_data
Out: 
{'ADA':      close    high     low    open   volumefrom     volumeto
 Timestamp                                                           
 2017-12-30  0.5900  0.6941  0.4200  0.4955  24118261.70  14016860.69
 2017-12-31  0.7100  0.7400  0.5900  0.5900  13107255.34   8971147.70
 2018-01-01  0.7022  0.7150  0.6320  0.7100  13805601.70   9403559.91
 2018-01-02  0.7620  0.8000  0.6750  0.7022   8440669.40   6292466.84

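After creating the dict currency_data, I want to access the individual DataFrames it contains. Specifically, I want a for loop that merges the close prices of all the DataFrames into a single DataFrame.

Does anyone know how to achieve this?

I can do this for two DataFrames with the following code, but I can't work out how to turn it into a for loop.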

a = pd.DataFrame()
a['ADA closeprice'] = currency_data['ADA']['close']
b = pd.DataFrame()
b['BTC closeprice'] = currency_data['BTC']['close']
c = pd.merge(a, b, left_index=True, right_index=True)
c = c.drop_duplicates()  # drop_duplicates returns a new DataFrame, so assign the result
c.head()

            ADA closeprice  BTC closeprice
Timestamp                                 
2017-12-30          0.5900        12531.52
2017-12-31          0.7100        13850.40
2018-01-01          0.7022        13444.88
2018-01-02          0.7620        14754.13
2018-01-03          1.1000        15156.62

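Or is there a better way to create the different DataFrames from the CSV files and store them in a dict?

Thanks for your help!

3 answers:

Answer 0 (score: 1)

You don't need an explicit for loop.

You can use a dictionary comprehension to extract each close series and rename it, then concatenate the series along axis=1 with pd.concat.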

import pandas as pd

# dataframe dict
d = {'a': pd.DataFrame({'close': [1, 2, 3, 4, 5]}),
     'b': pd.DataFrame({'close': [6, 7, 8, 9, 10]})}

# series dict with renaming
s = {k: v['close'].rename(k+'_close') for k, v in d.items()}

# concatenate series along axis=1
res = pd.concat(list(s.values()), axis=1)

print(res)

#    a_close  b_close
# 0        1        6
# 1        2        7
# 2        3        8
# 3        4        9
# 4        5       10

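Note that the concatenation aligns the indices of the individual pd.Series. Here the indices are trivial (integers), but in your case they will be the dates in your Timestamp index.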
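
To illustrate the alignment mentioned above, here is a minimal sketch with two made-up close-price series whose date ranges only partly overlap. The series names and values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Two hypothetical close-price series with partly overlapping dates.
ada = pd.Series([0.59, 0.71, 0.70],
                index=pd.to_datetime(["2017-12-30", "2017-12-31", "2018-01-01"]),
                name="ADA_close")
btc = pd.Series([13850.40, 13444.88, 14754.13],
                index=pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02"]),
                name="BTC_close")

# Default join="outer": union of all dates, NaN where a series has no value.
outer = pd.concat([ada, btc], axis=1)

# join="inner": only the dates that appear in every series.
inner = pd.concat([ada, btc], axis=1, join="inner")

print(outer)
print(inner)
```

With the default outer join, dates missing from one series show up as NaN, so a later dropna() or join="inner" is needed if only complete rows are wanted.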

Answer 1 (score: 0)

Consider building one larger master DataFrame from your dict of DataFrames, then running DataFrame.filter on a column name such as close.

master_df = pd.concat(currency_data, axis=1)

# Rename columns using itertools.product (imported further down);
# this assumes every currency has the same set of columns
all_cols = ["_".join(x) for x in product(master_df.columns.levels[0],
                                         master_df.columns.levels[1])]
master_df.columns = all_cols

df_close = master_df.filter(regex='_close')
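
As a possible alternative to flattening the column names and filtering by regex, the close columns can also be selected directly from the two-level columns with DataFrame.xs. This is only a sketch: the small currency_data dict below is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for currency_data: two small frames with the same columns.
idx = pd.date_range("2018-01-01", periods=3, freq="D")
currency_data = {
    "BTC": pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]}, index=idx),
    "ETH": pd.DataFrame({"open": [4.0, 5.0, 6.0], "close": [4.5, 5.5, 6.5]}, index=idx),
}

# Passing the dict directly uses its keys as the outer column level.
master_df = pd.concat(currency_data, axis=1)

# Cross-section on the inner column level: keep only "close" for every currency.
df_close = master_df.xs("close", axis=1, level=1)

# Optional: add a suffix so the names match the "<coin>_close" pattern.
df_close = df_close.add_suffix("_close")

print(df_close)
```

This avoids rebuilding the column names by hand, at the cost of relying on the two-level column structure that pd.concat produces.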

Data (randomly generated with a seed for reproducibility)

import numpy as np
import pandas as pd
from itertools import product

coins = { 'Bitcoin': 'BTC', 'Ethereum': 'ETH', 'Ripple': 'XRP', 'BitcoinCash': 'BCH', 'Litecoin':'LTC', 'EOS': 'EOS',
          'Tronix': 'TRX', 'Stellar' : 'XLM', 'Neo' : 'NEO', 'Cardano': 'ADA', 'IOTA' : 'IOT', 'Monero': 'XMR'}

currency_data = {}
np.random.seed(788)

for k, v in coins.items():    
    currency_data[v] = pd.DataFrame({'open': abs(np.random.randn(50)),
                                     'close': abs(np.random.randn(50)),
                                     'high': abs(np.random.randn(50)),
                                     'low': abs(np.random.randn(50)),
                                     'volumefrom': abs(np.random.randn(50)) * 50,
                                     'volumeto': abs(np.random.randn(50)) * 100},
                                     index = pd.date_range("2018-01-01", "2018-02-19", freq="D"),
                                     columns = ['open','close','low','high','volumefrom', 'volumeto'])

Output

print(df_close.head(10))

#             ADA_close  BCH_close  BTC_close  EOS_close  ETH_close  IOT_close  LTC_close  NEO_close  TRX_close  XLM_close  XMR_close  XRP_close
# 2018-01-01   0.650955   1.547163   0.796460   0.526820   0.191777   1.310333   0.322086   0.216098   1.231339   1.008557   1.452984   1.674484
# 2018-01-02   0.115062   0.912895   0.163012   0.962510   0.486295   0.314905   0.345002   0.148462   0.487662   0.052015   0.461620   1.673353
# 2018-01-03   1.001747   0.181435   0.439193   2.419863   0.856715   0.374709   0.277737   1.115768   0.068189   0.217582   0.501237   0.287705
# 2018-01-04   0.850843   0.194079   0.187193   0.662573   0.480762   0.488702   0.039885   0.603018   0.555557   1.136274   0.804600   0.147496
# 2018-01-05   1.195504   0.839676   0.997530   0.393851   0.606223   0.754789   1.723055   3.001308   1.601807   1.239889   0.384320   1.712975
# 2018-01-06   0.694929   0.598245   0.412835   0.694578   1.416549   0.895094   1.266500   0.168239   1.133783   0.616416   0.836242   0.654971
# 2018-01-07   0.274282   0.274834   0.760970   0.647609   2.189674   0.898377   0.932951   0.439612   1.252156   0.815973   0.051374   1.984519
# 2018-01-08   0.294268   0.786343   0.548222   2.548036   1.313609   0.348784   0.091552   0.441314   0.908229   1.175537   1.213839   1.375724
# 2018-01-09   1.383939   0.129143   0.650033   1.251369   1.064297   0.619202   1.275862   0.323824   0.083908   0.677591   0.774429   1.435533
# 2018-01-10   0.426915   1.723191   0.008422   0.650916   1.431050   0.218723   0.292402   0.030168   1.169357   0.833438   1.048405   0.270780

Answer 2 (score: 0)

I solved the problem myself. My approach is as follows:

# Read all cryptocurrency data into a dictionary of dataframes.
currency_data = {}
df = pd.DataFrame()
for currency in coin_list['Abbreviation']:
    df = read_dataset(currency + '_historical_data_daily_updated')
    df = df.set_index('Timestamp')
    currency_data[currency] = df

# We store all info in a dataframe with 2-level columns:
# the first level contains the coin names, the second one, the OHLC prices.
cryptocurrency_dataset = pd.concat(currency_data.values(), axis=1, keys=currency_data.keys())

# First, we want to run a correlation analysis between the cryptocurrencies.
# Therefore we have to extract the close prices of each cryptocurrency
dataframe = {}
a = pd.DataFrame()
for i in coin_list['Abbreviation']:
    a = cryptocurrency_dataset[i]['close']
    dataframe[i] = a

close_prices = pd.concat(dataframe.values(), axis=1, keys=dataframe.keys())
close_prices = close_prices.dropna()
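
The same read-then-combine steps can be sketched more compactly with dict comprehensions. This is a minimal sketch under stated assumptions: the file names, the temporary directory and the sample values are hypothetical and exist only so the example runs on its own; the real files would be read from wherever they were saved.

```python
import os
import tempfile

import pandas as pd

# Write two tiny hypothetical CSV files so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample = {
    "BTC": "Timestamp,close,open\n2018-01-01,13444.88,13850.40\n2018-01-02,14754.13,13444.88\n",
    "ADA": "Timestamp,close,open\n2018-01-01,0.7022,0.7100\n2018-01-02,0.7620,0.7022\n",
}
for abbr, text in sample.items():
    with open(os.path.join(tmp_dir, abbr + ".csv"), "w") as fh:
        fh.write(text)

# One DataFrame per currency, indexed by the parsed Timestamp column.
currency_data = {
    abbr: pd.read_csv(os.path.join(tmp_dir, abbr + ".csv"),
                      index_col="Timestamp", parse_dates=True)
    for abbr in sample
}

# One column of close prices per currency; keep only rows without gaps.
close_prices = pd.concat(
    {abbr: df["close"] for abbr, df in currency_data.items()}, axis=1
).dropna()

print(close_prices)
```

Using index_col and parse_dates in read_csv replaces the separate set_index call and gives a DatetimeIndex, and close_prices.corr() could then be used for the correlation analysis.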