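Python Pandas - Merging multiple DataFrames returns NaN

Time: 2016-05-24 09:06:13

Tags: python csv pandas merge

I am trying to merge multiple CSV files into one large DataFrame, joining them on the Date column. Some of the CSV files are missing dates, and those gaps need to be recorded as blank or NA.

Searching around led me to believe that pandas in Python is a viable solution.

My code is as follows: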

import pandas as pd

AvgPrice = pd.read_csv('csv/BAVERAGE-USD-Bitcoin24hPrice.csv', index_col=False)
AvgPrice = AvgPrice.iloc[:,(0,1)]
AvgPrice.columns.values[1] = 'Price'

TransVol = pd.read_csv('csv/BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv', index_col=False)
TransVol.columns.values[1] = 'TransactionVolume'

TotalBTC = pd.read_csv('csv/BCHAIN-TOTBC-TotalBitcoins.csv', index_col=False)
TotalBTC.columns.values[1] = 'TotalBTC'

USDExchVol = pd.read_csv('csv/BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv', index_col=False)
USDExchVol.columns.values[1] = 'USDExchange Volume'

df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')

df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')

The CSV files are located here: https://drive.google.com/folderview?id=0B8xdmDmZgtJbVkhCcjZkZUhaajg&usp=sharing

Result of df_test:

            Date   Price  TransactionVolume
0     2016-05-10  459.30                NaN
1     2016-05-09  462.49                NaN
2     2016-05-08  461.85                NaN
3     2016-05-07  460.86                NaN
4     2016-05-06  453.51                NaN
5     2016-05-05  449.31                NaN

While df1 seems fine:

            Date  TransactionVolume   Price
0     2016-05-10           275352.0  459.30
1     2016-05-09           256585.0  462.49
2     2016-05-08           152045.0  461.85
3     2016-05-07           245115.0  460.86
4     2016-05-06           264882.0  453.51
5     2016-05-05           273005.0  449.31

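I don't know why the rightmost columns of df2 and df_test are filled with NaN. This prevents me from merging df1 and df2 into one large DataFrame.

Any help would be greatly appreciated, as I have spent hours on this without success.

2 Answers:

Answer 0 (score: 0)

You need to pass the names parameter (and usecols, where only some columns are wanted) to read_csv; then it works: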

import pandas as pd

AvgPrice = pd.read_csv('csv/BAVERAGE-USD-Bitcoin24hPrice.csv', 
                       index_col=False, 
                       parse_dates=['Date'],
                       usecols=[0,1],
                       header=0, 
                       names=['Date','Price'])

TransVol = pd.read_csv('csv/BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv', 
                       index_col=False, 
                       parse_dates=['Date'],
                       header=0, 
                       names=['Date','TransactionVolume'])


TotalBTC = pd.read_csv('csv/BCHAIN-TOTBC-TotalBitcoins.csv', 
                       index_col=False, 
                       parse_dates=['Date'],
                       header=0, 
                       names=['Date','TotalBTC'])


USDExchVol = pd.read_csv('csv/BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv', 
                       index_col=False,
                       parse_dates=['Date'],
                       header=0, 
                       names=['Date','USDExchange Volume'])
df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')
df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')

print (df1.head())
print (df2.head())
print (df_test.head())
        Date  TransactionVolume   Price
0 2016-05-10           275352.0  459.30
1 2016-05-09           256585.0  462.49
2 2016-05-08           152045.0  461.85
3 2016-05-07           245115.0  460.86
4 2016-05-06           264882.0  453.51
        Date  USDExchange Volume    TotalBTC
0 2016-05-10        2.158373e+06  15529625.0
1 2016-05-09        1.438420e+06  15525825.0
2 2016-05-08        6.679933e+05  15521275.0
3 2016-05-07        1.825475e+06  15517400.0
4 2016-05-06        1.908048e+06  15513525.0
        Date   Price  TransactionVolume
0 2016-05-10  459.30           275352.0
1 2016-05-09  462.49           256585.0
2 2016-05-08  461.85           152045.0
3 2016-05-07  460.86           245115.0
4 2016-05-06  453.51           264882.0
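For readers without the original CSV files, the same read_csv pattern can be tried on in-memory data. This is only a sketch: the CSV text below is invented, and the header names (`24h Average`, `Ask`, `Value`) are assumptions about the file layout. It also includes a date that exists in only one file, to show that an outer merge records the gap as NaN rather than dropping the row.

```python
import io
import pandas as pd

# Invented sample data standing in for the real CSV files (assumed layout).
price_csv = (
    "Date,24h Average,Ask\n"
    "2016-05-10,459.30,459.90\n"
    "2016-05-09,462.49,462.80\n"
)
volume_csv = (
    "Date,Value\n"
    "2016-05-10,275352.0\n"
    "2016-05-09,256585.0\n"
    "2016-05-08,152045.0\n"
)

# header=0 discards the file's own header row; names supplies the labels.
avg_price = pd.read_csv(io.StringIO(price_csv),
                        usecols=[0, 1],
                        header=0,
                        names=['Date', 'Price'],
                        parse_dates=['Date'])

trans_vol = pd.read_csv(io.StringIO(volume_csv),
                        header=0,
                        names=['Date', 'TransactionVolume'],
                        parse_dates=['Date'])

# 2016-05-08 has no price, so the outer merge keeps it with Price = NaN.
merged = pd.merge(avg_price, trans_vol, on='Date', how='outer')
print(merged)
```

Because the labels are set while the file is parsed, no renaming step is needed afterwards.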

Edit by comment:

I think you can convert the Date column to monthly periods with to_period, and then use groupby with mean:

print (df1.Date.dt.to_period('M'))
0       2016-05
1       2016-05
2       2016-05
3       2016-05
4       2016-05
5       2016-05
6       2016-05
7       2016-05
...         ...

print (df1.groupby( df1.Date.dt.to_period('M') ).mean() )
         TransactionVolume      Price
Date
2011-05       1.605518e+05   7.272273
2011-06       1.739163e+05  17.914583
2011-07       6.647129e+04  14.100645
2011-08       1.050460e+05  10.089677
2011-09       9.562243e+04   5.933667
2011-10       9.120232e+04   3.638065
2011-11       8.927442e+05   2.690333
2011-12       1.092328e+06   3.463871
2012-01       1.168704e+05   6.105161
2012-02       1.465859e+05   5.115517
...                    ...        ...

If the order is important, add the parameter sort=False to groupby.
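A small sketch of the monthly grouping, with and without sort=False. The four rows are invented for illustration, and the numeric columns are selected explicitly so that the Date column itself is not averaged.

```python
import pandas as pd

# Invented sample rows, newest first, as in the merged frames above.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-05-10', '2016-05-09',
                            '2016-04-30', '2016-04-29']),
    'Price': [460.0, 462.0, 454.0, 452.0],
    'TransactionVolume': [275000.0, 257000.0, 190000.0, 279000.0],
})

value_cols = ['Price', 'TransactionVolume']
months = df['Date'].dt.to_period('M')

# Default: groups come back sorted by month (2016-04 before 2016-05).
monthly = df.groupby(months)[value_cols].mean()

# sort=False: groups keep the order in which each month first appears.
monthly_unsorted = df.groupby(months, sort=False)[value_cols].mean()

print(monthly)
print(monthly_unsorted)
```

With sort=False the result follows the row order of the input, which here means the most recent month comes first.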

Answer 1 (score: 0)

There is a subtle bug here: you are renaming the columns by assigning directly into the column array of each df:

AvgPrice.columns.values[1] = 'Price'

If you try TransVol.info(), it raises a KeyError on TransactionVolume.
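The likely cause is that writing into columns.values changes the underlying array without the column Index noticing, so label lookups can go stale. One alternative, sketched below on invented in-memory data (the `Value` header is taken from the rename calls in this answer), is to assign a complete list of labels to .columns, which replaces the Index object, and then run a quick sanity check before merging.

```python
import io
import pandas as pd

# Invented sample data; the real file is assumed to have a 'Value' column.
csv_text = "Date,Value\n2016-05-10,275352.0\n2016-05-09,256585.0\n"
trans_vol = pd.read_csv(io.StringIO(csv_text))

# Assigning a whole list replaces the Index, so lookups by the new
# label behave normally.
trans_vol.columns = ['Date', 'TransactionVolume']

# Sanity checks worth running before any merge.
print(trans_vol.columns.tolist())
print('TransactionVolume' in trans_vol.columns)
print(trans_vol['TransactionVolume'].sum())
```

If the membership check or the column lookup fails at this point, the merge would be working with labels that pandas cannot resolve.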

If you use rename instead, it works:

In [35]:
AvgPrice = pd.read_csv(r'c:\data\BAVERAGE-USD-Bitcoin24hPrice.csv', index_col=False)
AvgPrice = AvgPrice.iloc[:,(0,1)]
AvgPrice.rename(columns={'24h Average':'Price'}, inplace=True)

TransVol = pd.read_csv(r'c:\data\BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv', index_col=False)
TransVol.rename(columns={'Value':'TransactionVolume'}, inplace=True)

TotalBTC = pd.read_csv(r'c:\data\BCHAIN-TOTBC-TotalBitcoins.csv', index_col=False)
TotalBTC.rename(columns={'Value':'TotalBTC'}, inplace=True)

USDExchVol = pd.read_csv(r'c:\data\BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv', index_col=False)
USDExchVol.rename(columns={'Value':'USDExchange Volume'}, inplace=True)

df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')

df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')
df_test
Out[35]:
            Date   Price  TransactionVolume
0     2016-05-10  459.30           275352.0
1     2016-05-09  462.49           256585.0
2     2016-05-08  461.85           152045.0
3     2016-05-07  460.86           245115.0
4     2016-05-06  453.51           264882.0
5     2016-05-05  449.31           273005.0
6     2016-05-04  449.32           370911.0
7     2016-05-03  447.93           252534.0
8     2016-05-02  448.00           249926.0
9     2016-05-01  452.87           170791.0
10    2016-04-30  454.88           190470.0
11    2016-04-29  451.88           278893.0
12    2016-04-28  445.80           329924.0
13    2016-04-27  461.92           335750.0
14    2016-04-26  465.91           344162.0
15    2016-04-25  460.32           307790.0
16    2016-04-24  455.53           188499.0
17    2016-04-23  449.13           203792.0
18    2016-04-22  447.73           291487.0
19    2016-04-21  445.28           316159.0
20    2016-04-20  438.98           302380.0
21    2016-04-19  432.35           275994.0
22    2016-04-18  429.76           245313.0
23    2016-04-17  431.93           186607.0
24    2016-04-16  432.86           200628.0
25    2016-04-15  429.06           281389.0
26    2016-04-14  426.21           274524.0
27    2016-04-13  425.50           309995.0
28    2016-04-12  426.15           341372.0
29    2016-04-11  422.91           264357.0
...          ...     ...                ...
1798  2011-05-18    7.14            80290.0
1799  2011-05-17    7.52           138205.0
1800  2011-05-16    7.77            62341.0
1801  2011-05-15    6.74           272130.0
1802  2011-05-14    7.86           656162.0
1803  2011-05-13    7.48           324020.0
1804  2011-05-12    5.83           101674.0
1805  2011-05-11    5.35           114243.0
1806  2011-05-10    4.74           104592.0
1807  2015-09-03     NaN           256023.0
1808  2015-02-03     NaN           213538.0
1809  2015-01-07     NaN           256344.0
1810  2014-11-21     NaN           161082.0
1811  2014-10-17     NaN           142251.0
1812  2014-09-28     NaN            92933.0
1813  2014-09-09     NaN           111317.0
1814  2014-08-05     NaN           136298.0
1815  2014-08-03     NaN            49181.0
1816  2014-08-01     NaN           166173.0
1817  2014-06-03     NaN           124768.0
1818  2014-06-02     NaN            87513.0
1819  2014-05-09     NaN            80315.0
1820  2013-10-27     NaN           107717.0
1821  2013-09-17     NaN           137920.0
1822  2011-06-25     NaN           110463.0
1823  2011-06-24     NaN           106146.0
1824  2011-06-23     NaN           475995.0
1825  2011-06-22     NaN           122507.0
1826  2011-06-21     NaN           114264.0
1827  2011-06-20     NaN           836861.0

[1828 rows x 3 columns]
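Once every frame has correctly registered column names, the single large DataFrame asked for in the question can be built by chaining the outer merges. The following is a sketch only: the three small frames are invented stand-ins for the renamed data, and functools.reduce is just one convenient way to fold a list of frames into one.

```python
from functools import reduce
import pandas as pd

# Invented stand-ins for the renamed frames; note the differing date coverage.
frames = [
    pd.DataFrame({'Date': pd.to_datetime(['2016-05-10', '2016-05-09']),
                  'Price': [459.30, 462.49]}),
    pd.DataFrame({'Date': pd.to_datetime(['2016-05-10', '2016-05-09',
                                          '2016-05-08']),
                  'TransactionVolume': [275352.0, 256585.0, 152045.0]}),
    pd.DataFrame({'Date': pd.to_datetime(['2016-05-10']),
                  'TotalBTC': [15529625.0]}),
]

# Fold the list pairwise: ((frames[0] + frames[1]) + frames[2]) ...
combined = reduce(
    lambda left, right: pd.merge(left, right, on='Date', how='outer'),
    frames,
)

# Newest date first, matching the layout of the source files.
combined = combined.sort_values('Date', ascending=False).reset_index(drop=True)
print(combined)
```

Dates missing from a given file show up as NaN in that file's column, which is the behaviour the question asks for.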