I'm trying to merge several CSV files into one large DataFrame. I want to merge them on the Date column; where a CSV file is missing a date, the row should be recorded as blank or NA.
Searching around led me to believe that pandas in Python is a viable solution.
My code is as follows:
import pandas as pd
AvgPrice = pd.read_csv('csv/BAVERAGE-USD-Bitcoin24hPrice.csv', index_col=False)
AvgPrice = AvgPrice.iloc[:,(0,1)]
AvgPrice.columns.values[1] = 'Price'
TransVol = pd.read_csv('csv/BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv', index_col=False)
TransVol.columns.values[1] = 'TransactionVolume'
TotalBTC = pd.read_csv('csv/BCHAIN-TOTBC-TotalBitcoins.csv', index_col=False)
TotalBTC.columns.values[1] = 'TotalBTC'
USDExchVol = pd.read_csv('csv/BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv', index_col=False)
USDExchVol.columns.values[1] = 'USDExchange Volume'
df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')
df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')
The CSV files are located here: https://drive.google.com/folderview?id=0B8xdmDmZgtJbVkhCcjZkZUhaajg&usp=sharing
Result of df_test:
Date Price TransactionVolume
0 2016-05-10 459.30 NaN
1 2016-05-09 462.49 NaN
2 2016-05-08 461.85 NaN
3 2016-05-07 460.86 NaN
4 2016-05-06 453.51 NaN
5 2016-05-05 449.31 NaN
Whereas df1 looks fine:
Date TransactionVolume Price
0 2016-05-10 275352.0 459.30
1 2016-05-09 256585.0 462.49
2 2016-05-08 152045.0 461.85
3 2016-05-07 245115.0 460.86
4 2016-05-06 264882.0 453.51
5 2016-05-05 273005.0 449.31
I have no idea why the rightmost column of df2 and df_test is filled with NaN, and this is stopping me from merging df1 and df2 into one large DataFrame.
Any help would be greatly appreciated, as I've spent several hours on this without success.
Answer 0 (score: 0)
You have to add the names and usecols parameters to read_csv, and then it works as expected:
import pandas as pd
# the 24h price file has extra columns, so keep only Date and the average price
AvgPrice = pd.read_csv('csv/BAVERAGE-USD-Bitcoin24hPrice.csv',
                       index_col=False,
                       parse_dates=['Date'],
                       usecols=[0,1],
                       header=0,
                       names=['Date','Price'])
# header=0 skips the original header row so the supplied names are used instead
TransVol = pd.read_csv('csv/BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv',
                       index_col=False,
                       parse_dates=['Date'],
                       header=0,
                       names=['Date','TransactionVolume'])
TotalBTC = pd.read_csv('csv/BCHAIN-TOTBC-TotalBitcoins.csv',
                       index_col=False,
                       parse_dates=['Date'],
                       header=0,
                       names=['Date','TotalBTC'])
USDExchVol = pd.read_csv('csv/BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv',
                         index_col=False,
                         parse_dates=['Date'],
                         header=0,
                         names=['Date','USDExchange Volume'])
df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')
df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')
print (df1.head())
print (df2.head())
print (df_test.head())
Date TransactionVolume Price
0 2016-05-10 275352.0 459.30
1 2016-05-09 256585.0 462.49
2 2016-05-08 152045.0 461.85
3 2016-05-07 245115.0 460.86
4 2016-05-06 264882.0 453.51
Date USDExchange Volume TotalBTC
0 2016-05-10 2.158373e+06 15529625.0
1 2016-05-09 1.438420e+06 15525825.0
2 2016-05-08 6.679933e+05 15521275.0
3 2016-05-07 1.825475e+06 15517400.0
4 2016-05-06 1.908048e+06 15513525.0
Date Price TransactionVolume
0 2016-05-10 459.30 275352.0
1 2016-05-09 462.49 256585.0
2 2016-05-08 461.85 152045.0
3 2016-05-07 460.86 245115.0
4 2016-05-06 453.51 264882.0
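With the columns named consistently, df1 and df2 merge on Date the same way, giving the single large DataFrame the question asks for. A minimal sketch reusing the variables above (the name big_df is only for illustration):
big_df = pd.merge(df1, df2, on='Date', how='outer')
print (big_df.head())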
Edit from comments:
I think you can convert the Date column to_period of months and then use groupby with mean:
print (df1.Date.dt.to_period('M'))
0 2016-05
1 2016-05
2 2016-05
3 2016-05
4 2016-05
5 2016-05
6 2016-05
7 2016-05
...
...
print (df1.groupby( df1.Date.dt.to_period('M') ).mean() )
TransactionVolume Price
Date
2011-05 1.605518e+05 7.272273
2011-06 1.739163e+05 17.914583
2011-07 6.647129e+04 14.100645
2011-08 1.050460e+05 10.089677
2011-09 9.562243e+04 5.933667
2011-10 9.120232e+04 3.638065
2011-11 8.927442e+05 2.690333
2011-12 1.092328e+06 3.463871
2012-01 1.168704e+05 6.105161
2012-02 1.465859e+05 5.115517
...
...
If the ordering matters, add the parameter sort=False:
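A minimal sketch of that call (assuming the same df1 as above):
print (df1.groupby( df1.Date.dt.to_period('M'), sort=False ).mean() )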
Answer 1 (score: 0)
There's a subtle bug here: you are renaming the columns by assigning directly to the columns values array of each df:
AvgPrice.columns.values[1] = 'Price'
If you then try TransVol.info(), it raises a KeyError on 'TransactionVolume'.
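A quick way to see why that approach is fragile (a toy frame, not the question's data; the exact failure mode varies with the pandas version):
import pandas as pd

broken = pd.DataFrame({'Date': ['2016-05-10'], 'Value': [459.30]})
broken.columns.values[1] = 'Price'   # mutates the raw array behind the columns Index
# The printed column name may change, but the Index's internal lookup table is not
# rebuilt, so later lookups and merges on 'Price' can return NaN or raise KeyError.

fixed = pd.DataFrame({'Date': ['2016-05-10'], 'Value': [459.30]})
fixed = fixed.rename(columns={'Value': 'Price'})   # supported way to rename
print(fixed['Price'])                              # lookup stays consistent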
If you use rename instead, it works:
In [35]:
AvgPrice = pd.read_csv(r'c:\data\BAVERAGE-USD-Bitcoin24hPrice.csv', index_col=False)
AvgPrice = AvgPrice.iloc[:,(0,1)]
AvgPrice.rename(columns={'24h Average':'Price'}, inplace=True)
TransVol = pd.read_csv(r'c:\data\BCHAIN-ETRAV-BitcoinEstimatedTransactionVolume.csv', index_col=False)
TransVol.rename(columns={'Value':'TransactionVolume'}, inplace=True)
TotalBTC = pd.read_csv(r'c:\data\BCHAIN-TOTBC-TotalBitcoins.csv', index_col=False)
TotalBTC.rename(columns={'Value':'TotalBTC'}, inplace=True)
USDExchVol = pd.read_csv(r'c:\data\BCHAIN-TRVOU-BitcoinUSDExchangeTradeVolume.csv', index_col=False)
USDExchVol.rename(columns={'Value':'USDExchange Volume'}, inplace=True)
df1 = pd.merge(TransVol, AvgPrice, on='Date', how='outer')
df2 = pd.merge(USDExchVol, TotalBTC, on='Date', how='outer')
df_test = pd.merge(AvgPrice, TransVol, on='Date', how='outer')
df_test
Out[35]:
Date Price TransactionVolume
0 2016-05-10 459.30 275352.0
1 2016-05-09 462.49 256585.0
2 2016-05-08 461.85 152045.0
3 2016-05-07 460.86 245115.0
4 2016-05-06 453.51 264882.0
5 2016-05-05 449.31 273005.0
6 2016-05-04 449.32 370911.0
7 2016-05-03 447.93 252534.0
8 2016-05-02 448.00 249926.0
9 2016-05-01 452.87 170791.0
10 2016-04-30 454.88 190470.0
11 2016-04-29 451.88 278893.0
12 2016-04-28 445.80 329924.0
13 2016-04-27 461.92 335750.0
14 2016-04-26 465.91 344162.0
15 2016-04-25 460.32 307790.0
16 2016-04-24 455.53 188499.0
17 2016-04-23 449.13 203792.0
18 2016-04-22 447.73 291487.0
19 2016-04-21 445.28 316159.0
20 2016-04-20 438.98 302380.0
21 2016-04-19 432.35 275994.0
22 2016-04-18 429.76 245313.0
23 2016-04-17 431.93 186607.0
24 2016-04-16 432.86 200628.0
25 2016-04-15 429.06 281389.0
26 2016-04-14 426.21 274524.0
27 2016-04-13 425.50 309995.0
28 2016-04-12 426.15 341372.0
29 2016-04-11 422.91 264357.0
... ... ... ...
1798 2011-05-18 7.14 80290.0
1799 2011-05-17 7.52 138205.0
1800 2011-05-16 7.77 62341.0
1801 2011-05-15 6.74 272130.0
1802 2011-05-14 7.86 656162.0
1803 2011-05-13 7.48 324020.0
1804 2011-05-12 5.83 101674.0
1805 2011-05-11 5.35 114243.0
1806 2011-05-10 4.74 104592.0
1807 2015-09-03 NaN 256023.0
1808 2015-02-03 NaN 213538.0
1809 2015-01-07 NaN 256344.0
1810 2014-11-21 NaN 161082.0
1811 2014-10-17 NaN 142251.0
1812 2014-09-28 NaN 92933.0
1813 2014-09-09 NaN 111317.0
1814 2014-08-05 NaN 136298.0
1815 2014-08-03 NaN 49181.0
1816 2014-08-01 NaN 166173.0
1817 2014-06-03 NaN 124768.0
1818 2014-06-02 NaN 87513.0
1819 2014-05-09 NaN 80315.0
1820 2013-10-27 NaN 107717.0
1821 2013-09-17 NaN 137920.0
1822 2011-06-25 NaN 110463.0
1823 2011-06-24 NaN 106146.0
1824 2011-06-23 NaN 475995.0
1825 2011-06-22 NaN 122507.0
1826 2011-06-21 NaN 114264.0
1827 2011-06-20 NaN 836861.0
[1828 rows x 3 columns]
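As a side note that is not part of either answer: since every file shares the Date column, another way to build the single large frame is to set Date as the index on each renamed DataFrame and concatenate along the columns. A sketch under the assumption that each file has at most one row per date, reusing the variables from the code above:
frames = [AvgPrice, TransVol, TotalBTC, USDExchVol]
big_df = pd.concat([f.set_index('Date') for f in frames], axis=1, join='outer')
print(big_df.head())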