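I'm having some trouble with pandas DataFrames and hope someone can help. I downloaded some data from cryptocompare and wrote it to a CSV file. My goal is to keep updating this CSV file every day.
After downloading the new data into a separate DataFrame, I want to merge it with the existing data. So I wrote a function (read_dataset) that reads the existing data from the CSV file into a DataFrame. The next step should merge the new data with the existing data. I tried pd.merge and pd.concat, but neither worked.
My DataFrames look like this: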
open time volumefrom volumeto Timestamp
0 0.04951 1279324800 20.00 9.902000e-01 2010-07-17
1 0.04951 1279411200 75.01 5.090000e+00 2010-07-18
2 0.08584 1279497600 574.00 4.966000e+01 2010-07-19
3 0.08080 1279584000 262.00 2.059000e+01 2010-07-20
4 0.07474 1279670400 575.00 4.226000e+01 2010-07-21
5 0.07921 1279756800 2160.00 1.297800e+02 2010-07-22
6 0.05050 1279843200 2402.50 1.410700e+02 2010-07-23
7 0.06262 1279929600 496.32 2.673000e+01 2010-07-24
8 0.05454 1280016000 1551.48 8.506000e+01 2010-07-25
9 0.05050 1280102400 877.00 4.691000e+01 2010-07-26
10 0.05600 1280188800 3373.69 1.969200e+02 2010-07-27
11 0.06000 1280275200 4390.29 2.557600e+02 2010-07-28
12 0.05890 1280361600 8058.49 5.283200e+02 2010-07-29
13 0.06990 1280448000 3020.85 1.985300e+02 2010-07-30
14 0.06270 1280534400 4022.25 2.439000e+02 2010-07-31
15 0.06785 1280620800 2601.00 1.626500e+02 2010-08-01
16 0.06110 1280707200 3599.00 2.212000e+02 2010-08-02
17 0.06000 1280793600 9821.46 6.060500e+02 2010-08-03
18 0.06000 1280880000 3494.00 2.107700e+02 2010-08-04
19 0.05700 1280966400 5034.07 3.036100e+02 2010-08-05
20 0.06100 1281052800 1395.00 8.591000e+01 2010-08-06
21 0.06230 1281139200 2619.00 1.573400e+02 2010-08-07
22 0.05900 1281225600 2201.00 1.326000e+02 2010-08-08
23 0.06090 1281312000 13631.09 8.869300e+02 2010-08-09
24 0.07100 1281398400 1310.39 8.887000e+01 2010-08-10
25 0.07000 1281484800 14061.18 1.015640e+03 2010-08-11
26 0.06700 1281571200 2062.31 1.344900e+02 2010-08-12
27 0.07000 1281657600 3591.77 2.338000e+02 2010-08-13
28 0.06450 1281744000 4404.20 2.953100e+02 2010-08-14
29 0.06700 1281830400 4462.87 2.949500e+02 2010-08-15
... ... ... ... ...
2791 9928.56000 1520467200 154879.22 1.492236e+09 2018-03-08
2792 9316.77000 1520553600 233598.15 2.081621e+09 2018-03-09
2793 9252.76000 1520640000 117409.38 1.084926e+09 2018-03-10
2794 8797.27000 1520726400 149877.66 1.374815e+09 2018-03-11
2795 9543.98000 1520812800 152959.80 1.435404e+09 2018-03-12
2796 9142.27000 1520899200 133768.47 1.228556e+09 2018-03-13
2797 9160.12000 1520985600 161775.05 1.385573e+09 2018-03-14
2798 8216.22000 1521072000 187365.71 1.519850e+09 2018-03-15
2799 8267.95000 1521158400 129688.11 1.082790e+09 2018-03-16
2800 8283.23000 1521244800 111641.32 9.019394e+08 2018-03-17
2801 7882.67000 1521331200 198796.34 1.535519e+09 2018-03-18
2802 8215.50000 1521417600 171829.52 1.447813e+09 2018-03-19
2803 8623.14000 1521504000 131959.66 1.150462e+09 2018-03-20
2804 8920.53000 1521590400 109985.22 9.913764e+08 2018-03-21
2805 8911.37000 1521676800 116522.98 1.023287e+09 2018-03-22
2806 8724.98000 1521763200 109649.39 9.399973e+08 2018-03-23
2807 8935.51000 1521849600 93296.24 8.276632e+08 2018-03-24
2808 8548.39000 1521936000 76775.64 6.576435e+08 2018-03-25
2809 8472.56000 1522022400 131859.97 1.079039e+09 2018-03-26
2810 8152.18000 1522108800 116523.10 9.307550e+08 2018-03-27
2811 7808.42000 1522195200 82590.62 6.577121e+08 2018-03-28
2812 7959.78000 1522281600 185805.88 1.379180e+09 2018-03-29
2813 7106.62000 1522368000 229837.79 1.584675e+09 2018-03-30
2814 6853.75000 1522454400 129526.48 9.154006e+08 2018-03-31
2815 6943.77000 1522540800 131344.01 8.898877e+08 2018-04-01
2816 6835.58000 1522627200 106513.22 7.488614e+08 2018-04-02
2817 7074.65000 1522713600 122807.02 9.053268e+08 2018-04-03
2818 7434.30000 1522800000 123910.33 8.771998e+08 2018-04-04
2819 6815.50000 1522886400 114426.84 7.771452e+08 2018-04-05
2820 6790.45000 1522972800 72568.93 4.848647e+08 2018-04-06
The existing and the new DataFrame should be merged on the key 'time', which is a Unix timestamp.
# Read the old data
df_old = read_dataset('BTC_historical_data_daily')
# Download the new data
df_new = download_historical_data('BTC', 'USD', 'CCCAGG', 'day')
# Merge the two DataFrames on 'time'
df_merged_inner = pd.merge(left=df_old, right=df_new, how='left', left_on='time', right_on='time')
# Convert Unix Timestamp into a readable format
df_merged_inner['Timestamp'] = pd.to_datetime(df_merged_inner['time'], unit='s')
# Drop the Unix Timestamp
df_merged_inner = df_merged_inner.drop('time', axis=1)
# Save the new DataFrame as a CSV file
df_merged_inner.to_csv('BTC_historical_data_daily_' + current_datetime)
This code returns a DataFrame without the updated data, but with the values for each key doubled.
pd.concat raised the following error:
d = pd.concat(df_old,df_new)
Traceback (most recent call last):
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-50-891cefa897e1>", line 1, in <module>
d = pd.concat(df_old,df_new)
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 212, in concat
copy=copy)
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 227, in __init__
'"{name}"'.format(name=type(objs).__name__))
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
Can anyone help me solve this? Thanks a lot :D
Answer 0 (score: 0)
pd.concat([df_old, df_new])
The error message is basically saying that your group of DataFrames must be inside an iterable object, such as a list.
Answer 1 (score: 0)
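As a side note on the pd.merge attempt in the question: a left merge keeps only the keys found in the left frame, and non-key columns that exist in both frames come back with `_x` and `_y` suffixes, which would explain seeing no new rows and doubled values. A minimal sketch with made-up toy frames (the values are assumptions for illustration, not the real data):

```python
import pandas as pd

# Toy stand-ins for df_old / df_new; the numbers are invented for illustration.
df_old = pd.DataFrame({"time": [1, 2], "open": [10.0, 11.0]})
df_new = pd.DataFrame({"time": [2, 3], "open": [11.0, 12.0]})

# how="left" keeps only the keys present in df_old, so time == 3 is dropped,
# and the shared non-key column "open" is split into "open_x" and "open_y".
left = pd.merge(df_old, df_new, how="left", on="time")
print(left.columns.tolist())

# pd.concat stacks the rows instead; the overlapping key (time == 2) appears twice.
stacked = pd.concat([df_old, df_new], ignore_index=True)
print(len(stacked))
```

So for appending new rows, stacking with pd.concat is probably the better fit than a key-based merge.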
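As czr mentioned in the comments, pd.concat should work for your example when you give it the tuple (df_old, df_new). This is because it expects an iterable, such as a tuple or a list.
The way you passed df_old and df_new does not work because you supplied each one as a separate positional argument, i.e. pd.concat(df_old, df_new).
Any of the following should work: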
d = pd.concat((df_old, df_new))
d = pd.concat([df_old, df_new])
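The official documentation refers to this iterable as objs.
In addition, you may want to keep only one data point for each point in time that has multiple rows. You can do that with: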
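One detail that may matter here: by default pd.concat keeps the index labels of each input frame, so the result can contain repeated labels. Passing `ignore_index=True` renumbers the rows. A small sketch, using toy frames as stand-ins for the real data:

```python
import pandas as pd

# Toy stand-ins for df_old / df_new, with timestamps taken from the question.
df_old = pd.DataFrame({"time": [1279324800, 1279411200]})
df_new = pd.DataFrame({"time": [1279497600]})

# Default behaviour: each frame keeps its own index labels.
kept = pd.concat([df_old, df_new])
print(kept.index.tolist())

# ignore_index=True builds a fresh 0..n-1 index for the combined frame.
renumbered = pd.concat([df_old, df_new], ignore_index=True)
print(renumbered.index.tolist())
```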
d = d.drop_duplicates('time')
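Putting the pieces together, the daily update could look roughly like the sketch below. The helper name `update_history` is hypothetical, the row for the extra day is made up, and the sketch assumes both frames still carry the Unix `time` column (the code in the question drops it before saving, which would leave nothing to deduplicate on at the next update).

```python
import pandas as pd


def update_history(df_old, df_new):
    """Hypothetical helper: append df_new to df_old, one row per Unix timestamp."""
    combined = pd.concat([df_old, df_new], ignore_index=True)
    # keep="last" prefers the freshly downloaded row when a day appears twice.
    combined = combined.drop_duplicates(subset="time", keep="last")
    combined = combined.sort_values("time").reset_index(drop=True)
    # Add a readable date next to the Unix timestamp instead of replacing it.
    combined["Timestamp"] = pd.to_datetime(combined["time"], unit="s")
    return combined


# Last two rows from the question, plus one invented row for the following day.
df_old = pd.DataFrame({"time": [1522886400, 1522972800], "open": [6815.50, 6790.45]})
df_new = pd.DataFrame({"time": [1522972800, 1523059200], "open": [6790.45, 6600.00]})

updated = update_history(df_old, df_new)
print(updated)
```

Using `keep="last"` is a design choice: if the most recent day was still incomplete at the previous download, the newer row replaces it.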