I am trying to merge two different CSV files with pandas, but I run into an error when I do.
The first file is aapl.csv, which looks like this:
              Date   Close      High       Low      Open    Volume
Symbol
AAPL    2017-05-25  153.87  154.3500  153.0300  153.7300  19235598
AAPL    2017-05-26  153.61  154.2400  153.3100  154.0000  21927637
The second file is corr_column.csv, which looks like this:
Corr
0.01
0.02
I want to merge them so that 'Corr' appears as a column right after 'Volume'.
I have tried pd.concat, as described in the documentation:
https://pandas.pydata.org/pandas-docs/stable/merging.html
Here is my code:
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web
from mpl_finance import candlestick_ohlc
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter, MonthLocator, YearLocator, DayLocator
style.use( 'ggplot' )
##start = dt.datetime( 2017, 5, 29 )
##end = dt.datetime( 2018, 5, 29 )
##
##
##df = web.DataReader( AAPL, 'morningstar', start, end )
##
##df.to_csv( aapl.csv )
df = pd.read_csv( '/Users/zubairjohal/Documents/aapl.csv' , parse_dates=True, index_col=0 )
df_ohlc = df
corr_data = pd.read_csv( '/Users/zubairjohal/Documents/corr_column.csv', parse_dates=True, index_col=0 )
corr_data.dropna( inplace=True )
df.dropna( inplace=True )
merged = pd.concat( [ df, corr_data ], axis=1 )
merged.to_csv( 'combine2.csv', index=False )
print( merged )
However, when I try to print the result, I get the following error:
Traceback (most recent call last):
File "/Users/zubairjohal/Documents/nw5.py", line 34, in <module>
merged = pd.concat( [ df, corr_data ], axis=1 )
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 226, in concat
return op.get_result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 423, in get_result
copy=self.copy)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 5425, in concatenate_block_managers
return BlockManager(blocks, axes)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 3282, in __init__
self._verify_integrity()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 3493, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 4843, in construction_error
passed, implied))
ValueError: Shape of passed values is (6, 68896), indices imply (6, 514)
Any suggestions, references, or alternative approaches would be much appreciated.
Answer 0 (score: 0)
You can try this:
pd.concat([df_ohlc.reset_index(), corr_data], axis=1).set_index("Symbol")
Output:
         Close        Date    High     Low    Open      Volume  Corr
Symbol
AAPL    153.87  2017-05-25  154.35  153.03  153.73  19235598.0  0.01
AAPL    153.61  2017-05-26  154.24  153.31  154.00  21927637.0  0.02
This works as long as df_ohlc prints the same way as shown in the question, with AAPL (Symbol) as the index, and corr_data has no index set.
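For anyone who wants to try this without the original files, here is a minimal sketch of the same idea. The CSV layouts below are an assumption reconstructed from the printed frames in the question (the real files may differ), and `aapl_csv` / `corr_csv` are hypothetical in-memory stand-ins for aapl.csv and corr_column.csv. Note that corr_column.csv is read without `index_col=0` here. With a single-column file, `index_col=0` turns Corr into the index and leaves no data columns. That, together with the repeated AAPL labels in the index, is a likely reason pd.concat cannot line the rows up.

```python
import io

import pandas as pd

# Hypothetical stand-ins for aapl.csv and corr_column.csv (layout assumed
# from the printed output in the question).
aapl_csv = io.StringIO(
    "Symbol,Date,Close,High,Low,Open,Volume\n"
    "AAPL,2017-05-25,153.87,154.35,153.03,153.73,19235598\n"
    "AAPL,2017-05-26,153.61,154.24,153.31,154.00,21927637\n"
)
corr_csv = io.StringIO("Corr\n0.01\n0.02\n")

# Symbol becomes the index, matching how df_ohlc prints in the question.
df_ohlc = pd.read_csv(aapl_csv, index_col=0)

# No index_col here: Corr stays a regular column with a default 0..n-1 index.
corr_data = pd.read_csv(corr_csv)

# reset_index() gives df_ohlc the same 0..n-1 index, so concat pairs the rows
# by position; set_index("Symbol") then restores the original index.
merged = pd.concat([df_ohlc.reset_index(), corr_data], axis=1).set_index("Symbol")
print(merged)
```

This pairs rows purely by position, so it only makes sense if both files have the same number of rows in the same order. Also be aware that writing the result with to_csv(..., index=False), as in the question, would drop the Symbol index from the output file.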