我正在阅读像这样的CSV文件
Date,Open,High,Low,Close,Volume,Adj Close
2000-12-29,30.88,31.31,28.69,29.06,31702200,27.57
2000-12-28,30.56,31.62,30.38,31.06,25053600,29.46
2000-12-27,30.38,31.06,29.38,30.69,26437500,29.11
2000-12-26,31.50,32.19,30.00,30.94,20589500,29.34
2000-12-22,30.38,31.98,30.00,31.88,35568200,30.23
2000-12-21,27.81,30.25,27.31,29.50,46719700,27.98
2000-12-20,28.06,29.81,27.50,28.50,54440500,27.03
2000-12-19,31.81,33.12,30.12,30.62,58653700,29.05
...
2000-01-13,108.50,109.88,103.50,105.06,55779200,24.91
2000-01-12,112.25,112.25,103.69,105.62,83443600,25.05
2000-01-11,112.62,114.75,109.50,112.38,86585200,26.65
2000-01-10,108.00,116.00,105.50,115.75,91518000,27.45
2000-01-07,95.00,103.50,93.56,103.38,91755600,24.51
2000-01-06,100.16,105.00,94.69,96.00,109880000,22.76
2000-01-05,101.62,106.38,96.00,102.00,166054000,24.19
2000-01-04,115.50,118.62,105.00,107.69,116824800,25.54
2000-01-03,124.62,125.19,111.62,118.12,98114800,28.01
可以使用
下载完整数据python -c "from pyalgotrade.tools import yahoofinance; yahoofinance.download_daily_bars('orcl', 2000, 'orcl-2000.csv')"
请参阅http://gbeced.github.io/pyalgotrade/docs/v0.15/html/tutorial.html
我尝试使用Python,Pandas,SQLAlchemy,read_csv
和to_sql
将此CSV数据放入MySQL数据库:
filename = "orcl-2000.csv"
df = pd.read_csv(filename, sep=',')
db_uri = "mysql+mysqlconnector://{user}:{password}@{host}:{port}/{db}" # or without mysqlconnector (need MySQLdb)
db_uri = db_uri.format(
user = "root",
password = "123456",
host = "127.0.0.1",
db = "test",
port = 3306
)
engine = sqlalchemy.create_engine(db_uri)
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
print(df)
print(df.dtypes)
print(type(df.index), df.index.dtype)
print(type(df.index[0]))
df.to_sql("test_table", engine, flavor="mysql", if_exists="replace")
我得到以下输出:
$ python main.py
Date Open High Low Close Volume Adj Close
0 2000-12-29 30.88 31.31 28.69 29.06 31702200 27.57
1 2000-12-28 30.56 31.62 30.38 31.06 25053600 29.46
2 2000-12-27 30.38 31.06 29.38 30.69 26437500 29.11
3 2000-12-26 31.50 32.19 30.00 30.94 20589500 29.34
4 2000-12-22 30.38 31.98 30.00 31.88 35568200 30.23
5 2000-12-21 27.81 30.25 27.31 29.50 46719700 27.98
6 2000-12-20 28.06 29.81 27.50 28.50 54440500 27.03
7 2000-12-19 31.81 33.12 30.12 30.62 58653700 29.05
8 2000-12-18 30.00 32.44 29.94 32.00 61640100 30.35
9 2000-12-15 29.44 30.08 28.19 28.56 120004000 27.09
10 2000-12-14 29.25 29.94 27.25 27.50 45894400 26.08
11 2000-12-13 31.94 32.00 28.25 28.38 37933600 26.91
12 2000-12-12 31.88 32.50 30.41 30.75 26481200 29.17
13 2000-12-11 30.50 32.25 30.00 31.94 50279700 30.29
14 2000-12-08 30.06 30.62 29.25 30.06 40052600 28.51
15 2000-12-07 29.62 29.94 28.12 28.31 41088300 26.85
16 2000-12-06 31.19 31.62 29.31 30.19 42125600 28.63
17 2000-12-05 29.44 31.50 28.88 31.50 59754700 29.88
18 2000-12-04 26.25 28.88 26.19 28.19 40710400 26.74
19 2000-12-01 26.38 27.88 25.50 26.44 48663500 25.08
20 2000-11-30 21.75 27.62 21.50 26.50 84386200 25.14
21 2000-11-29 23.19 23.62 21.81 22.88 75409600 21.70
22 2000-11-28 23.50 23.81 22.25 22.66 43075300 21.49
23 2000-11-27 25.44 25.81 22.88 23.12 45665200 21.93
24 2000-11-24 23.31 24.25 23.12 24.12 22443900 22.88
25 2000-11-22 23.62 24.06 22.06 22.31 53315300 21.16
26 2000-11-21 24.81 25.62 23.50 23.88 58647400 22.65
27 2000-11-20 24.31 25.88 24.00 24.75 89778400 23.48
28 2000-11-17 26.94 29.25 25.25 28.81 59636000 27.33
29 2000-11-16 28.75 29.81 27.25 27.38 37986600 25.96
.. ... ... ... ... ... ... ...
222 2000-02-14 60.88 62.25 58.62 62.19 37599800 29.49
223 2000-02-11 62.50 64.75 58.75 59.69 55774000 28.31
224 2000-02-10 60.00 62.62 58.00 62.31 45288600 29.55
225 2000-02-09 60.06 61.31 58.81 59.94 52471600 28.43
226 2000-02-08 60.75 61.44 59.00 59.56 55718000 28.25
227 2000-02-07 59.31 60.00 58.88 59.94 44691200 28.43
228 2000-02-04 57.62 58.25 56.81 57.81 40916000 27.42
229 2000-02-03 55.38 57.00 54.25 56.69 55533200 26.88
230 2000-02-02 54.94 56.00 54.00 54.31 63933000 25.76
231 2000-02-01 51.25 54.31 50.00 54.00 57105600 25.61
232 2000-01-31 47.94 50.12 47.06 49.95 68148000 23.69
233 2000-01-28 51.50 51.94 46.62 47.38 86394000 22.47
234 2000-01-27 55.81 56.69 50.00 51.81 61054000 24.57
235 2000-01-26 56.75 58.94 55.00 55.06 47569200 26.11
236 2000-01-25 55.06 57.50 54.88 56.44 53059200 26.77
237 2000-01-24 60.25 60.38 54.00 54.19 50022400 25.70
238 2000-01-21 61.50 61.50 59.00 59.69 50891000 28.31
239 2000-01-20 59.00 60.25 58.12 59.25 54526800 28.10
240 2000-01-19 56.12 58.25 54.00 57.12 49198400 27.09
241 2000-01-18 107.88 114.50 105.62 111.25 66780000 26.38
242 2000-01-14 109.00 111.38 104.75 106.81 57078000 25.33
243 2000-01-13 108.50 109.88 103.50 105.06 55779200 24.91
244 2000-01-12 112.25 112.25 103.69 105.62 83443600 25.05
245 2000-01-11 112.62 114.75 109.50 112.38 86585200 26.65
246 2000-01-10 108.00 116.00 105.50 115.75 91518000 27.45
247 2000-01-07 95.00 103.50 93.56 103.38 91755600 24.51
248 2000-01-06 100.16 105.00 94.69 96.00 109880000 22.76
249 2000-01-05 101.62 106.38 96.00 102.00 166054000 24.19
250 2000-01-04 115.50 118.62 105.00 107.69 116824800 25.54
251 2000-01-03 124.62 125.19 111.62 118.12 98114800 28.01
[252 rows x 7 columns]
Date datetime64[ns]
Open float64
High float64
Low float64
Close float64
Volume int64
Adj Close float64
dtype: object
(<class 'pandas.tseries.index.DatetimeIndex'>, dtype('<M8[ns]'))
<class 'pandas.tslib.Timestamp'>
Traceback (most recent call last):
File "main.py", line 28, in <module>
main()
File "main.py", line 25, in main
df.to_sql("test_table", engine, flavor="mysql", if_exists="replace")
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 950, in to_sql
index_label=index_label)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 475, in to_sql
index_label=index_label)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 842, in to_sql
table.insert()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 611, in insert
self.pd_sql.execute(ins, data_list)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 810, in execute
return self.engine.execute(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1614, in execute
return connection.execute(statement, *multiparams, **params)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
params)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
context)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception
exc_info
File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 196, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in _execute_context
context)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 321, in do_executemany
cursor.executemany(statement, parameters)
File "/usr/lib/python2.7/dist-packages/mysql/connector/cursor.py", line 557, in executemany
values.append(fmt % self._process_params(params))
File "/usr/lib/python2.7/dist-packages/mysql/connector/cursor.py", line 344, in _process_params
return self._process_params_dict(params)
File "/usr/lib/python2.7/dist-packages/mysql/connector/cursor.py", line 335, in _process_params_dict
"Failed processing pyformat-parameters; %s" % err)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) Failed processing pyformat-parameters; 'MySQLConverter' object has no attribute '_timestamp_to_mysql' u'INSERT INTO test_table (`index`, `Date`, `Open`, `High`, `Low`, `Close`, `Volume`, `Adj Close`) VALUES (%(index)s, %(Date)s, %(Open)s, %(High)s, %(Low)s, %(Close)s, %(Volume)s, %(Adj Close)s)' ({'index': 0, 'High': 31.31, 'Adj Close': 27.57, 'Volume': 31702200, 'Low': 28.69, 'Date': Timestamp('2000-12-29 00:00:00'), 'Close': 29.06, 'Open': 30.88}, {'index': 1, 'High': 31.62, 'Adj Close': 29.46, 'Volume': 25053600, 'Low': 30.38, 'Date': Timestamp('2000-12-28 00:00:00'), 'Close': 31.06, 'Open': 30.56}, {'index': 2, 'High': 31.06, 'Adj Close': 29.11, 'Volume': 26437500, 'Low': 29.38, 'Date': Timestamp('2000-12-27 00:00:00'), 'Close': 30.69, 'Open': 30.38}, {'index': 3, 'High': 32.19, 'Adj Close': 29.34, 'Volume': 20589500, 'Low': 30.0, 'Date': Timestamp('2000-12-26 00:00:00'), 'Close': 30.94, 'Open': 31.5}, {'index': 4, 'High': 31.98, 'Adj Close': 30.23, 'Volume': 35568200, 'Low': 30.0, 'Date': Timestamp('2000-12-22 00:00:00'), 'Close': 31.88, 'Open': 30.38}, {'index': 5, 'High': 30.25, 'Adj Close': 27.98, 'Volume': 46719700, 'Low': 27.31, 'Date': Timestamp('2000-12-21 00:00:00'), 'Close': 29.5, 'Open': 27.81}, {'index': 6, 'High': 29.81, 'Adj Close': 27.03, 'Volume': 54440500, 'Low': 27.5, 'Date': Timestamp('2000-12-20 00:00:00'), 'Close': 28.5, 'Open': 28.06}, {'index': 7, 'High': 33.12, 'Adj Close': 29.05, 'Volume': 58653700, 'Low': 30.12, 'Date': Timestamp('2000-12-19 00:00:00'), 'Close': 30.62, 'Open': 31.81} ... displaying 10 of 252 total bound parameter sets ... {'index': 250, 'High': 118.62, 'Adj Close': 25.54, 'Volume': 116824800, 'Low': 105.0, 'Date': Timestamp('2000-01-04 00:00:00'), 'Close': 107.69, 'Open': 115.5}, {'index': 251, 'High': 125.19, 'Adj Close': 28.01, 'Volume': 98114800, 'Low': 111.62, 'Date': Timestamp('2000-01-03 00:00:00'), 'Close': 118.12, 'Open': 124.62})
列Date
类型为datetime64[ns]
。
SQLAlchemy似乎不喜欢这种Numpy类型所以它引发了:
sqlalchemy.exc.ProgrammingError: (ProgrammingError) Failed processing pyformat-parameters; 'MySQLConverter' object has no attribute '_timestamp_to_mysql'
如何彻底避免这种错误?
答案 0 :(得分:2)
我对MySQL Connector不是很熟悉,但according to this,您应该可以使用类似
的内容添加datetime64转换器class Datetime64Converter(mysql.connector.conversion.MySQLConverter):
""" A mysql.connector Converter that handles datetime64 types """
def _timestamp_to_mysql(self, value):
return value.view('<i8')
config = {
'user' : 'user',
'host' : 'localhost',
'password': 'xxx',
'database': 'db1'}
conn = mysql.connector.connect(**config)
conn.set_converter_class(Datetime64Converter)
由于所有datetime64 dtypes都是8字节,因此可以查看它们并将其存储为8字节整数。我不确定MySQL Connector提供什么设施来将数据拉回datetime64
。但如果所有其他方法都失败了,您可以将8字节整数转换回datetime64[ns]
,如下所示:
In [33]: s.view('<i8')
Out[33]:
0 978307200000000000
1 978393600000000000
2 978480000000000000
3 978566400000000000
4 978652800000000000
5 978739200000000000
6 978825600000000000
7 978912000000000000
8 978998400000000000
9 979084800000000000
dtype: int64
In [34]: s.view('<i8').view('<M8[ns]')
Out[34]:
0 2001-01-01
1 2001-01-02
2 2001-01-03
3 2001-01-04
4 2001-01-05
5 2001-01-06
6 2001-01-07
7 2001-01-08
8 2001-01-09
9 2001-01-10
dtype: datetime64[ns]
答案 1 :(得分:1)
我发现只需重新格式化日期,因为字符串可以解决问题。
df['Date'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
MySQL仍然认为它是一个日期而不是VarChar。我在工作中遇到了完全相同的问题,这个解决方案一直在运行。