Dask DataFrame read_sql_table raises TypeError

Date: 2017-11-26 17:34:01

Tags: python mysql dataframe dask

I tried the following code:

alerts = df.read_sql_table('alerts', db_url, index_col='id', npartitions=16)

and I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-67-d14f44b5a2c5> in <module>()
----> 1 alerts = df.read_sql_table('alerts', db_url, index_col='id', npartitions=16)

/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/dask/dataframe/io/sql.pyc in read_sql_table(table, uri, index_col, divisions, npartitions, limits, columns, bytes_per_chunk, **kwargs)
121             divisions[-1] = maxi
122         else:
--> 123             divisions = np.linspace(mini, maxi, npartitions + 1).tolist()
124 
125     parts = []

/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/numpy/core/function_base.pyc in linspace(start, stop, num, endpoint, retstep, dtype)
106     # Convert float/complex array scalars to float, gh-3504
107     # and make sure one can use variables that have an __array_interface__, gh-6634
--> 108     start = asanyarray(start) * 1.0
109     stop  = asanyarray(stop)  * 1.0
110 

TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

The table schema is as follows:

describe alerts;
+-------------------+----------+------+-----+---------+----------------+
| Field             | Type     | Null | Key | Default | Extra          |
+-------------------+----------+------+-----+---------+----------------+
| id                | int(11)  | NO   | PRI | NULL    | auto_increment |
| description       | text     | NO   |     | NULL    |                |
| channel_id        | int(11)  | NO   | MUL | NULL    |                |
| score             | float    | NO   |     | NULL    |                |
| raised_at         | datetime | YES  |     | NULL    |                |
| updated_at        | datetime | YES  |     | NULL    |                |
| activity_earliest | datetime | NO   |     | NULL    |                |
| activity_latest   | datetime | NO   |     | NULL    |                |
+-------------------+----------+------+-----+---------+----------------+
8 rows in set (0,00 sec)

I cannot make sense of the error. The table is currently empty.

1 answer:

Answer 0 (score: 0)

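The problem appears to be inside read_sql_table itself. If limits is not set, the function tries to find the limits by querying the table for the maximum and minimum of the index column. If the table contains no data, both max and min come back as None. So when np.linspace is then called (line 123 in the traceback), the None values of mini and maxi raise the exception.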
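The mechanism described above can be reproduced without dask or MySQL. The following is only a sketch under assumptions: an in-memory SQLite table stands in for the MySQL `alerts` table, and the plain expression `mini * 1.0` stands in for the `asanyarray(start) * 1.0` step shown in the traceback.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, score REAL NOT NULL)")

# MIN/MAX over an empty table yield SQL NULL, which the driver maps to None.
mini, maxi = conn.execute("SELECT MIN(id), MAX(id) FROM alerts").fetchone()
print(mini, maxi)

# The traceback shows linspace multiplying its endpoints by 1.0.
# Doing the same with None fails with a TypeError.
try:
    mini * 1.0
except TypeError as exc:
    message = str(exc)
    print(message)
```

The error text printed here names the same operand types as the one in the question, which is consistent with the explanation that both bounds are None.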

Is there a mechanism for handling empty database tables when using read_sql_table?
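Independently of whether dask offers such a mechanism, one possible workaround is to check the index bounds before calling read_sql_table. The sketch below is an assumption, not part of dask: `index_bounds` is a hypothetical helper, and SQLite again stands in for MySQL.

```python
import sqlite3


def index_bounds(conn, table, index_col):
    """Return (min, max) of index_col, or None if the table has no rows.

    Hypothetical helper for illustration; not part of dask.
    """
    # Identifiers cannot be bound as query parameters,
    # so only pass trusted table and column names here.
    query = 'SELECT MIN("{0}"), MAX("{0}") FROM "{1}"'.format(index_col, table)
    mini, maxi = conn.execute(query).fetchone()
    if mini is None or maxi is None:
        return None
    return mini, maxi


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, score REAL NOT NULL)")

# Empty table: the helper reports that there is nothing to partition.
empty_bounds = index_bounds(conn, "alerts", "id")

conn.executemany(
    "INSERT INTO alerts (id, score) VALUES (?, ?)", [(1, 0.5), (7, 0.9)]
)
filled_bounds = index_bounds(conn, "alerts", "id")

if filled_bounds is not None:
    # Only in this branch would a call to read_sql_table be attempted.
    print(filled_bounds)
```

When the helper returns None, the caller can skip the read_sql_table call, or build an empty frame some other way, so the None bounds never reach np.linspace.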