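I know how to join tables in pandas in various ways (concat, merge, etc.), but I'd like to know how to do it with pandasql. Specifically, I want to join two pandas DataFrames on their index. Is that possible? When I run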
new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.key=b.key;")
I get the correct result. (Both tables have a "key" variable.) However, when I try
new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;")
I get
---------------------------------------------------------------------------
PandaSQLException Traceback (most recent call last)
<ipython-input-154-ecab230d4dc9> in <module>()
----> 1 new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;")
<ipython-input-100-adc122e97ed8> in <lambda>(q)
1 from pandasql import sqldf
----> 2 pysqldf = lambda q: sqldf(q, globals())
/Users/jwesley/anaconda/lib/python2.7/site-packages/pandasql/sqldf.pyc in sqldf(query, env, db_uri)
154 >>> sqldf("select avg(x) from df;", locals())
155 """
--> 156 return PandaSQL(db_uri)(query, env)
/Users/jwesley/anaconda/lib/python2.7/site-packages/pandasql/sqldf.pyc in __call__(self, query, env)
61 result = read_sql(query, conn)
62 except DatabaseError as ex:
---> 63 raise PandaSQLException(ex)
64 except ResourceClosedError:
65 # query returns nothing
PandaSQLException: (sqlite3.OperationalError) near "index": syntax error [SQL: 'SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;']
Answer:
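Just give the index a name, e.g. df1.index.rename('foo', inplace=True) (and do the same for df2), and you can then refer to the index in the SQL query as a column named 'foo'.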
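This works because pandasql checks whether the index names are set, and only writes the index to the database if every level is named: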
def write_table(df, tablename, conn):
    """ Write a dataframe to the database. """
    with catch_warnings():
        filterwarnings('ignore',
                       message='The provided table name \'%s\' is not found exactly as such in the database' % tablename)
        to_sql(df, name=tablename, con=conn,
               index=not any(name is None for name in df.index.names))  # load index into db if all levels are named
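Note: I tried renaming the index to 'index' and the query failed, although it succeeds with other index names. That is because INDEX is a keyword in SQLite, so it cannot be used as a bare column name; it would have to be quoted, as in a."index".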
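The keyword problem can be reproduced with the standard-library sqlite3 module alone. This minimal sketch creates a hypothetical table with a column literally named "index" (what pandas writes for an index named 'index'), shows that the bare a.index reference is rejected by SQLite's parser, and shows that the double-quoted form works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a column literally named "index".
conn.execute('CREATE TABLE df1 ("index" INTEGER, list1 INTEGER)')
conn.execute("INSERT INTO df1 VALUES (1, 10), (2, 20)")

bare_error = None
try:
    # Bare keyword after "a.": SQLite rejects this when parsing the statement.
    conn.execute("SELECT a.index FROM df1 AS a").fetchall()
except sqlite3.OperationalError as exc:
    bare_error = str(exc)
print("bare:", bare_error)

# Double-quoting turns the keyword back into an ordinary identifier.
rows = conn.execute('SELECT a."index" FROM df1 AS a ORDER BY 1').fetchall()
print("quoted:", rows)
conn.close()
```

The bare form produces the same kind of "near "index": syntax error" message as the traceback in the question.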
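Alternatively, you can add a new column equal to the index: df1['index'] = df1.index (and likewise for df2). Because index is a SQLite keyword, the query must then quote that column name, e.g. ON a."index"=b."index".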