I have a pandas DataFrame that looks like this:
activity User_Id \
0 VIEWED MOVIE 158d292ec18a49
1 VIEWED MOVIE 158d292ec18a49
2 VIEWED MOVIE 158d292ec18a49
3 VIEWED MOVIE 158d292ec18a49
4 VIEWED MOVIE 158e00978d7a6c
Media_Title Media_Type User_Rating
0 20th Asian Athletics Championship-2013 Held At... NA
1 Tu Majha Saangaati NA
2 Home Cooking NA
3 Mix Dil Se NA
4 Value, Virtues, Ethics & Morality NA
I am trying to run a SQL query against it with the sqldf function from the pandasql package, as follows:
distinct_activity_user = pandasql.sqldf(" select User_Id from pmm_activity", locals())
The error I get is:
OperationalError: (sqlite3.OperationalError) too many SQL variables [SQL: 'INSERT INTO pmm_activity (activity, "User_Id", "Media_Title", "Media_Type", "User_Rating") VALUES
Answer 0 (score: 0)
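This may be a problem related to spaces in the column names. I ran into that when I tried it with the data you provided. Here is an example using sqlite3 that may solve your problem: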
import sqlite3 as sql
import pandas as pd
file = "..../movie.csv"
df = pd.read_csv(file, sep=";", dtype='unicode' )
With the DataFrame loaded, write it to a SQLite database and query it:
conn = sql.connect('movie2.db')
# Write the DataFrame to a table named 'movie'
df.to_sql('movie', conn)
# Quote the column name exactly as stored; here it includes a trailing space
Movie = pd.read_sql('SELECT distinct "User_Id " FROM movie', conn)
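If stray whitespace in the header really is the cause (that is an assumption on my part, not something I have confirmed against your file), another option is to strip the column names before writing, so the query no longer needs the quoted name. A minimal sketch, using made-up sample rows and an in-memory database instead of movie2.db:

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; note the trailing space in "User_Id ".
df = pd.DataFrame({
    "activity": ["VIEWED MOVIE"] * 3,
    "User_Id ": ["158d292ec18a49", "158d292ec18a49", "158e00978d7a6c"],
})

# Strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
df.to_sql("movie", conn, index=False)

# The column can now be referenced without quoting.
distinct_users = pd.read_sql("SELECT DISTINCT User_Id FROM movie", conn)
conn.close()

print(distinct_users)
```

The same stripping step should work before calling pandasql.sqldf as well, since it only changes the DataFrame's column labels.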