Question

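I'm working with a SQLite3 database on disk and querying it from my Python code using the sqlite3 package. I also use pandas to run the query and return the results as a DataFrame, which prints nicely, is easy to explore, and so on. Here is my code: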

import sqlite3
import pandas as pd

# db_name is the path to the SQLite3 database file on disk
conn = sqlite3.connect(db_name) # @UndefinedVariable

results_df = pd.read_sql_query("SELECT * FROM nodes_tags LIMIT 10;", conn)
print(results_df)

conn.close()

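For some reason (and only for some of my database tables, not all of them), the DataFrame I get back includes the column headers as the first row of data, like so: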

         id     key             value     type
0        id     key             value     type
1  75411942  source  tiger:boundaries  regular
2  75411946  source  tiger:boundaries  regular

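Any ideas as to why this could be happening? These SQLite3 tables were generated from data imported from CSV files (one per table). When I run head table_name.csv in the terminal, the headers of the tables that don't return the extra row when queried and the headers of the tables that do return it look similarly formatted, so I don't think the source header data is to blame (maybe).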

EDIT

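Also, I just looked at the head of the tables from within the sqlite3 environment, and the problem tables have the duplicated header information in their first row there as well, but I'm still not sure what is going on.

The code I used to create the original CSV files is: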

import pandas as pd

nodes_tags = []
nodes_tags.append([id_value, key_value, value_value, type_value])
# The append above is repeated many, many times

# nodes_tags is a list of lists, one inner list per row
nodes_tags_df = pd.DataFrame(data=nodes_tags,
                             columns=['id', 'key', 'value', 'type'])
nodes_tags_df.drop_duplicates(inplace=True)
nodes_tags_df.to_csv('../CSV for SQL Tables/nodes_tags.csv', index=False, encoding='utf-8')

Answer 1

pd.read_sql_query() can't have added this extra row; it is stored in the SQLite DB table itself.

So you need to check how, and what, you are writing to the SQLite DB.

Most probably you used the header=None argument when you parsed your CSV file(s).

Demo:

In [56]: df = pd.read_csv(filename, header=None)

In [57]: df
Out[57]:
          0       1                 2        3
0        id     key             value     type
1  75411942  source  tiger:boundaries  regular
2  75411946  source  tiger:boundaries  regular

In [58]: df.columns = ['id', 'key', 'value', 'type']

In [59]: df
Out[59]:
         id     key             value     type
0        id     key             value     type
1  75411942  source  tiger:boundaries  regular
2  75411946  source  tiger:boundaries  regular

Workaround:

In [60]: df = pd.read_csv(filename)

In [61]: df
Out[61]:
         id     key             value     type
0  75411942  source  tiger:boundaries  regular
1  75411946  source  tiger:boundaries  regular
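
If the header row has already been written into the table, it can be confirmed (and removed) with the sqlite3 module alone, without pandas in the picture. What follows is only a minimal sketch: the in-memory database and the all-TEXT schema are assumptions standing in for the real file and table definition, and only the nodes_tags name and its columns come from the question.

```python
import sqlite3

# In-memory database standing in for the real file (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes_tags (id TEXT, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO nodes_tags VALUES (?, ?, ?, ?)",
    [
        ("id", "key", "value", "type"),  # header row that was imported as data
        ("75411942", "source", "tiger:boundaries", "regular"),
        ("75411946", "source", "tiger:boundaries", "regular"),
    ],
)

# A row whose values equal the column names is the stray header.
header_rows = conn.execute(
    "SELECT COUNT(*) FROM nodes_tags WHERE id = 'id' AND key = 'key'"
).fetchone()[0]
print("header rows stored in table:", header_rows)

# Remove the stray row in place, then look at what is left.
conn.execute("DELETE FROM nodes_tags WHERE id = 'id' AND key = 'key'")
conn.commit()
remaining = conn.execute("SELECT id FROM nodes_tags ORDER BY id").fetchall()
print(remaining)
conn.close()
```

Deleting the row fixes an existing table, but the import step still needs correcting, otherwise the header comes back the next time the table is rebuilt.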

Answer 2

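Thanks for your help. In the end, it turned out that the problem was in my SQLite3 import of the CSV files itself. Apparently, because I had already created the tables (including the column headers), importing from the CSV file treated the header row as a data row. For some tables this didn't happen, because the schema's required data types did not allow the header values to be inserted as data, so for those tables the header row was skipped.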
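
The behaviour described above can be reproduced in a small sketch. The table names pk_table and plain_table, the two-column schema, and the inline CSV text are all hypothetical. It also assumes the tables that skipped the header had an INTEGER PRIMARY KEY column, which is one case where SQLite refuses a text value such as 'id'; an ordinary INTEGER column would store the text as-is.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content with a header line, standing in for a real file.
csv_text = "id,key\n75411942,source\n75411946,source\n"

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY only accepts integer values, so the text 'id' is rejected.
conn.execute("CREATE TABLE pk_table (id INTEGER PRIMARY KEY, key TEXT)")
# An ordinary INTEGER column accepts any value, so 'id' is stored as text.
conn.execute("CREATE TABLE plain_table (id INTEGER, key TEXT)")


def import_all_lines(table):
    """Insert every CSV line, header included; return how many rows were rejected."""
    rejected = 0
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        except sqlite3.Error:
            rejected += 1
    return rejected


rejected_pk = import_all_lines("pk_table")        # header row fails to insert
rejected_plain = import_all_lines("plain_table")  # header row is stored as data
pk_rows = conn.execute("SELECT COUNT(*) FROM pk_table").fetchone()[0]
plain_first = conn.execute(
    "SELECT id, key FROM plain_table ORDER BY rowid LIMIT 1"
).fetchone()
print(rejected_pk, rejected_plain, pk_rows, plain_first)

# One way to avoid the problem: skip the header line before inserting.
conn.execute("DELETE FROM plain_table")
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # discard the header row
conn.executemany("INSERT INTO plain_table VALUES (?, ?)", reader)
clean_rows = conn.execute(
    "SELECT id, key FROM plain_table ORDER BY rowid"
).fetchall()
print(clean_rows)
conn.close()
```

Skipping the header explicitly keeps the result independent of the schema, so it no longer matters whether a column type happens to reject the header values.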

pandas read_sql_query puts the header information in the first row of the DataFrame
