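I am trying to build a desktop application that generates SQL queries from an Excel file via a pandas DataFrame. I can generate the INSERT statements, but the dates come through in time_stamp format and I want to convert them to to_date format. Please suggest a better way to do this. I would also like suggestions for generating SELECT statements with the same code.
Here is my code: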
import pandas as pd

table_name = "ADI"
# read the Excel sheet into a DataFrame
file_name = pd.read_excel('supermarke.xlsx')

def SQL_Insert(SOURCE, TARGET):
    sql_texts = []
    # build one INSERT statement per row
    for index, row in SOURCE.iterrows():
        sql_texts.append(
            'INSERT INTO ' + TARGET + ' (' + str(', '.join(SOURCE.columns)) + ') VALUES ' + str(tuple(row.values)) + ";")
    return ('\n'.join(sql_texts))

print(SQL_Insert(file_name, table_name))
This is my output:
INSERT INTO ADI (ID, Address, City, State, Country, Supermarket Name, Number of Employees, DATE) VALUES (1, '3666 21st St', 'San Francisco', 'CA 94114', 'USA', 'Madeira', 8, Timestamp('2018-01-12 00:00:00'));
INSERT INTO ADI (ID, Address, City, State, Country, Supermarket Name, Number of Employees, DATE) VALUES (2, '735 Dolores St', 'San Francisco', 'CA 94119', 'USA', 'Bready Shop', 15, Timestamp('2018-01-12 00:00:00'));
INSERT INTO ADI (ID, Address, City, State, Country, Supermarket Name, Number of Employees, DATE) VALUES (3, '332 Hill St', 'San Francisco', 'California 94114', 'USA', 'Super River', 25, Timestamp('2018-01-12 00:00:00'));
INSERT INTO ADI (ID, Address, City, State, Country, Supermarket Name, Number of Employees, DATE) VALUES (4, '3995 23rd St', 'San Francisco', 'CA 94114', 'USA', "Ben's Shop", 10, Timestamp('2018-01-12 00:00:00'));
I am also trying to add another feature: showing an error message if the file is not found.
@Chirag, when a cell is empty I get nan in the output, but I cannot insert that row because SQL uses null rather than nan.
INSERT INTO ADI (PLAN_ID, DEVICE_ID, PLAN_CONTRACT_DURATION, DEVICE_CONTRACT_DURATION, SALES_CHANNEL, VENDOR_TYPE, EFFECTIVE_DATE, EXPIRATION_DATE, PLAN_NAME, DEVICE, RRP, DEVICE_REPAYMENT, TOTAL_REPAYMENT_CHARGES, TOTAL_CREDIT_CHARGES, URL, EX_VENDOR_TYPE) VALUES (20637411, 20663271, 1, 1, 'ALL', 'ALL', Timestamp('2018-10-30 00:00:00'), Timestamp('2050-12-31 00:00:00'), 'Unlimited data Home Wireless ($79 Vividwireless)', 'Huawei B315 ', 0, 0, 199, 0, nan, nan);
How can I replace nan with NULL/null?
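One possible way to handle both the nan and the Timestamp problem is to format each cell yourself instead of relying on str(tuple(row.values)). The sketch below is only an illustration: the helper names to_sql_literal and sql_insert, the sample column names, and the Oracle-style TO_DATE format mask are assumptions, not something taken from the original code. Adjust the date syntax to whatever your database expects.

```python
import datetime as dt

import pandas as pd


def to_sql_literal(value, date_mask="YYYY-MM-DD HH24:MI:SS"):
    """Render one cell as a SQL literal (TO_DATE syntax is an assumption)."""
    # Missing values (None, NaN, NaT) become the SQL keyword NULL.
    if value is None or pd.isna(value):
        return "NULL"
    # pandas Timestamp subclasses datetime, so one check covers both.
    if isinstance(value, dt.datetime):
        text = value.strftime("%Y-%m-%d %H:%M:%S")
        return "TO_DATE('{}', '{}')".format(text, date_mask)
    # Strings: wrap in single quotes and double any embedded quote.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    # Numbers and anything else fall back to their plain text form.
    return str(value)


def sql_insert(source, target):
    """Build one INSERT statement per DataFrame row."""
    columns = ", ".join(source.columns)
    statements = []
    for _, row in source.iterrows():
        values = ", ".join(to_sql_literal(v) for v in row)
        statements.append(
            "INSERT INTO {} ({}) VALUES ({});".format(target, columns, values)
        )
    return "\n".join(statements)


# Illustrative sample row; the column names here are hypothetical.
df = pd.DataFrame({
    "ID": [4],
    "NAME": ["Ben's Shop"],
    "OPENED_ON": [pd.Timestamp("2018-01-12")],
    "URL": [float("nan")],
})
print(sql_insert(df, "ADI"))
```

For the sample row this prints a statement whose last two values are TO_DATE('2018-01-12 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and NULL. If the statements are executed from Python rather than saved to a file, passing the values as bound parameters through the database driver is usually safer than building the text by hand.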
Answer 0 (score: 1)
Something like this?
import os
import pandas as pd
import numpy as np

def SQL_Insert(SOURCE, TARGET):
    sql_texts = []
    for index, row in SOURCE.iterrows():
        sql_texts.append(
            'INSERT INTO ' + TARGET + ' (' + str(', '.join(SOURCE.columns)) + ') VALUES ' + str(tuple(row.values)) + ";")
    return ('\n'.join(sql_texts))

# check if the file exists
if os.path.isfile("demo.xlsx"):
    # read the file
    df = pd.read_excel('demo.xlsx')
    # format the date column as a string, as you mentioned
    df["DATE"] = df.DATE.dt.strftime('%Y-%m-%d')
    # replace NaN with None
    df = df.astype('object').where(pd.notnull(df), None)
    # generate a CREATE TABLE statement, in case you want to use it
    print(pd.io.sql.get_schema(df.reset_index(), 'table_name'))
    # call your function
    q = SQL_Insert(df, "table_name")
    print(q)
else:
    print("File not found")
Output:
CREATE TABLE "table_name" (
"index" INTEGER,
"ID" INTEGER,
"Address" TEXT,
"City" TEXT,
"Country" TEXT,
"Supermarket Name" TEXT,
"Number of Employees" REAL,
"DATE" TEXT
)
INSERT INTO table_name (ID, Address, City, Country, Supermarket Name, Number of Employees, DATE) VALUES (1, 'Address 1', 'San Francisco', 'USA', 'Maderia', 8.0, '2018-01-12');
INSERT INTO table_name (ID, Address, City, Country, Supermarket Name, Number of Employees, DATE) VALUES (2, 'Address 2', 'San Francisco', 'USA', 'Brady Shop', 15.0, '2018-01-12');
INSERT INTO table_name (ID, Address, City, Country, Supermarket Name, Number of Employees, DATE) VALUES (3, 'Address 3', 'San Francisco', 'USA', 'Super River', 25.0, '2018-01-12');
INSERT INTO table_name (ID, Address, City, Country, Supermarket Name, Number of Employees, DATE) VALUES (4, 'Address 4', 'San Francisco', 'USA', "Ben's shop", 10.0, '2018-01-12');
INSERT INTO table_name (ID, Address, City, Country, Supermarket Name, Number of Employees, DATE) VALUES (5, None, 'San Francisco', None, "Ben's shop", None, 'NaT');
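The question also asked about SELECT statements, which the code above does not cover. A rough sketch of one way to do it from the same DataFrame follows. The function names sql_select and quote_identifier and the key_column parameter are made up for this example, and double-quoting the column names is an assumption about your database (it keeps names with spaces such as Supermarket Name valid, but in some databases it also makes them case-sensitive).

```python
import pandas as pd


def quote_identifier(name):
    """Wrap a column name in double quotes so names with spaces stay valid."""
    return '"' + str(name).replace('"', '""') + '"'


def sql_select(source, target, key_column=None):
    """Build SELECT statements from a DataFrame's columns.

    Without key_column: a single SELECT over all columns.
    With key_column: one SELECT per row, filtered on that column's value.
    """
    columns = ", ".join(quote_identifier(c) for c in source.columns)
    base = "SELECT {} FROM {}".format(columns, target)
    if key_column is None:
        return base + ";"
    statements = []
    for value in source[key_column]:
        # Quote text keys; numeric keys are written as-is.
        if isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        statements.append(
            "{} WHERE {} = {};".format(base, quote_identifier(key_column), literal)
        )
    return "\n".join(statements)


# Small illustrative frame using two of the column names from the question.
df = pd.DataFrame({"ID": [1, 2], "Supermarket Name": ["Madeira", "Bready Shop"]})
print(sql_select(df, "table_name"))
print(sql_select(df, "table_name", key_column="ID"))
```

The first call gives one SELECT listing every column. The second gives one SELECT per row with a WHERE clause on ID, which can be handy for checking that each inserted row actually landed.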