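I'm writing a script that connects to a Teradata DB, reads data from a single table, and runs some analysis on that table.
The script below (made generic for this question) works fine for the most part, but I have 2 problems:
1. How can I write the max and min results to the same sheet instead of separate ones? Same with the 2 NULL checks shown.
2. How can I export the record count to the Excel file as its own tab? (See the comment near the end of the script; that call errors out.)
Thanks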
Edit: I figured out problem 1. I just added startcol=0 or startcol=1 to the to_excel calls so that they are written to the same sheet.
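A minimal sketch of that startcol approach, using a small made-up DataFrame in place of the Teradata query result (the sample data, the file name null_checks.xlsx, and the sheet name are assumptions for illustration). When the index is written too, each frame occupies two columns, so the second frame here starts at startcol=3 rather than 1 to avoid overwriting the first one's values.

```python
import pandas as pd

# Made-up sample data standing in for the query result.
df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", None], "c": [1.0, 2.0, 3.0]})

has_nulls = df.isnull().any().to_frame(name="HasNullValues")
null_counts = df.isnull().sum().to_frame(name="NumNullValues")

with pd.ExcelWriter("null_checks.xlsx", engine="xlsxwriter") as writer:
    # First frame: index in column 0, values in column 1.
    has_nulls.to_excel(writer, sheet_name="Null_Checks", startcol=0)
    # Second frame: shifted right so it does not overwrite the first.
    null_counts.to_excel(writer, sheet_name="Null_Checks", startcol=3)
```

Passing index=False to to_excel would shrink each block to a single column, at the cost of losing the column-name labels.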
import teradata
import pandas as pd

def main():
    writer = pd.ExcelWriter('table_results.xlsx', engine='xlsxwriter')
    udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=True)

    def func_1():
        # connect to Teradata and run SELECT statement on single table
        with udaExec.connect(method="odbc", dsn="xxx", username="xxx", password="xxx") as session:
            query = "Select * from TableA"
            # read in records
            df = pd.read_sql(query, session)
            # print top 20 records
            head = df.head(20)
            head.to_excel(writer, sheet_name='Top_20')
            # columns with NULL values -- returns True/False
            null_columns = df.isnull().any()
            null_columns.to_frame(name='HasNullValues').to_excel(writer, sheet_name='Null_Columns')
            # count of NULL values per column
            null_columns_sum = df.isnull().sum()
            null_columns_sum.to_frame(name='NumNullValues').to_excel(writer, sheet_name='Null_Column_Count')
            # max value per numeric column
            max_val = df.max(numeric_only=True)
            max_val.to_frame(name='max').to_excel(writer, sheet_name='Max_Val')
            # min value per numeric column
            min_val = df.min(numeric_only=True)
            min_val.to_frame(name='min').to_excel(writer, sheet_name='Min_Val')
            # count of records -- how to export this to the excel file as its own tab? --this errors out
            record_count = df.shape[0]
            record_count.to_excel(writer, sheet_name='Count')
        writer.close()

    func_1()

if __name__ == "__main__":
    main()
Answer 0 (score: 3)
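For the first one, build a new DataFrame that holds both min and max. df.min and df.max return Series indexed by column name, so the new frame should share that index rather than df.index (the row index), which would leave every value as NaN:
min_max_df = pd.DataFrame({
    "min": df.min(numeric_only=True),
    "max": df.max(numeric_only=True),
})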
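A small self-contained sketch of putting min and max side by side in one frame, with made-up data (the sample columns are assumptions, not the asker's table). The result has one row per numeric column, and the non-numeric column is skipped.

```python
import pandas as pd

# Made-up sample data; "name" is non-numeric and is excluded by numeric_only=True.
df = pd.DataFrame({"qty": [3, 7, 5], "price": [2.5, 1.0, 4.0], "name": ["a", "b", "c"]})

# Both Series are indexed by column name, so they line up row by row.
min_max_df = pd.DataFrame({
    "min": df.min(numeric_only=True),
    "max": df.max(numeric_only=True),
})
print(min_max_df)
```

A frame like this can then be written with a single to_excel call, which avoids any row or column offset arithmetic.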
Or, using the startrow parameter to stack both results on one sheet:
max_val = df.max(numeric_only=True)
max_val.to_frame(name='max').to_excel(writer, sheet_name='Min_Max')
min_val = df.min(numeric_only=True)
min_val.to_frame(name='min').to_excel(writer, sheet_name='Min_Max', startrow=df.shape[0] + 3)
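A runnable sketch of the startrow variant with made-up data. As an assumption that differs from the snippet above, it offsets the second block by len(max_val) (the number of rows actually written) rather than df.shape[0], so the two blocks cannot overlap even when the table has fewer rows than numeric columns. The file name min_max.xlsx is hypothetical.

```python
import pandas as pd

# Made-up sample: 2 rows, 3 numeric columns.
df = pd.DataFrame({"a": [1, 4], "b": [2.0, 0.5], "c": [9, 8]})

max_val = df.max(numeric_only=True).to_frame(name="max")
min_val = df.min(numeric_only=True).to_frame(name="min")

# The first block uses 1 header row plus len(max_val) data rows;
# leave two blank rows before the second block.
min_startrow = len(max_val) + 3

with pd.ExcelWriter("min_max.xlsx", engine="xlsxwriter") as writer:
    max_val.to_excel(writer, sheet_name="Min_Max")
    min_val.to_excel(writer, sheet_name="Min_Max", startrow=min_startrow)
```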
Several other examples are in this very good documentation: http://xlsxwriter.readthedocs.io/working_with_pandas.html
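For the second one, writer wraps an XlsxWriter workbook, so you can fetch the worksheet and write the value into whichever cell you want. Note that write takes a row, a column, and then the value. This should work (untested):
min_max_sheet = writer.sheets["Min_Max"]
min_max_sheet.write(df.shape[0] * 2 + 5, 0, "{} rows".format(df.shape[0]))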
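A sketch of two ways to get the record count into the workbook, with made-up data. It relies on writer.sheets, the dict of worksheets that pandas keeps on the ExcelWriter, and on the XlsxWriter call worksheet.write(row, col, value). A plain int has no to_excel method, which is why the call in the question fails; for a separate tab the count is wrapped in a one-row DataFrame here. The file name, the sheet names, and the RecordCount column name are assumptions.

```python
import pandas as pd

# Made-up sample data standing in for the query result.
df = pd.DataFrame({"a": [1, 4, 6], "b": [2.0, 0.5, 3.5]})
record_count = df.shape[0]
max_val = df.max(numeric_only=True).to_frame(name="max")

with pd.ExcelWriter("count_demo.xlsx", engine="xlsxwriter") as writer:
    max_val.to_excel(writer, sheet_name="Min_Max")

    # Option 1: write the count into a cell below the existing block.
    sheet = writer.sheets["Min_Max"]
    count_row = len(max_val) + 2  # header row + data rows + one blank row
    sheet.write(count_row, 0, "{} rows".format(record_count))

    # Option 2: wrap the scalar in a DataFrame to give it its own tab.
    count_df = pd.DataFrame({"RecordCount": [record_count]})
    count_df.to_excel(writer, sheet_name="Count", index=False)
```

Option 2 keeps everything in pandas and matches the "own tab" wording of the question; option 1 is handy when the count should sit next to other results.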