I have a Python file called test.py in which I run some pyspark commands:

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# hivedb and table are assumed to be defined earlier
# (e.g. parsed from sys.argv; not shown here)

# create a data frame from a Hive table
df = sqlContext.table("testing.test")
# register the data frame as a temp table
df.registerTempTable('mytempTable')
# find the number of records in the data frame
records = df.count()
print "records='%s'" % records

if records < 1000000:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb, table))
else:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable where id <= 1000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 1000000 and id <= 2000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 2000000 and id <= 3000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 3000000 and id <= 4000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 4000000 and id <= 5000000".format(hivedb, table))
...and so on till the last million.

The statements inside the if/else above are ones I wrote by hand. I want the script to generate this part of the code automatically. How can I generate similar lines of code in the else branch, up to the last million?
Answer 0 (score: 0)

You can use a simple loop:
fmt = "insert into table {hivedb}.{table} select * from mytempTable where id > {low} and id <= {hi}"
for low in range(1000000, 5000000, 1000000):
    stmt = fmt.format(low=low, hi=low + 1000000, hivedb=hivedb, table=table)
    sqlContext.sql(stmt)
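To keep going "till the last million" without hard-coding the stop value, the loop's upper bound can be derived from the data instead. Here is a minimal sketch, assuming the table has a numeric id column as in your script; the max_id name is mine, not from the original code:

# find the largest id so the loop covers every million-row slice
max_id = sqlContext.sql("select max(id) from mytempTable").collect()[0][0]

fmt = "insert into table {hivedb}.{table} select * from mytempTable where id > {low} and id <= {hi}"
for low in range(1000000, max_id, 1000000):
    sqlContext.sql(fmt.format(low=low, hi=low + 1000000, hivedb=hivedb, table=table))

Since range excludes its stop value, a max_id of exactly 3,000,000 gives lows of 1,000,000 and 2,000,000, and the final slice (id > 2000000 and id <= 3000000) still picks up the last rows; any ids above an even million fall into one extra slice.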