How to query a DataFrame in PySpark

Time: 2016-07-21 22:55:13

Tags: apache-spark dataframe pyspark apache-spark-sql

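Here I register a DataFrame as a temporary table and try to query it, but I can't make sense of what comes back: each query prints only the column names and types instead of the data.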

order_transactions_step6_df.registerTempTable("order_transactions")

>>> sqlContext.sql('describe order_transactions')

DataFrame [col_name:string,data_type:string,comment:string]

>>> sqlContext.sql('select count(*) from order_transactions')

DataFrame [_c0:bigint]

>>> sqlContext.sql('select * from order_transactions limit 10')

DataFrame [C0:timestamp,C1:string,C2:string,C3:string,C4:int,C5:int,C6:int,C7:string,C8:double,C9:int,C10:string,C11:int,C12:string,C13:string,C14:string,C15:int,C16:int,C17:timestamp,C18:string,C19:string,C20:string,C21:string,C22:string,C23:int,C24:string,C25:double,C26:timestamp,C27:int,C28:string,C29:timestamp,C30:timestamp,C31:int,C32:int,C33:string,C34:double,C35:timestamp,C36:int,C37:int,C38:string,C39:int,C40:string,C41:int,C42:timestamp,C43:int,C44:timestamp,C45:int,C46:int,C47:int,C48:int,C49:int,C50:double,C51:double,C52:int,C53:string,C54:int,C55:int,C56:string,C57:string,C58:timestamp,C59:int,C60:string,C61:int,C62:string,C63:int,C64:int,C65:double,C66:timestamp,C67:timestamp,C68:timestamp,C69:string,C70:string,C71:string,C72:int,C73:int,C74:string,C75:string,C76:int,C77:int,C78:int,C79:string,C80:string,C81:string,C82:int,C83:int,C84:int,C85:int,C86:string,C87:int,C88:int,C89:string,C90:int,C91:string,C92:int,C93:int,C94:int,C95:int,C96:string,C97:int,C98:string,C99:int,C100:int,C101:string,C102:string,C103:string,C104:string,C105:int,C106:string,C107:int,C108:int,C109:string,C110:string,C111:string,C112:string,C113:string,C114:string,C115:string,C116:string,C117:string,C118:string,C119:string,C120:string,C121:string,C122:string,C123:int,C124:int,C125:string,C126:string,C127:string,C128:string,C129:string,C130:string,C131:string,C132:string,C133:string,C134:string,C135:string,C136:string,C137:string,C138:string,C139:boolean,C140:boolean,C141:boolean,C142:boolean,C143:string,C144:string,C145:string,C146:string,C147:string,C148:string,C149:string,C150:string,C151:string,C152:string,C153:string,C154:string,C155:string,C156:string,C157:string,C158:string,C159:string,C160:string,C161:string,C162:string,C163:double,C164:string,C165:int,C166:string,C167:string,C168:string]

1 Answer:

Answer 0: (score: 1)

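What is happening is that sqlContext.sql('QUERY') returns a DataFrame. Nothing is printed from the table itself; what you are seeing in the shell is the object representation of that DataFrame (its column names and types), not its rows.

Try this: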

result = sqlContext.sql('select * from order_transactions limit 10')
result.show(10)

This prints the first 10 rows of the DataFrame rather than its object representation.