Here is my code:
from pyspark import SparkContext
import os
import string
from pyspark.sql import *
from pyspark.sql.types import *
sc = SparkContext()
sqlCtx = SQLContext(sc)
dir = os.path.dirname(__file__)
# get header to create schema
with open(dir+"/data.csv") as fi:
    header = fi.readline().strip()
    header = header.split(",")
print(header)
# create the schema StructType
gex_fields = [StructField(field, StringType()) for field in header[:2]]
gex_fields = gex_fields + [StructField(field, DoubleType()) for field in header[2:]]
print(gex_fields)
gex_schema=StructType(gex_fields)
# import the csv file
gex = sqlCtx.read.csv("file:"+dir+"/data.csv", header=True, mode="DROPMALFORMED", schema=gex_schema)
print(gex.show())
The problem is that when I call print(gex.show()), a None is printed after the table. What causes this, and how can I get rid of it?
$ spark-submit main.py
17/04/14 23:16:19 WARN Utils: Your hostname, Pandora resolves to a loopback address: 127.0.1.1; using 192.168.1.11 instead (on interface wlp3s0)
17/04/14 23:16:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/04/14 23:16:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
['PATIENT_ID', 'DIAGNOSIS', '1', '2', '3', '4']
[StructField(PATIENT_ID,StringType,true), StructField(DIAGNOSIS,StringType,true), StructField(1,DoubleType,true), StructField(2,DoubleType,true), StructField(3,DoubleType,true), StructField(4,DoubleType,true)]
+-----------+---------+---+---+---+---+
| PATIENT_ID|DIAGNOSIS| 1| 2| 3| 4|
+-----------+---------+---+---+---+---+
|X764_130520| NA|1.0|2.0|3.0|1.0|
|X800_130701| NA|4.0|5.0|6.0|1.0|
|X218_120425| 1|7.0|8.0|9.0|1.0|
+-----------+---------+---+---+---+---+
None
Answer 0 (score: 2)
Just call gex.show() instead of print(gex.show()). .show() prints the first 20 records itself and returns None, which is why wrapping it in an additional print prints None.
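The same pattern can be reproduced without Spark: any function that prints its output and has no return statement implicitly returns None, so passing its result to print() emits an extra "None". A minimal sketch in plain Python (the show function here is a stand-in, not the real DataFrame.show):

```python
def show():
    """Mimics DataFrame.show(): prints a table and returns nothing (i.e. None)."""
    print("+---+")
    print("| a |")
    print("+---+")

show()          # prints the table only
print(show())   # prints the table, then an extra "None"
```

The fix is the same as in the answer: call the printing method directly rather than printing its return value.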