Here is my code:
from pyspark import SparkContext
import os
import string
from pyspark.sql import *
from pyspark.sql.types import *
sc = SparkContext()
sqlCtx = SQLContext(sc)
dir = os.path.dirname(__file__)
# get header to create schema
with open(dir+"/data.csv") as fi:
    header = fi.readline().strip()
    header = header.split(",")
print(header)
# create the schema StructType
gex_fields = [StructField(field, StringType()) for field in header[:2]]
gex_fields = gex_fields + [StructField(field, DoubleType()) for field in header[2:]]
print(gex_fields)
gex_schema=StructType(gex_fields)
# import the csv file
gex = sqlCtx.read.csv("file:"+dir+"/data.csv", header=True, mode="DROPMALFORMED", schema=gex_schema)
print(gex.show())
The problem is that when I call print(gex.show()), a None is printed after the table. What causes this, and how can I get rid of it?
$ spark-submit main.py
17/04/14 23:16:19 WARN Utils: Your hostname, Pandora resolves to a loopback address: 127.0.1.1; using 192.168.1.11 instead (on interface wlp3s0)
17/04/14 23:16:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/04/14 23:16:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
['PATIENT_ID', 'DIAGNOSIS', '1', '2', '3', '4']
[StructField(PATIENT_ID,StringType,true), StructField(DIAGNOSIS,StringType,true), StructField(1,DoubleType,true), StructField(2,DoubleType,true), StructField(3,DoubleType,true), StructField(4,DoubleType,true)]
+-----------+---------+---+---+---+---+
| PATIENT_ID|DIAGNOSIS| 1| 2| 3| 4|
+-----------+---------+---+---+---+---+
|X764_130520| NA|1.0|2.0|3.0|1.0|
|X800_130701| NA|4.0|5.0|6.0|1.0|
|X218_120425| 1|7.0|8.0|9.0|1.0|
+-----------+---------+---+---+---+---+
None
Answer 0 (score: 2)
Just call gex.show() instead of print(gex.show()). .show() prints the first 20 records itself and returns None, which is why wrapping it in an additional print prints None.
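The same pattern can be reproduced without Spark: any function that prints its output and has no return statement implicitly returns None, so passing its result to print() emits an extra "None". A minimal sketch in plain Python (the show function here is a stand-in, not the real DataFrame.show):

```python
def show():
    """Mimics DataFrame.show(): prints a table and returns nothing (i.e. None)."""
    print("+---+")
    print("| a |")
    print("+---+")

show()          # prints the table only
print(show())   # prints the table, then an extra "None"
```

The fix is the same as in the answer: call the printing method directly rather than printing its return value.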