pyspark error: TypeError: Image data cannot be converted to float

Time: 2019-07-18 19:32:45

Tags: python-3.x apache-spark matplotlib rdd imshow

I am working with an RDD in pyspark, and my code is:

import matplotlib.pyplot as plt
import matplotlib.cm as cm

sampleMorePoints = rawData.take(50)
parsedSampleMorePoints = map(parsePoint, sampleMorePoints) 
dataValues = map(lambda lp: lp.features.toArray(),parsedSampleMorePoints)
def preparePlot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999',
                gridWidth=1.0):
    """Template for generating the plot layout."""
    plt.close()
    fig, ax = plt.subplots(figsize=figsize, facecolor='white', edgecolor='white')
    ax.axes.tick_params(labelcolor='#999999', labelsize='10')
    for axis, ticks in [(ax.get_xaxis(), xticks), (ax.get_yaxis(), yticks)]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
        if hideLabels: axis.set_ticklabels([])
    plt.grid(color=gridColor, linewidth=gridWidth, linestyle='-')
    map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])
    return fig, ax

# generate layout and plot
fig, ax = preparePlot(np.arange(.5, 11, 1), np.arange(.5, 49, 1), figsize=(8,7), hideLabels=True,
                      gridColor='#eeeeee', gridWidth=1.1)
image = plt.imshow(dataValues,interpolation='nearest', aspect='auto', cmap=cm.Greys)
for x, y, s in zip(np.arange(-.125, 12, 1), np.repeat(-.75, 12), [str(x) for x in range(12)]):
    plt.text(x, y, s, color='#999999', size='10')
plt.text(4.7, -3, 'Feature', color='#999999', size='11'), ax.set_ylabel('Observation')
display(fig) 
pass

But I get this error: TypeError: Image data cannot be converted to float

I looked at similar questions on Stack Overflow, but none of them involve RDDs. I know I should try converting the RDD to a float type, but I could not find a "dtype" method for RDDs. My code used to work with Python 2, but since I migrated to Python 3 it throws this error. I tried:

dataValues = map(lambda lp: float(lp.features.toArray()),parsedSampleMorePoints)

but it did not work.

*My rawData looks like this:

['2001.0,0.884123733793,0.610454259079,0.600498416968,0.474669212493,0.247232680947,0.357306088914,0.344136412234,0.339641227335,0.600858840135,0.425704689024,0.60491501652,0.419193351817']

Also, the function parsePoint converts a comma-separated unicode string into a LabeledPoint, and is defined as:

def parsePoint(line):
    """Converts a comma separated unicode string into a `LabeledPoint`.

    Args:
        line (unicode): Comma separated unicode string where the first element is the label and the
            remaining 12 elements are features.

    Returns:
        LabeledPoint: The line is converted into a `LabeledPoint`, which consists of a label and
            features. where features is a list.
    """
    splitted = str.split(line, ',')
    features = splitted[1:]
    label = splitted[0]
    return LabeledPoint(label, features)

Update: I also tried this, but it did not help:

[code block missing from the source]
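
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in Python 2, map returned a list, while in Python 3 it returns a lazy, one-shot iterator, so a plotting call that expects array-like data may receive a map object instead of rows of numbers. The snippet below illustrates the difference without Spark. Point and parse_point are hypothetical stand-ins for LabeledPoint and parsePoint, and the second sample line is made-up illustration data.

```python
from collections import namedtuple

# Hypothetical stand-in for LabeledPoint so the sketch runs without Spark.
Point = namedtuple("Point", ["label", "features"])


def parse_point(line):
    """Split a comma-separated line into a float label and float features."""
    values = [float(v) for v in line.split(",")]
    return Point(label=values[0], features=values[1:])


sample_lines = [
    # First line is copied from the rawData sample in the question.
    "2001.0,0.884123733793,0.610454259079,0.600498416968,0.474669212493,"
    "0.247232680947,0.357306088914,0.344136412234,0.339641227335,"
    "0.600858840135,0.425704689024,0.60491501652,0.419193351817",
    # Second line is invented purely for illustration.
    "2002.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.1,0.2,0.3",
]

# Python 3: map() builds a lazy iterator; nothing is computed yet and the
# object is not a list, so it has no rows to interpret as image data.
lazy_values = map(lambda p: p.features, map(parse_point, sample_lines))
lazy_is_list = isinstance(lazy_values, list)

# Materializing the rows explicitly gives a list of equal-length float lists.
data_values = [parse_point(line).features for line in sample_lines]
row_lengths = {len(row) for row in data_values}
```

With the real objects, the equivalent step would presumably be a list comprehension over parsedSampleMorePoints calling lp.features.toArray(), followed by np.array(..., dtype=float) before the result is handed to plt.imshow. Note also that float() applied to a whole multi-element array, as in the attempted fix, cannot succeed because float() converts a single value.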

0 answers:

No answers