I am working with an RDD in PySpark, and my code is:

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

sampleMorePoints = rawData.take(50)
parsedSampleMorePoints = map(parsePoint, sampleMorePoints)
dataValues = map(lambda lp: lp.features.toArray(), parsedSampleMorePoints)

def preparePlot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999',
                gridWidth=1.0):
    """Template for generating the plot layout."""
    plt.close()
    fig, ax = plt.subplots(figsize=figsize, facecolor='white', edgecolor='white')
    ax.axes.tick_params(labelcolor='#999999', labelsize='10')
    for axis, ticks in [(ax.get_xaxis(), xticks), (ax.get_yaxis(), yticks)]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
        if hideLabels: axis.set_ticklabels([])
    plt.grid(color=gridColor, linewidth=gridWidth, linestyle='-')
    map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])
    return fig, ax

# generate layout and plot
fig, ax = preparePlot(np.arange(.5, 11, 1), np.arange(.5, 49, 1), figsize=(8,7), hideLabels=True,
                      gridColor='#eeeeee', gridWidth=1.1)
image = plt.imshow(dataValues, interpolation='nearest', aspect='auto', cmap=cm.Greys)
for x, y, s in zip(np.arange(-.125, 12, 1), np.repeat(-.75, 12), [str(x) for x in range(12)]):
    plt.text(x, y, s, color='#999999', size='10')
plt.text(4.7, -3, 'Feature', color='#999999', size='11'), ax.set_ylabel('Observation')
display(fig)
pass

But I get this error: TypeError: Image data can not convert to float

I looked at similar questions on Stack Overflow, but none of them involve RDDs. I understand that I should probably convert the data to a float type, but I could not find a "dtype" method for RDDs. My code used to work on Python 2; it started throwing this error after I migrated to Python 3. I tried:

dataValues = map(lambda lp: float(lp.features.toArray()), parsedSampleMorePoints)

but it did not help.

My rawData looks like this:

['2001.0,0.884123733793,0.610454259079,0.600498416968,0.474669212493,0.247232680947,0.357306088914,0.344136412234,0.339641227335,0.600858840135,0.425704689024,0.60491501652,0.419193351817']

The function parsePoint converts a comma-separated unicode string into a LabeledPoint and is defined as:

from pyspark.mllib.regression import LabeledPoint

def parsePoint(line):
    """Converts a comma separated unicode string into a `LabeledPoint`.
    Args:
        line (unicode): Comma separated unicode string where the first element is the label and the
            remaining 12 elements are features.
    Returns:
        LabeledPoint: The line is converted into a `LabeledPoint`, which consists of a label and
            features, where features is a list.
    """
    splitted = str.split(line, ',')
    features = splitted[1:]
    label = splitted[0]
    return LabeledPoint(label, features)

Update: I also tried another approach, but it did not help.
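
One possible explanation for the error appearing only after the move to Python 3, offered as a hypothesis rather than a confirmed diagnosis: in Python 3 the built-in `map` returns a lazy iterator instead of a list, so `plt.imshow` would receive a `map` object rather than rows of numbers. The sketch below illustrates the difference with plain Python only. `sample_lines` and `parse_features` are hypothetical stand-ins for `rawData.take(50)` and `parsePoint`, and the two short lines are made-up data, not the real records.

```python
# Hypothetical stand-in for rawData.take(50): label first, then features.
sample_lines = [
    "2001.0,0.88,0.61,0.60",
    "2002.0,0.12,0.34,0.56",
]


def parse_features(line):
    """Return the feature values (everything after the label) as floats."""
    return [float(value) for value in line.split(",")[1:]]


# Python 3: map() builds a lazy iterator, not a list of rows.
lazy_values = map(parse_features, sample_lines)

# Wrapping the call in list() materializes the rows into a nested list.
data_values = list(map(parse_features, sample_lines))

print(type(lazy_values).__name__)  # map
print(data_values)                 # [[0.88, 0.61, 0.6], [0.12, 0.34, 0.56]]
```

If this is indeed the cause, the analogous change in the question's code would be to wrap both `map` calls in `list(...)`, for example `dataValues = list(map(lambda lp: lp.features.toArray(), parsedSampleMorePoints))`. The `map` over `ax.spines` inside `preparePlot` is lazy for the same reason, so a plain `for` loop may be needed there for the spines to actually be hidden.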