I've run into a problem with a function in PySpark (I'm a beginner and still practicing). I have this function:
%pyspark
# Function to define and take in all the variables.
def myfunc(line):
    l1 = line.split(",")
    return row(Id=l1[0],MSSubClass=l1[1],MSZoning=l1[2],LotFrontage=str(l1[3]),
        LotArea=str(l1[4]),Street=l1[5],Alley=l1[6],
        LotShape=l1[7],LandContour=l1[8],Utilities=l1[9],LotConfig=l1[10],
        LandSlope=l1[11],Neighborhood=l1[12],Condition1=l1[13],Condition2=l1[14],BldgType=l1[15],
        HouseStyle=l1[16],OverallQual=l1[17],OverallCond=l1[18],YearBuilt=l1[19],YearRemodAdd=l1[20],
        RoofStyle=l1[21],RoofMatl=l1[22],Exterior1st=l1[23],Exterior2nd=l1[24],MasVnrType=l1[25],
        MasVnrArea=l1[26],ExterQual=l1[27],ExterCond=l1[28],Foundation=l1[29],BsmtQual=l1[30],
        BsmtCond=l1[31],BsmtExposure=l1[32],BsmtFinType1=l1[33],BsmtFinSF1=str(l1[34]),BsmtFinType2=l1[35],
        BsmtFinSF2=str(l1[36]),BsmtUnfSF=str(l1[37]),TotalBsmtSF=str(l1[38]),Heating=l1[39],HeatingQC=l1[40],
        CentralAir=l1[41],Electrical=l1[42],firstFlrSF=str(l1[43]),secondFlrSF=str(l1[44]),LowQualFinSF=l1[45],
        GrLivArea=str(l1[46]),BsmtFullBath=str(l1[47]),BsmtHalfBath=str(l1[48]),FullBath=str(l1[49]),
        HalfBath=str(l1[50]),BedroomAbvGr=str(l1[51]),KitchenAbvGr=str(l1[52]),KitchenQual=l1[53],
        TotRmsAbvGrd=str(l1[54]),Functional=l1[55],Fireplaces=str(l1[56]),
        FireplaceQu=l1[57],GarageType=l1[58],GarageYrBlt=l1[59],GarageFinish=l1[60],
        GarageCars=str(l1[61]),GarageArea=str(l1[62]),GarageQual=l1[63],GarageCond=l1[64],PavedDrive=l1[65],
        WoodDeckSF=str(l1[66]),OpenPorchSF=str(l1[67]),EnclosedPorch=str(l1[68]),threeSsnPorch=str(l1[69]),
        ScreenPorch=str(l1[70]),PoolArea=str(l1[71]),PoolQC=l1[72],Fence=l1[73],MiscFeature=l1[74],
        MiscVal=str(l1[75]),MoSold=l1[76],YrSold=l1[77],SaleType=l1[78],SaleCondition=l1[79],SalePrice=str(l1[80]))

res = rdddata.map(myfunc)
temp=spark.createDataFrame(res)
temp.dtypes
Running temp=spark.createDataFrame(res) fails with the error NameError: global name 'row' is not defined (pyspark).
I then initialized row as an empty string, and got the error TypeError: 'str' object is not callable instead.
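To narrow it down, both errors seem to be reproducible without Spark at all. The following is only a minimal sketch under my own assumptions: `make_record` is a hypothetical, shortened stand-in for `myfunc`, `try_call` is a hypothetical helper that reports the exception type, and `row` is deliberately left undefined at first, as in my session.

```python
def make_record(line):
    # Shortened stand-in for myfunc: calls the name `row` exactly as my code does.
    fields = line.split(",")
    return row(Id=fields[0], MSSubClass=fields[1])


def try_call(line):
    # Hypothetical helper: return the name of the exception raised, or "ok".
    try:
        make_record(line)
    except (NameError, TypeError) as exc:
        return type(exc).__name__
    return "ok"


# 1) `row` has never been defined, so looking the name up fails.
first = try_call("1,60")
print(first)   # NameError

# 2) Binding `row` to an empty string makes the lookup succeed,
#    but a str cannot be called like a function.
row = ""
second = try_call("1,60")
print(second)  # TypeError
```

So in my session the name `row` is apparently never bound to anything callable. Presumably the tutorial gets its name from an import; PySpark's row class is `pyspark.sql.Row`, with a capital R, so that may be the difference.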
Can someone help me understand this? The tutorial I'm following does the same thing.