I want to compute C4 using a formula; for example, calculate C4 when c1 = '104001'.
Answer 0 (score: 0)
You can add another column like this:
from pyspark import SparkContext
from pyspark.sql import Row, SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# Sample data: (var1, var2) pairs
l = [(25, 24), (23, 45), (24, 56)]
rdd = sc.parallelize(l)
dummy = rdd.map(lambda x: Row(var1=int(x[0]), var2=int(x[1])))
dummyframe = sqlContext.createDataFrame(dummy)

def getValDivideSum(dataFrame):
    # Total of var2 across all rows, collected to the driver as a plain number
    total = dataFrame.agg({"var2": "sum"}).collect()[0][0]
    # var3 is each row's var2 divided by that total
    dataFrame = dataFrame.withColumn("var3", dataFrame.var2 / total).select("var1", "var2", "var3")
    return dataFrame

getValDivideSum(dummyframe).show()
The output will look like this:
+----+----+-----+
|var1|var2| var3|
+----+----+-----+
|  25|  24|0.192|
|  23|  45| 0.36|
|  24|  56|0.448|
+----+----+-----+
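The var3 values can be checked by hand without Spark: the var2 column sums to 24 + 45 + 56 = 125, and each row's var3 is its var2 divided by 125. The following is only a plain-Python sketch of that arithmetic; `divide_by_sum` is a hypothetical helper name used for illustration, not part of PySpark.

```python
def divide_by_sum(values):
    """Return each value divided by the sum of all values."""
    total = sum(values)
    if total == 0:
        # Guard against division by zero; the shares are undefined in that case
        raise ValueError("sum of values is zero")
    return [v / total for v in values]


# Same var2 values as in the sample data
var2 = [24, 45, 56]
shares = divide_by_sum(var2)
print([round(s, 3) for s in shares])  # [0.192, 0.36, 0.448]
```

The zero-sum guard is an addition for the sketch; the Spark version would instead produce null or fail on a zero or missing total, depending on the data.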
Hope this helps.