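I used the tf-idf algorithm in Spark ML to get some feature vectors. Now I want to get the maximum value in each Vector. How can I sort a Vector by value, or get its maximum?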
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
val testDF = spark.read.json("/dataset/yelp_review_test.json")
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val wordsData = tokenizer.transform(testDF)
//wordsData.show()
val hashTF = new HashingTF().setInputCol("words").setOutputCol("tfFeatures")
val tfFeatures = hashTF.transform(wordsData)
//tfFeatures.select("review_id","words","tfFeatures").foreach(println(_))
val idf = new IDF().setInputCol("tfFeatures").setOutputCol("idfFeatures")
val idfModel = idf.fit(tfFeatures)
val allDF = idfModel.transform(tfFeatures)
allDF.show()
A row's idfFeatures vector looks like this:
(262144,[7617,24417,36200,61231,65069,66865,95805,103838,117481,138356,142373,151536,161061,189683,200556,204852,205044,218917,222453,227410,232735,235447],[2.1972245773362196,0.1823215567939546,1.5040773967762742,0.49247648509779424,1.791759469228055,1.2809338454620642,1.2809338454620642,0.0,1.791759469228055,1.0986122886681098,2.1972245773362196,0.8109302162163288,2.1972245773362196,0.25131442828090617,2.1972245773362196,2.1972245773362196,0.4054651081081644,1.791759469228055,1.888923217681703,0.0,2.1972245773362196,2.1972245773362196])
Answer 0 (score: 0)
Because it is a Spark ML Vector, you can convert it to an ordinary Scala collection with toArray and use the functions available on that collection to find the maximum. For example, substituting the vector from your tuple for myVector:
myVector.toArray.reduce( (a, b) => if (a > b) a else b )
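The question also asks about sorting, and calling toArray on a 262144-dimensional sparse vector allocates the full dense array for every row. Below is a minimal sketch of an alternative, assuming Spark 2.x or later with the org.apache.spark.ml.linalg API. The names VectorMax, maxValue, sortedByValue and maxValueUdf are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Hypothetical helper object; only argmax, apply, toSparse, indices and values come from Spark.
object VectorMax {

  // Largest entry of the vector. argmax returns -1 for a zero-length vector,
  // so that case is mapped to NaN here (an arbitrary choice for this sketch).
  def maxValue(v: Vector): Double = {
    val i = v.argmax
    if (i < 0) Double.NaN else v(i)
  }

  // (index, value) pairs of the non-zero entries, sorted by value, largest first.
  // toSparse drops explicit zeros, so zero entries do not appear in the result.
  def sortedByValue(v: Vector): Seq[(Int, Double)] = {
    val sv = v.toSparse
    sv.indices.zip(sv.values).sortBy { case (_, value) => -value }.toSeq
  }

  // Wraps maxValue so it can be applied to a DataFrame column of vectors.
  val maxValueUdf = udf((v: Vector) => maxValue(v))
}
```

Applied to the DataFrame from the question, this could look like allDF.withColumn("maxTfIdf", VectorMax.maxValueUdf(allDF("idfFeatures"))), where maxTfIdf is an assumed name for the new column.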