Spark DataFrame: selecting a single column from each row

Time: 2019-04-09 11:19:03

Tags: scala apache-spark

Hi, I want to create a df with a single column from a df that has multiple columns. See the following pseudocode:

df=[(id1,text),(id2,text2),...] //df where each row has two columns "id","text"
id_column=df.map(row=>row._1)

How can I do this in Spark?

2 answers:

Answer 0 (score: 0)

You can use select. If you know the column's name (rather than its position), you can do the following:

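import org.apache.spark.sql.DataFrame
import spark.implicits._ // assumes a SparkSession named `spark` is in scope
// sample df where each row has two columns, "id" and "text"
val df: DataFrame = Seq((1, "text"), (2, "text2")).toDF("id", "text")

val dfResult: DataFrame = df.select("id")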
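
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and made-up sample rows; the object name `SelectOneColumn` and the helpers `byName` and `byPosition` are only illustrative and do not come from the question. It shows selecting by name, and also by position through `df.columns`, in case only the position is known:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectOneColumn {
  // Keep only the column with the given name.
  def byName(df: DataFrame, name: String): DataFrame = df.select(name)

  // Keep only the column at the given zero-based position,
  // looking its name up in df.columns first.
  def byPosition(df: DataFrame, index: Int): DataFrame = df.select(df.columns(index))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-one-column")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the two columns from the question.
    val df: DataFrame = Seq((1, "bla1"), (2, "foo")).toDF("id", "text")

    byName(df, "id").show()   // single-column DataFrame holding the ids
    byPosition(df, 0).show()  // same column, chosen by position

    spark.stop()
  }
}
```

Both helpers return a DataFrame, so the result can be chained with further `select`, `filter`, or `join` calls.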

Answer 1 (score: -2)

Found it! What I wanted was:

Given a df like:
| id | text |
| 1  | bla1 |
| 2  | foo  |
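// real Scala/Spark code (needs `import spark.implicits._` for the Int encoder)
val just_ids = df.map(row => row.getAs[Int](0)) // Dataset[Int] built from the first column
just_ids.show // prints only the ids, in a column named "value"

// if you just want to select a column, that is also possible
df.select("text").show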
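
As a side note, here is a small sketch comparing the `map` approach above with `select` followed by `as[Int]`, which also yields a typed `Dataset[Int]`. It assumes the `id` column is an `Int` and uses made-up sample rows; the object name `IdsAsDataset` and its helper methods are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object IdsAsDataset {
  // Row-wise extraction, as in the answer above.
  def idsViaMap(spark: SparkSession, df: DataFrame): Dataset[Int] = {
    import spark.implicits._ // provides the Encoder[Int] that map needs
    df.map(row => row.getAs[Int](0))
  }

  // Column selection first, then a typed view of that single column.
  def idsViaSelect(spark: SparkSession, df: DataFrame): Dataset[Int] = {
    import spark.implicits._
    df.select("id").as[Int]
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ids-as-dataset")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the table above.
    val df = Seq((1, "bla1"), (2, "foo")).toDF("id", "text")

    idsViaMap(spark, df).show()
    idsViaSelect(spark, df).show()

    spark.stop()
  }
}
```

If a plain one-column DataFrame is enough, `df.select("id")` alone avoids deserializing each row, which is why it is usually the simpler choice.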