Converting a Scala DataFrame column to Seq[Int]

Asked: 2019-09-03 08:42:04

Tags: scala dataframe

I have a Scala DataFrame df that looks like this:

+-----+--------------------+
|id   |      measured_value|
+-----+--------------------+
|    0|             1999298|
|    1|              854791|
|    2|             1032910|
|    3|              310905|
|    4|              515442|
|    5|             4176270|
|    6|              807807|
+-----+--------------------+

I want to convert the column named measured_value into a sequence of integers (Seq[Int]), and tried df.select("measured_value").rdd.map(r => r(0)).collect(). But that gives me Array[Any]. How do I convert it to Seq[Int]?

1 answer:

Answer 0 (score: 1)

Try this:

import spark.implicits._  // provides the implicit Encoder needed by map

val result: Seq[Int] = df.select("measured_value")
  .map(_.getInt(0))     // typed getter, so map yields Dataset[Int], not Any
  .collect()
  .toSeq

Some useful examples related to this topic can be found here.
Also keep in mind that df.select("measured_value").map(_.getInt(0)).collect.toSeq gathers all of the data on the Spark driver, so for a large dataset it can be expensive in terms of resources.
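The typing issue behind the Array[Any] result can also be sketched outside Spark: indexing a Row with r(0) is declared to return Any, whereas the typed getter getInt(0) (or an explicit cast) recovers Int. A minimal, self-contained sketch using hypothetical stand-in values in place of collected rows:

```scala
object AnyToIntDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for values collected via r(0): statically typed Any
    val collected: Array[Any] = Array(1999298, 854791, 1032910)

    // Array[Any] -> Seq[Int] by casting each element, mirroring what
    // getInt(0) does for you inside Spark
    val asInts: Seq[Int] = collected.map(_.asInstanceOf[Int]).toSeq

    println(asInts)
  }
}
```

The cast works here because the boxed values really are Ints; with getInt(0) on the DataFrame side you get the same typed result without collecting through Any at all.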