I have a Spark DataFrame df in Scala that looks like this:
+---+--------------+
| id|measured_value|
+---+--------------+
|  0|       1999298|
|  1|        854791|
|  2|       1032910|
|  3|        310905|
|  4|        515442|
|  5|       4176270|
|  6|        807807|
+---+--------------+
I want to convert the column named measured_value into a sequence of integers (Seq[Int]), and I tried df.select("measured_value").rdd.map(r=>r(0)).collect(). But this gives me an Array[Any]. How can I convert it to a Seq[Int]?
Answer 0 (score: 1)
Try this:
// Assumes a SparkSession named spark is in scope; its implicits supply the Encoder[Int] that map needs.
import spark.implicits._
df.select("measured_value").map(_.getInt(0)).collect.toSeq
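The attempt in the question yields Array[Any] because Row.apply(i), which is what r(0) calls, is declared to return Any, whereas Row.getInt(i) returns an Int. The following is a minimal sketch of two typed alternatives, assuming Spark is on the classpath and that measured_value is stored as IntegerType (for a LongType column, getLong and as[Long] would be needed instead). The object name MeasuredValues, the local master setting, and the three sample rows are illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object MeasuredValues {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("measured-values")
      .getOrCreate()
    import spark.implicits._ // provides toDF and the Encoder[Int] used below

    // A small stand-in for the DataFrame in the question.
    val df = Seq((0, 1999298), (1, 854791), (2, 1032910)).toDF("id", "measured_value")

    // Option 1: stay on the RDD route, but read the field with a typed getter.
    val viaRdd: Seq[Int] =
      df.select("measured_value").rdd.map(_.getInt(0)).collect().toSeq

    // Option 2: turn the single-column DataFrame into a Dataset[Int] first.
    val viaDataset: Seq[Int] =
      df.select("measured_value").as[Int].collect().toSeq

    // Both routes produce the same values.
    assert(viaRdd == viaDataset)
    println(viaDataset)

    spark.stop()
  }
}
```

Both options still call collect, so they are only appropriate when the column fits comfortably in driver memory.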
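Some useful examples related to this topic can be found here.
Also keep in mind that df.select("measured_value").map(_.getInt(0)).collect.toSeq collects all of the data on the Spark driver, so for large datasets it can be expensive in terms of resources.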
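When the column is too large to collect in full, one possible approach is to bound what reaches the driver. The sketch below shows two such options under the same assumptions as before (Spark on the classpath, an IntegerType column): take(n), which returns at most n elements, and RDD.toLocalIterator, which pulls one partition at a time so the driver only needs memory for the largest partition. The object name BoundedCollect and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object BoundedCollect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bounded-collect")
      .getOrCreate()
    import spark.implicits._

    // A small stand-in for the DataFrame in the question.
    val df = Seq((0, 1999298), (1, 854791), (2, 1032910), (3, 310905))
      .toDF("id", "measured_value")
    val values = df.select("measured_value").as[Int]

    // take(n) brings at most n elements to the driver.
    val firstTwo: Seq[Int] = values.take(2).toSeq
    assert(firstTwo.length == 2)

    // toLocalIterator streams the data partition by partition;
    // here the values are summed without materializing a full Seq.
    val total: Long = values.rdd.toLocalIterator.foldLeft(0L)(_ + _)
    assert(total == 1999298L + 854791L + 1032910L + 310905L)

    spark.stop()
  }
}
```

If only an aggregate such as a sum is needed, computing it on the cluster with a DataFrame aggregation is usually cheaper than streaming the rows to the driver.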