I have a partitioned Cassandra table. This is my current write code:
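import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.cassandra._ // adds cassandraFormat to DataFrameWriter

myDF.distinct().write
  .cassandraFormat(keyspace = "test", table = "details", cluster = "cluster")
  .mode(SaveMode.Append)
  .save()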
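I am using Scala 2.11.8 and Spark 2.0 with Cassandra. The table is partitioned by the 'date' column. In this case, how do I save a DataFrame to this table? Is there a Scala code sample for the API I need to use? The write shown above is what I use without partitioning and clustering.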
This has to be saved in every micro-batch of a streaming application, so I would prefer a performance-oriented API.
Answer 0 (score: 3)
The Spark Cassandra Connector handles partitioning and batching automatically. The end user does not need to do anything. See "Basic overview of how writes happen", or for more details, "This tuning overview".
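As a minimal sketch of what this means in practice: the write for a table partitioned by 'date' can look the same as the write in the question, provided the DataFrame's column names match the table's columns (including the partition key). The object name `DetailsWriter`, the helpers `saveBatch` and `withTuning`, and the tuning values below are assumptions made for illustration. The two configuration keys are connector write settings, but whether changing them helps depends on the workload.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.cassandra._ // adds cassandraFormat to DataFrameWriter

// Hypothetical helper object, not part of the connector.
object DetailsWriter {
  // Optional write tuning; the values are illustrative assumptions, not recommendations.
  val writeTuning: Map[String, String] = Map(
    "spark.cassandra.output.batch.grouping.key" -> "partition", // batch rows that share a partition key
    "spark.cassandra.output.concurrent.writes"  -> "5"          // batches in flight per task
  )

  // Apply the tuning to the SparkConf that the session is built from.
  def withTuning(conf: SparkConf): SparkConf = conf.setAll(writeTuning)

  // Same call as for an unpartitioned table: the connector routes each row
  // by its partition key, so no partition-specific API is needed here.
  def saveBatch(df: DataFrame): Unit =
    df.write
      .cassandraFormat(table = "details", keyspace = "test", cluster = "cluster")
      .mode(SaveMode.Append)
      .save()
}
```

In a streaming job, `DetailsWriter.saveBatch` could be called once per micro-batch, for example from inside `foreachRDD` after converting each RDD to a DataFrame.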