Why does a streaming join query on Kafka topics take so long?

Asked: 2018-11-27 09:45:11

Tags: scala apache-spark spark-structured-streaming

I am using Spark Structured Streaming and joining two streams that come from Kafka topics.

DAG for the job

I noticed that the streaming query takes about 15 seconds per record. In the screenshot below, stage ID 2 takes 15 seconds. Why is that?

Time taken by each stage

The code is as follows:

public class ExampleService extends Service implements SensorEventListener2{

private SensorManager sensorManager;
private List<Sensor> sensors;
private Sensor sensor;

private long numevents;

@Override
public void onCreate() {
    super.onCreate();

    // Create notification required for foreground service
    Intent notificationIntent = new Intent(this, MainActivity.class);
    PendingIntent pendingIntent = PendingIntent.getActivity(this,
            0, notificationIntent, 0);

    Notification notification = new NotificationCompat.Builder(this, CHANNEL_ID)
            .setContentTitle("Example Service")
            .setContentText("MyNotification")
            .setSmallIcon(R.drawable.ic_android)
            .setContentIntent(pendingIntent)
            .build();

    startForeground(1, notification);

    // Get the sensor manager and register this ExampleService instance as a listener
    sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);

    sensors = sensorManager.getSensorList(Sensor.TYPE_ACCELEROMETER);

    if (sensors.size() > 0)
        sensor = sensors.get(0);

    sensorManager.registerListener(this, sensor,
            20000 /* 50Hz  */,
            20000000 /* maxBatchReportLatencyUs 20 seconds */);


    // PROBLEM - the FIFO queue gets reset by the previous line, 
    // and only a handful of events, if any, get flushed
    numevents = 0;
    sensorManager.flush(this);
}

@Override
public int onStartCommand(Intent intent, int flags, int startId) {
    Log.v("onStartCommand","started");
    return START_NOT_STICKY;
}


@Override
public void onSensorChanged(SensorEvent event) {
    if (event.sensor.getType() == Sensor.TYPE_ACCELEROMETER) {
        numevents += 1L;
    }
}


@Override
public void onFlushCompleted(Sensor sensor) {
    Log.v("onFlushCompleted","Num flushed events "+numevents);

    // set a new alarm to invoke this service again in 10 seconds
    Intent serviceIntent = new Intent(this, ExampleService.class);
    PendingIntent pendingServiceIntent = PendingIntent.getForegroundService(this, 0, serviceIntent, 0);
    AlarmManager alarm = (AlarmManager)getSystemService(Context.ALARM_SERVICE);
    alarm.setExact(AlarmManager.RTC_WAKEUP, System.currentTimeMillis()+10000, pendingServiceIntent);

    // Stop this service
    this.stopSelf();
}

@Override
public void onDestroy() {
    Log.v("onDestroy","stopped");
    super.onDestroy();

    if (sensorManager != null) {
        sensorManager.unregisterListener(this);
    }
    sensorManager = null;
}

@Nullable
@Override
public IBinder onBind(Intent intent) {
    return null;
}


@Override
public void onAccuracyChanged(Sensor sensor, int accuracy) {
}
}

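From a code standpoint, everything works. The only problem is the time it takes to join the two streams. How can I optimize this query?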

1 answer:

Answer 0: (score: 1)

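With the master URL set to .master("local"), the execution time is very likely to be unsatisfactory, because local runs Spark with a single worker thread. Change it to at least local[*], which uses as many worker threads as there are logical cores on the machine, and you will find that the join gets faster.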
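
As a rough illustration of that suggestion, here is a minimal sketch of a stream-stream join started with local[*]. The topic names (topicA, topicB), the broker address localhost:9092, the application name, and joining on the Kafka record key are all assumptions made for the example, not details from the question. The spark.sql.shuffle.partitions setting is an extra, optional tweak that the answer does not mention: the default is 200, which can add noticeable overhead for a stateful join on a single machine. Running it requires the spark-sql-kafka-0-10 connector on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinKafkaStreams {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-join")                 // hypothetical name
      .master("local[*]")                           // one worker thread per logical core
      .config("spark.sql.shuffle.partitions", "8")  // assumption: optional, default is 200
      .getOrCreate()

    // Reads one Kafka topic as a stream of (key, value) strings.
    // The broker address is an assumption for this sketch.
    def readTopic(topic: String): DataFrame =
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", topic)
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Hypothetical topic names; rename the value columns so they do not clash.
    val left  = readTopic("topicA").withColumnRenamed("value", "leftValue")
    val right = readTopic("topicB").withColumnRenamed("value", "rightValue")

    // Inner stream-stream join on the record key.
    val joined = left.join(right, "key")

    joined.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

Without watermarks, an inner stream-stream join like this keeps all past input in state, so it is only suitable for a quick comparison of local against local[*], not for a long-running job.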