Struggling with deduplication after aggregation in Spark Streaming

时间:2018-11-15 15:09:55

标签: scala apache-spark duplicates spark-streaming spark-structured-streaming

1. The streaming data comes from Kafka. 2. It is consumed with Spark Streaming. 3. The columns are firstname, lastname, userid and membername (from membername I derive the member count; for example, in mark,tyson,2,chris,lisa,iwanka the member count is 3).

I do need to compute that count somehow. My concern is how to deduplicate the results after the aggregation.
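A minimal sketch of how that count could be derived with Spark Structured Streaming, assuming each Kafka record's value is a comma-separated line like mark,tyson,2,chris,lisa,iwanka (the broker address, topic name and parsing logic below are assumptions, not part of the original question):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("member-count").getOrCreate()
    import spark.implicits._

    // Read the raw stream from Kafka; broker and topic are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "members")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Assume each line is "firstname,lastname,userid,member1,member2,...",
    // e.g. "mark,tyson,2,chris,lisa,iwanka" -> membercount 3.
    val df = raw
      .select(split($"line", ",").as("cols"))
      .select(
        $"cols".getItem(0).as("firstname"),
        $"cols".getItem(1).as("lastname"),
        $"cols".getItem(2).cast("int").as("userid"),
        (size($"cols") - 3).as("membercount"))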


Batch-1 output

  val df2 = df.select("firstname", "lastname", "membercount", "userid")
  df2.writeStream.format("console").start().awaitTermination()

  or

  df3.select("*").where("membercount >= 3").dropDuplicates("userid")

  // this one is not working, but I need to do the same only after the
  // count, so that in later batches the same userid will not come again.
  // I want only the first-time entry.
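In Structured Streaming, dropDuplicates on a streaming Dataset keeps its deduplication state across micro-batches, so a userid that has been emitted once is not emitted again unless a watermark expires that state. A minimal sketch of the full query, assuming df3 carries a per-record membercount as in the example above rather than the result of a streaming groupBy (the sink and output mode are assumptions):

    // Streaming deduplication: Spark keeps every userid it has seen in
    // state, so later micro-batches will not emit the same userid again.
    // Without a watermark this state grows without bound; if the data has
    // an event-time column, add .withWatermark(...) before dropDuplicates
    // so that old state can be dropped.
    val deduped = df3
      .where("membercount >= 3")
      .dropDuplicates("userid")

    deduped.writeStream
      .outputMode("append")
      .format("console")
      .start()
      .awaitTermination()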

Batch-2 output

  firstname         lastname          member-count            userid
  john              smith             5                       1
  mark              boucher           8                       2
  shawn             pollock           3                       3

// but here is the batch-2 output that I actually want:

1. john smith's or shawn pollock's count may increase again in the next batch, but I do not want to show or keep them in the next batch's output.

i.e. based on userid I want an entry in the batch output only once, and the same user should be ignored if it shows up again in a later batch output:

  firstname         lastname          member-count            userid
  chris             jordan            6                       4
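If the "first entry only" rule needs to be controlled explicitly rather than through dropDuplicates' internal state, one option is foreachBatch with an externally tracked set of already-emitted userids. The sketch below keeps that set only in driver memory, so it illustrates the idea rather than being a fault-tolerant solution; df3 and the membercount >= 3 filter come from the question, everything else is assumed:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col
    import scala.collection.mutable

    // Driver-side record of the userids that have already been written out.
    // Not fault tolerant: a restart forgets it. A real job would persist the
    // ids externally, or simply rely on dropDuplicates' own state.
    val seenUserIds = mutable.Set[Any]()

    val emitFirstEntryOnly: (DataFrame, Long) => Unit = (batch, batchId) => {
      val candidates = batch.where("membercount >= 3")
      // Drop rows whose userid was already emitted in an earlier batch.
      val fresh =
        if (seenUserIds.isEmpty) candidates
        else candidates.filter(!col("userid").isin(seenUserIds.toSeq: _*))
      fresh.persist()
      seenUserIds ++= fresh.select("userid").collect().map(_.get(0))
      fresh.show(truncate = false) // stand-in for the real sink
      fresh.unpersist()
    }

    df3.writeStream
      .foreachBatch(emitFirstEntryOnly)
      .start()
      .awaitTermination()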

1 answer:

Answer 0: (score: 0)

Your question is hard to follow, but as far as I understand it, you want a conditional while loop?

var a = 10
while (a < 20) {
  println("Value of a: " + a)
  a = a + 1
}

which will print, for example,

Value of a: 10
Value of a: 11
Value of a: 12
Value of a: 13
Value of a: 14
Value of a: 15
Value of a: 16
Value of a: 17
Value of a: 18
Value of a: 19