How to implement counters in Hadoop streaming with Python

Time: 2017-03-01 07:30:07

Tags: python hadoop

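I am new to Hadoop streaming. My reduce code has some filter conditions, and I want to know how many records pass them. I know this can be done with a custom counter. Could someone point me to how to write one?

My mapper code emits three columns, say a, b, c. The key is a and the value is a list, like [b, c]. A sample of the mapper output looks like ['I'^['C','P']]

Here is my simplified code.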

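import ast
import sys

import pandas as pd

labels = ["a", "b", "c"]
records = []
for line in sys.stdin:
    # Assumes each line looks like I^['C','P']: key, "^", then the list [b, c]
    key, value = line.strip().split("^", 1)
    records.append([key] + ast.literal_eval(value))
df = pd.DataFrame.from_records(records, columns=labels)
# Keep only the rows where a == 'I' and b == 'C'
df = df[(df['a'] == 'I') & (df['b'] == 'C')]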

I want to know how many records df contains, at the reducer level.

Thanks.

2 Answers:

Answer 0: (score: 2)

You can simply print to stderr:

import sys

sys.stderr.write("reporter:counter:CUSTOM,NbRecords,1\n")

This increments the counter "NbRecords" in the counter group "CUSTOM" by 1.
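Applied to the question, one option is to count the records that pass the filter and report the total in a single counter update, rather than one update per record. The following is a minimal sketch under assumptions: the input lines look like I^['C','P'], the filter is a == 'I' and b == 'C' as in the question, and the helper names passes_filter, report_counter and run_reducer are hypothetical, not part of Hadoop.

```python
import ast
import sys


def passes_filter(line):
    """Return True when a == 'I' and b == 'C' for a line like I^['C','P']."""
    key, _, value = line.strip().partition("^")
    if not value:
        return False  # blank line or line without a value part
    b, _c = ast.literal_eval(value)  # assumes the value is a two-item list
    return key == "I" and b == "C"


def report_counter(group, counter, amount, stream=sys.stderr):
    """Write one counter update in the Hadoop streaming stderr format."""
    stream.write("reporter:counter:{},{},{}\n".format(group, counter, amount))


def run_reducer(lines, stream=sys.stderr):
    """Count the lines that pass the filter and report the total once."""
    passed = sum(1 for line in lines if passes_filter(line))
    report_counter("CUSTOM", "NbRecords", passed, stream)
    return passed
```

In a real reducer script you would call run_reducer(sys.stdin). Each reducer task reports its own total, and the framework adds the totals together in the job counters.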

Answer 1: (score: 1)

If you are using mrjob:

from mrjob.job import MRJob


class MRCountingJob(MRJob):

    def mapper(self, _, value):
        # Add 1 to the counter 'counter_name' in the group 'group'
        self.increment_counter('group', 'counter_name', 1)
        yield _, value

If you are using the basic Hadoop streaming API (with Python):

sys.stderr.write("reporter:counter:group,counter_name,1\n")

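For example, group could be "My Mapper", "My Reducer" or "My FooBar", the counter could be num_calls, and the value is almost always 1, because the framework sums the updates. (When using stderr.write, don't forget the trailing newline, \n.)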
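The stderr line can be wrapped in a small helper so the trailing newline is never forgotten. This is a sketch, not part of any Hadoop API: increment_counter and tally are hypothetical names, and tally is only a local stand-in for the framework's summing so the output can be checked without a cluster. The comma check assumes that commas are reserved as field separators in the counter line.

```python
import sys
from collections import Counter


def increment_counter(group, counter, amount=1, stream=sys.stderr):
    """Emit one counter update; the trailing newline ends the message."""
    if "," in group or "," in counter:
        # Assumption: commas separate the fields, so names cannot contain them.
        raise ValueError("group and counter names must not contain commas")
    stream.write("reporter:counter:{},{},{}\n".format(group, counter, amount))


def tally(stderr_text):
    """Local stand-in for the framework's summing, for testing only."""
    totals = Counter()
    prefix = "reporter:counter:"
    for line in stderr_text.splitlines():
        if line.startswith(prefix):
            group, counter, amount = line[len(prefix):].split(",")
            totals[(group, counter)] += int(amount)
    return totals
```

Calling increment_counter("My Reducer", "num_calls") once per record writes one line per call to stderr, and the per-call values of 1 add up to the record count.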