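I am new to Hadoop Streaming. My reducer code has some filter conditions, and I want to know how many records pass them. I know this can be done with a custom counter. Can someone show me how to write one?
My mapper emits three columns, say a, b, c.
The key is a and the value is a list, like [b,c].
For example, a line of mapper output looks like ['I'^['C','P']]
Here is my simplified code.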
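import sys
import ast
import pandas as pd

labels = ["a", "b", "c"]
records = []
for line in sys.stdin:
    # Assumes the value is serialized as a Python list literal, e.g. ['C','P']
    l = line.strip().split("^")
    key = l[0]
    value = ast.literal_eval(l[1])
    record = [key] + value
    records.append(record)
df = pd.DataFrame.from_records(records, columns=labels)
# Boolean-mask filtering uses square brackets, not a call
df = df[(df['a'] == 'I') & (df['b'] == 'C')]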
I want to know how many records df contains, at the reducer level.
Thanks.
Answer 0 (score: 2)
You can simply print to stderr:
sys.stderr.write("reporter:counter:CUSTOM,NbRecords,1\n")
This increments the counter "NbRecords" in the counter group "CUSTOM" by 1.
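Applied to the reducer in the question, the same idea might look like the minimal sketch below. It avoids pandas for brevity and assumes each input line is simply a^b^c; the helper names report_counter, count_passing and run_reducer are hypothetical, not part of Hadoop. Rather than writing one line per record, it reports the total in a single update by using an amount larger than 1.

```python
import sys


def report_counter(group, counter, amount=1, stream=sys.stderr):
    """Write one Hadoop Streaming counter update to the given stream."""
    # Line format: reporter:counter:<group>,<counter>,<amount> plus a newline.
    stream.write("reporter:counter:{},{},{}\n".format(group, counter, amount))


def count_passing(lines, sep="^"):
    """Count lines whose first two fields are 'I' and 'C' (the question's filter)."""
    passed = 0
    for line in lines:
        fields = line.strip().split(sep)
        # Assumed layout: a^b^c, so fields[0] is column a and fields[1] is column b.
        if len(fields) >= 2 and fields[0] == "I" and fields[1] == "C":
            passed += 1
    return passed


def run_reducer(lines, stream=sys.stderr):
    """Filter the records, then report the total with a single counter update."""
    passed = count_passing(lines)
    report_counter("CUSTOM", "NbRecords", passed, stream)
    return passed

# In an actual reducer script this would be called as: run_reducer(sys.stdin)
```

Each reducer task reports its own total, and the framework adds the per-task values together in the job's counter summary.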
Answer 1 (score: 1)
If you are using mrjob:
from mrjob.job import MRJob

class MRCountingJob(MRJob):
    def mapper(self, _, value):
        # Add 1 to the counter 'counter_name' in the group 'group'
        self.increment_counter('group', 'counter_name', 1)
        yield _, value
If you are using the basic Hadoop Streaming API (with Python):
sys.stderr.write("reporter:counter:group,counter_name,1\n")
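For example, group could be "My Mapper", "My Reducer" or "My FooBar", and the counter could be num_calls. The value is almost always 1, because the framework sums these up. (When using stderr.write, don't forget the trailing newline, \n.)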
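To check a streaming script locally before submitting a job, the summing can be imitated by tallying the counter lines the script writes to stderr. The sketch below is only a local stand-in for that aggregation, not Hadoop's own code; parse_counter_line and tally are hypothetical helper names.

```python
from collections import Counter

PREFIX = "reporter:counter:"


def parse_counter_line(line):
    """Return (group, counter, amount) for a counter line, or None otherwise."""
    line = line.rstrip("\n")
    if not line.startswith(PREFIX):
        return None
    parts = line[len(PREFIX):].split(",")
    if len(parts) != 3:
        return None
    group, counter, amount = parts
    try:
        return group, counter, int(amount)
    except ValueError:
        # The amount was not an integer, so this is not a valid update.
        return None


def tally(stderr_lines):
    """Sum the amounts per (group, counter), ignoring ordinary log lines."""
    totals = Counter()
    for line in stderr_lines:
        parsed = parse_counter_line(line)
        if parsed is not None:
            group, counter, amount = parsed
            totals[(group, counter)] += amount
    return totals
```

Feeding it the captured stderr of a local run gives the totals per (group, counter) pair, which is a quick way to confirm that every update has the expected format and trailing newline.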