当我向Kafka主题发送一个字符串时,我的spark应用程序发送两封邮件。以下是代码中感兴趣的部分:
JavaDStream<String> lines = kafkaStream.map ( [returns the 2nd value of the tuple];
lines.foreachRDD(new VoidFunction<JavaRDD<String>>() {
[... some stuff ...]
JavaRDD<String[]> flagAddedRDD = associatedToPersonRDD.map(new Function<String[],String[]>(){
@Override
public String[] call(String[] arg0) throws Exception {
String[] s = new String[arg0.length+1];
System.arraycopy(arg0, 0, s, 0, arg0.length);
int a = FilePrinter.getAge(arg0[CSVExampleDevice.LENGTH+People.BIRTH_DATE]);
int p = Integer.parseInt(arg0[CSVExampleDevice.PULSE]);
if(
((p<=45 || p>=185)&&(a<=12 || a>=70))
||
(p>=190 || p<=40)){
s[arg0.length]="1";
Mailer.sendMail(mailTo, arg0);
}
else
s[arg0.length]="0";
return s;
}
});`
我无法理解邮件是否发送了两封电子邮件,因为此后的转换是保存文件而这只返回1行。 mailer.sendMail:
public static void sendMail(String whoTo, String[] whoIsDying){
Properties props = new Properties();
props.put("mail.smtp.host", "mail.***.com"); //edited
props.put("mail.smtp.port", "25");
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Session session = Session.getInstance(props);
try {
Message message = new MimeMessage(session);
message.setFrom(new InternetAddress("***my-email***")); //edited
for (String string : whoTo.split(","))
message.addRecipient(Message.RecipientType.TO,
new InternetAddress(string));
message.setSubject(whoIsDying[PersonClass.TIMESTAMP]);
message.setText("trial");
System.out.println("INFO: sent mail");
Transport.send(message);
} catch (MessagingException e) {
throw new RuntimeException(e);
}
}
答案 0 :(得分:0)
发生这种情况的原因是因为我在转换结束时调用了两个动作:
FilePrinter.saveAssociatedAsCSV(associatedSavePath, unifiedAssociatedStringRDD.collect()); //first action
JavaRDD<String[]> enrichedWithWeatherRDD = flagAddedRDD.map(new Function<String[],String[]>(){ [some more stuff] });
JavaRDD<String> unifiedEnrichedStringRDD = enrichedWithWeatherRDD.map(unifyArrayIntoString);
FilePrinter.saveEnrichedAsCSV(enrichedSavePath, unifiedEnrichedStringRDD.collect()); //second action
因此再次调用整个转换,邮件部分高于这两个行为。