Spark (1.4.0) stream runs indefinitely during startup

Asked: 2015-06-22 04:12:20

Tags: apache-spark spark-streaming

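I am trying to fetch a Twitter stream using Spark's TwitterUtils API (v1.4.0). The connection is established correctly, but no tweets appear in the console. Instead, the following INFO/WARN messages repeat indefinitely: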
  

2015-06-22 09:21:52.844 INFO 6456 --- [Thread-40] org.apache.spark.storage.MemoryStore : ensureFreeSpace(13598) called with curMem=45830, maxMem=1017800294
2015-06-22 09:21:52.845 INFO 6456 --- [Thread-40] org.apache.spark.storage.MemoryStore : Block input-0-1434945112600 stored as bytes in memory (estimated size 13.3 KB, free 970.6 MB)
2015-06-22 09:21:52.847 INFO 6456 --- [lt-dispatcher-2] o.apache.spark.storage.BlockManagerInfo : Added input-0-1434945112600 in memory on localhost:50335 (size: 13.3 KB, free: 970.6 MB)
2015-06-22 09:21:52.852 WARN 6456 --- [Thread-40] org.apache.spark.storage.BlockManager : Block input-0-1434945112600 replicated to only 0 peer(s) instead of 1 peers
2015-06-22 09:21:52.862 INFO 6456 --- [Thread-40] o.a.s.streaming.receiver.BlockGenerator : Pushed block input-0-1434945112600
...

Below is my RDD definition:

final JavaReceiverInputDStream<Status> receiverStream = TwitterUtils
            .createStream(streamingSC);

    final JavaDStream<String> statuses = receiverStream
            .map(new TwitterStatusStream());

    final JavaDStream<String> lines = statuses
            .flatMap(new TwitterLineFunction());
    final JavaDStream<String> words = lines
            .flatMap(new TwitterWordFunction());

    final JavaDStream<String> hashTags = words
            .filter(new TwitterHashTagFunction());

    // statuses.print();
    hashTags.print();
    streamingSC.start();
    streamingSC.awaitTermination();
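
The `TwitterLineFunction`, `TwitterWordFunction`, and `TwitterHashTagFunction` classes are not shown in the question, so what they do is unknown. As a rough illustration of what a status → lines → words → hashtags chain like the one above might compute, here is a plain-Java sketch using `java.util.stream` rather than Spark. The class name `HashTagSketch`, the splitting rules, and the "starts with #" filter are all assumptions made for illustration, not the asker's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogue of the map -> flatMap -> flatMap -> filter chain.
// All splitting and filtering rules here are assumed, not taken from the question.
class HashTagSketch {

    static List<String> extractHashTags(String tweetText) {
        return Arrays.stream(tweetText.split("\\R"))                        // text -> lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))  // line -> words
                .filter(word -> word.startsWith("#") && word.length() > 1)  // keep hashtags only
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the hashtags found in a two-line sample text.
        System.out.println(extractHashTags("Learning #Spark today\nwith #Twitter data"));
    }
}
```

Testing the transformation logic this way, outside Spark, can help separate "the functions filter everything out" from "no batch has been processed yet".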

In the function below, the logger output I write does not show up in the console:

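import java.io.Serializable;

import org.apache.log4j.Logger; // assumed: log4j 1.x, based on Logger.getLogger(Class)
import org.apache.spark.api.java.function.Function;
import twitter4j.Status;

public class TwitterStatusStream implements Function<Status, String>,
        Serializable {
    public static final Logger logger = Logger.getLogger(TwitterStatusStream.class);
    private static final long serialVersionUID = -6529156421224365069L;

    // Extracts the tweet text from each status and logs it.
    @Override
    public String call(Status status) {
        String str = status.getText();
        logger.info(str);
        return str;
    }
}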

Any help would be greatly appreciated.

Here is how the Java Spark context and the Java streaming context are created:

public JavaSparkContext javaSparkContext() {
    return new JavaSparkContext(getSparkConf());
}
@Bean
public JavaStreamingContext javaStreamingContext() {
    return new JavaStreamingContext(javaSparkContext(),
            Durations.seconds(2000));
}

private SparkConf getSparkConf() {
    SparkConf conf = new SparkConf();
    conf.setAppName("TwitterSpark");
    conf.setMaster(master);
    conf.set("spark.executor.memory", "512m");
    conf.set("spark.cores.max", "3");
    conf.set("spark.default.parallelism", "1");
    return conf;
}
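
One thing that may be worth checking in the configuration above: `Durations.seconds(2000)` sets the batch interval to 2000 seconds, and `print()` on a DStream emits output once per batch. Whether this is the cause of the missing output is an assumption, since the value of `master` and the rest of the setup are not shown. The sketch below uses `java.time.Duration` (not the Spark API) purely to show the unit arithmetic. The class name `BatchIntervalSketch` is hypothetical.

```java
import java.time.Duration;

// Unit arithmetic only: java.time.Duration stands in for the Spark batch interval.
class BatchIntervalSketch {

    // Formats an interval as "<minutes> min <seconds> s".
    static String describe(Duration interval) {
        long minutes = interval.toMinutes();
        long seconds = interval.getSeconds() % 60;
        return minutes + " min " + seconds + " s";
    }

    public static void main(String[] args) {
        System.out.println(describe(Duration.ofSeconds(2000))); // 33 min 20 s
        System.out.println(describe(Duration.ofMillis(2000)));  // 0 min 2 s
    }
}
```

If a two-second batch was intended, `Durations.seconds(2)` or `Durations.milliseconds(2000)` would express that in Spark's own API.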

0 answers:

There are no answers yet.