Spark Streaming with Kafka: unable to receive all data

Time: 2016-01-17 13:14:47

Tags: apache-spark apache-kafka

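In my Spark Streaming program, I am trying to receive data from Kafka. My Kafka producer sends 1 million messages, but the Spark Streaming job never receives all of them; some messages are always lost. I start the Kafka server with the default configuration.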

My Spark Streaming code is the one from the Spark examples. This is my producer code:

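import java.util.HashMap
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

// brokers, topic, loop_times and record_count are defined elsewhere in the program.
val props = new HashMap[String, Object]()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)

// Send record_count messages per pass, loop_times passes in total.
for (i <- 1 to loop_times.toInt) {
  var cnt = 0
  while (cnt < record_count.toInt) {
    // Null key, constant payload.
    val message = new ProducerRecord[String, String](topic, null, "aaa")
    producer.send(message) // asynchronous send; the result is not checked
    cnt += 1
    if (cnt % 10000 == 0)
      println(s"send $cnt records")
  }
}
producer.close()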

My Spark version is 1.6 and my Kafka version is 0.8.2.1.

0 answers:

No answers yet