I want to convert a Row to an RDD so that I can use foreachRDD to send the data to a Kafka producer. Below is the output of dstream1.pprint():
Row(Banked_Date_Calc__c=0 NaN
Name: Banked_Date_Calc__c, dtype: float64, CloseDate=0 2018-06-14T00:00:00.000Z
Name: CloseDate, dtype: object, CourseGEV__c=0 NaN
Name: CourseGEV__c, dtype: float64, Id=0 0060h0000169O5BAAU
Name: Id, dtype: object, OwnerId=0 005E0000000WruiIAC
Name: OwnerId, dtype: object, timestamp=0 2018-06-14 13:21:30.631768
Name: timestamp, dtype: datetime64[ns])
-------------------------------
dstream1 = lines1.transform(lambda x: x.map(unique_msg))
dstream1.pprint()
So the output of dstream1.pprint() is what is shown above.
Now I want to send the stream to Kafka, so I use dstream1.foreachRDD(handler).
My handler() is as follows:
def handler(rdd):
    print "RDD1: ", rdd
    records = rdd.collect()
    print "RDD2: ", records
    for record in records:
        producer.send('testing2', bytes(record))
    producer.flush()
But when I run print "RDD2: ", records, it prints [None].
I'm confused and would like to know how to send the data to Kafka.
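For illustration, here is a minimal sketch of one possible cause and one way to structure the send step. It cannot be confirmed without seeing unique_msg, but a mapped function that never returns a value produces exactly a list of None. The names unique_msg_without_return, unique_msg_with_return, to_kafka_bytes, send_records and FakeProducer are hypothetical and not from the post; FakeProducer only stands in for a producer that exposes send() and flush() as used in the handler above. Plain dicts stand in for the records.

```python
import json


def unique_msg_without_return(msg):
    # Hypothetical stand-in for unique_msg: it builds a value but never
    # returns it, so Python implicitly returns None.
    cleaned = dict(msg)


def unique_msg_with_return(msg):
    # The same hypothetical function with an explicit return.
    return dict(msg)


def to_kafka_bytes(record):
    # Serialize a plain dict to UTF-8 JSON bytes. default=str covers values
    # (for example timestamps) that json cannot encode natively.
    return json.dumps(record, default=str, sort_keys=True).encode("utf-8")


def send_records(records, producer, topic="testing2"):
    # Send every non-None record, then flush once after the loop.
    sent = 0
    for record in records:
        if record is None:
            continue
        producer.send(topic, to_kafka_bytes(record))
        sent += 1
    producer.flush()
    return sent


class FakeProducer(object):
    # In-memory stand-in (an assumption for this sketch) with the same
    # send()/flush() calls the handler in the question relies on.
    def __init__(self):
        self.sent = []
        self.flushed = 0

    def send(self, topic, value):
        self.sent.append((topic, value))

    def flush(self):
        self.flushed += 1


messages = [{"Id": "0060h0000169O5BAAU", "OwnerId": "005E0000000WruiIAC"}]

broken = list(map(unique_msg_without_return, messages))
fixed = list(map(unique_msg_with_return, messages))
print(broken)  # [None]

producer = FakeProducer()
send_records(fixed, producer)
```

In the streaming job, handler(rdd) could call send_records(rdd.collect(), producer) with the real producer. If the records are pyspark Row objects rather than dicts, converting them with Row.asDict() first would probably be needed, and the pandas Series values visible in the pprint output would likely have to be reduced to plain scalars before serializing.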