Data lost when importing from a source file into a sink file with Kafka

Time: 2016-05-13 09:44:02

Tags: java apache-kafka pyspark spark-streaming

I have been using Kafka to ingest and process streaming input. These are my sink and source connector properties:

name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=10
file=pnrsink5.xml
topics=test

name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=10
file=pnrtes.xml
topic=test

I run Kafka Connect in standalone mode:

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

I feed the parsed input into the source file with the following Java program:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    public class parse {
        public static void main(String[] args) throws IOException {
            int count = 0; // number of <PNR> records copied
            int c = 1;     // throttle counter: pause before reading on after each record
            String file = "/home/tthteg/speed_pnr/Source/Source_pnr.xml";
            String l1 = "</PNR>";
            String line;
            try {
                BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
                // FileWriter without the append flag truncates pnrtes.xml on every run.
                BufferedWriter bufWriter = new BufferedWriter(
                        new FileWriter("/home/tthteg/speed_pnr/kafka/pnrtes.xml"));
                try {
                    while ((line = bufferedReader.readLine()) != null) {
                        if (c % 2 == 0) {
                            Thread.sleep(5000); // wait 5 seconds between records
                            c = 1;
                        }
                        if (line.contains("<PNR")) {
                            c++;
                            System.out.println(line);
                            // write() adds no line terminator, so the output is one long line.
                            bufWriter.write(line);
                            // Copy lines up to the closing tag; also stop at end of file.
                            while ((line = bufferedReader.readLine()) != null && !line.equals(l1)) {
                                System.out.println(line);
                                bufWriter.write(line);
                            }
                            bufWriter.write("</PNR>");
                            bufWriter.flush();
                            System.out.println("</PNR>");
                            count++;
                        }
                    }
                } catch (Exception e) {
                    System.out.println(e);
                }
                bufferedReader.close();
                bufWriter.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            System.out.println(count);
        }
    }

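Problems:

1. Every time I run the Java program, I have to touch the source file (echo -e >> pnrtes.xml) before the data is imported into the sink file (pnrsink5.xml).
2. Some data is missing from the sink file (about 4 words).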
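
One thing that may be worth checking for both problems, offered as a sketch under assumptions rather than a confirmed diagnosis: the program above writes every line with `write(line)` and never adds a line terminator, and it reopens pnrtes.xml with a truncating `FileWriter` on each run. Assuming the file source connector only emits a record once it has seen a complete, newline-terminated line, and assuming it resumes from a previously stored position in the file, unterminated output would sit unread until something like `echo >>` appends a newline, and a truncated file could lose data at the start. The class and method names below (`NewlineSketch`, `completeLines`, `appendTerminated`) are hypothetical and only illustrate the idea.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineSketch {

    // Returns only the lines that end with '\n'. An unterminated tail is held back.
    // Assumption: this mimics how a line-oriented reader would treat the file.
    static List<String> completeLines(String content) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int nl;
        while ((nl = content.indexOf('\n', start)) >= 0) {
            lines.add(content.substring(start, nl));
            start = nl + 1;
        }
        return lines;
    }

    // Appends each line followed by '\n', so the file only ever grows
    // and every line is terminated.
    static void appendTerminated(Path out, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pnrtes", ".xml");
        appendTerminated(tmp, Arrays.asList("<PNR>", "<Name>A</Name>", "</PNR>"));
        String content = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);

        // Terminated output: all three lines are complete.
        System.out.println(completeLines(content).size()); // prints 3

        // Output written without any terminator: nothing is complete yet.
        System.out.println(completeLines("<PNR><Name>A</Name></PNR>").size()); // prints 0

        Files.delete(tmp);
    }
}
```

If the assumptions hold, the equivalent change in the original program would be to call `bufWriter.newLine()` after each `bufWriter.write(...)` and to open the file with `new FileWriter(path, true)` so that it is appended to rather than truncated.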

Thanks in advance.

0 answers:

No answers yet.