Java CSV: regex matching on each element to validate every cell

Asked: 2018-02-22 19:31:22

Tags: java regex csv validation mapreduce

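OK, so I have this problem that I can't seem to solve.

What I'm trying to do is the following:

Read a CSV file line by line, split each line on commas, pass the values into a hash map, and then perform some operations on them.

I'm effectively trying to replicate some of the behaviour of map reduce in Java.

Now, what I have so far is: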

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;

public class mapper {

public static void main(String[] args) {

    // Read the path of the CSV file from standard input.

    Scanner filePathInput = new Scanner(System.in);
    String filePath = filePathInput.nextLine();
    File file = new File(filePath);


    if (file.isFile()) {
        Scanner fileInput = null;
        try {
            fileInput = new Scanner(file);
        } catch (FileNotFoundException e) {
            // The file disappeared between the isFile() check and opening it.
            e.printStackTrace();
            return;
        }

        // Buffer every line of the file before handing it to the mapper.
        ArrayList<String> lineBuffer = new ArrayList<>();

        while (fileInput.hasNextLine()) {
            String line = fileInput.nextLine();
          //  char ch = line.charAt(0);
            lineBuffer.add(line);

            //String[] values = line.split(",");
           // Map<String, Integer> reducer = new HashMap<String, Integer>();
            // parse the line here
            //System.out.println(values);
        }
        fileInput.close();
        HashMap<String, ArrayList<FlightData>> test = mapper(lineBuffer);

    }

}
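As a side note, the reading step can also be written with try-with-resources so the file is always closed, and blank lines can be dropped at this point. This is only a minimal sketch: `CsvLineReader` and `readLines` are hypothetical names, and skipping blank lines here (rather than reporting them) is an assumption about what is wanted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvLineReader {

    // Reads every non-blank line from the given source.
    // The reader is closed automatically when the try block exits.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // For a real file, pass new java.io.FileReader(path) instead.
        String sample = "BWI0520BG0,MOO1786A,MAD,FRA,1420563408,184\n\n";
        System.out.println(readLines(new StringReader(sample)));
    }
}
```

Taking a `Reader` rather than a file path keeps the method easy to try out with an in-memory string.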

Then the mapper puts the parsed data into a hash map:

   public static HashMap<String, ArrayList<FlightData>> mapper(ArrayList<String> lineBuffer) {

    HashMap<String, ArrayList<FlightData>> mapdata = new HashMap<>();

    for (String flightData: lineBuffer) {
        String[] str = flightData.split(",");
        // Assumes the timestamp column holds epoch seconds (as in the sample row);
        // java.util.Date expects milliseconds, hence the factor of 1000.
        Date departTime = new Date(Long.parseLong(str[4]) * 1000L);
        FlightData flight = new FlightData(str[0], str[1], str[2].toCharArray(), str[3].toCharArray(), departTime, Long.parseLong(str[5]));
        // Append to the existing list for this flight ID, or start a new list.
        if(mapdata.containsKey(flight.getFlightID())){
            mapdata.get(flight.getFlightID()).add(flight);
        }
        else {
            ArrayList<FlightData> noID = new ArrayList<>();
            noID.add(flight);
            mapdata.put(flight.getFlightID(), noID);
        }

    }

  System.out.println(mapdata);

    return mapdata;

}
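The containsKey/get/put sequence in the mapper can be shortened with `Map.computeIfAbsent`. The sketch below groups raw lines by their second column instead of building `FlightData` objects, purely to stay self-contained; `FlightGrouping` and `groupByFlightId` are hypothetical names, and the second and third rows in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlightGrouping {

    // Groups CSV lines by the value in column 1 (the flight ID in the sample data).
    static Map<String, List<String>> groupByFlightId(List<String> lines) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] cells = line.split(",");
            // computeIfAbsent creates the list the first time a key is seen
            // and returns the existing list otherwise.
            grouped.computeIfAbsent(cells[1], key -> new ArrayList<>()).add(line);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Only the first row comes from the post; the others are invented.
        List<String> lines = Arrays.asList(
                "BWI0520BG0,MOO1786A,MAD,FRA,1420563408,184",
                "XYZ1234AB5,MOO1786A,MAD,FRA,1420563408,184",
                "XYZ1234AB5,ABC1234D,FRA,MAD,1420563408,184");
        System.out.println(groupByFlightId(lines).keySet());
    }
}
```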

My FlightData object is defined here, with getters and so on:

import java.util.Date;

public class FlightData {

private String passengerID;
private String flightID;
private char[] fromID = new char[3];
private char[] tooID = new char[3];
public Date departTime;
public long flightTimeMins;
public Date arrivalTime;

// Constructor
    public FlightData(String passengerID, String flightID, char[] fromID, char[] tooID, Date departTime, long flightTimeMins) {
        setPassengerID(passengerID);
        setFlightID(flightID);
        setFromID(fromID);
        setTooID(tooID);
        setFlightTimeMins(flightTimeMins);
        setDepartTime(departTime);
        // The arrivalTime field is still null at this point, so derive it from the
        // departure time plus the flight duration (assumes setArrivalTime is a plain setter).
        setArrivalTime(new Date(departTime.getTime() + flightTimeMins * 60_000L));
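Since the constructor receives a departure time and a duration in minutes, the arrival time can be computed rather than passed in. Here is a minimal sketch of that arithmetic using `java.time`, assuming the departure timestamp is given in epoch seconds; `ArrivalTime` and `arrival` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

class ArrivalTime {

    // Arrival = departure (epoch seconds) + flight time (minutes).
    static Instant arrival(long departEpochSeconds, long flightTimeMins) {
        return Instant.ofEpochSecond(departEpochSeconds)
                .plus(Duration.ofMinutes(flightTimeMins));
    }

    public static void main(String[] args) {
        // Values taken from the sample row: departure 1420563408, duration 184 minutes.
        Instant arrival = arrival(1420563408L, 184L);
        System.out.println(arrival.getEpochSecond());
        // java.util.Date.from(arrival) converts the result for code that still uses Date.
    }
}
```

For the sample values this gives 1420563408 + 184 × 60 = 1420574448 epoch seconds.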

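However, the problem I'm running into is how to do the validation:

Presumably I need to create a class that holds all my patterns, along with all of that logic, and call it when needed?

I've set up a basic class for this: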

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Validation {

public static void validate(String theReg, String str2Check) {

    // All four patterns are compiled from the same regex; only PtnPassenger is used below.
    final Pattern PtnPassenger = Pattern.compile(theReg);
    final Pattern PtnFlight = Pattern.compile(theReg);
    final Pattern PtnFrom = Pattern.compile(theReg);
    final Pattern PtnToo = Pattern.compile(theReg);

    Matcher regexMatcher = PtnPassenger.matcher(str2Check);

    // Print every non-empty substring of str2Check that matches the regex.
    while (regexMatcher.find()) {
        if (regexMatcher.group().length() != 0) {

            System.out.println(regexMatcher.group().trim());
        }



    }
}
}
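One detail worth noting about the class above: `Matcher.find()` succeeds when the pattern occurs anywhere inside the string, whereas `Matcher.matches()` succeeds only when the whole string fits the pattern, which is usually what cell validation calls for. A small sketch of the difference, using the passenger ID regex quoted later in this post; `MatchVsFind` and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchVsFind {

    // Passenger ID pattern from the post, compiled once and reused.
    private static final Pattern PASSENGER_ID =
            Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]{1}");

    // True if the pattern occurs anywhere inside the string.
    static boolean containsMatch(String s) {
        return PASSENGER_ID.matcher(s).find();
    }

    // True only if the entire string fits the pattern.
    static boolean isWholeMatch(String s) {
        return PASSENGER_ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String padded = "xxBWI0520BG0yy";
        System.out.println(containsMatch(padded)); // find() accepts the padded value
        System.out.println(isWholeMatch(padded));  // matches() rejects it
    }
}
```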

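But how do I do the following:

  1. Set it up so that each line that is read gets checked for being empty?
  2. Then, if it is not empty, check each "cell" against its pattern, move on to the next one, and repeat step 1?
  3. So, for example, each line should contain the following comma-separated data:

    PID, FID, FromID, TooID, time (Linux epoch), minutes, for example:

    BWI0520BG0,MOO1786A,MAD,FRA,1420563408,184

    So, for example, for the PID I need a regex like this:

    [A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]{1}

    But how do I check each element? Should I do this before passing them into the hash map? Or something else?

    Any help would be great.

    Cheers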
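The three steps above could be sketched roughly as follows: one precompiled pattern per column, an empty-line check first, then a whole-cell match for each column. Only the passenger ID regex comes from the post; the other five patterns are assumptions inferred from the single sample row, and `RowValidator` and `validate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RowValidator {

    // One precompiled pattern per column, in column order.
    private static final Pattern[] COLUMN_PATTERNS = {
        Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]{1}"), // passenger ID (from the post)
        Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]"),            // flight ID (assumed)
        Pattern.compile("[A-Z]{3}"),                         // from airport (assumed)
        Pattern.compile("[A-Z]{3}"),                         // to airport (assumed)
        Pattern.compile("[0-9]{10}"),                        // epoch seconds (assumed)
        Pattern.compile("[0-9]{1,4}")                        // flight minutes (assumed)
    };

    // Returns a list of problems found in the line; an empty list means the line is valid.
    static List<String> validate(String line) {
        List<String> errors = new ArrayList<>();
        // Step 1: reject empty lines.
        if (line == null || line.trim().isEmpty()) {
            errors.add("line is empty");
            return errors;
        }
        // A limit of -1 keeps trailing empty cells, so the column count is accurate.
        String[] cells = line.split(",", -1);
        if (cells.length != COLUMN_PATTERNS.length) {
            errors.add("expected " + COLUMN_PATTERNS.length + " cells but found " + cells.length);
            return errors;
        }
        // Step 2: each cell must match its column's pattern in full.
        for (int i = 0; i < cells.length; i++) {
            String cell = cells[i].trim();
            if (!COLUMN_PATTERNS[i].matcher(cell).matches()) {
                errors.add("cell " + i + " is invalid: '" + cell + "'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("BWI0520BG0,MOO1786A,MAD,FRA,1420563408,184").isEmpty());
    }
}
```

One option is to call `validate` on each line before constructing a `FlightData` object, and to skip or log any line that returns errors, so that only well-formed rows reach the hash map. Returning a list of messages instead of a boolean makes it possible to report every bad cell in a row at once.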

0 answers:

No answers