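OK, so I have a problem that I can't seem to solve.
What I want to do is the following:
Read a CSV file line by line, split each line on commas, put the result into a hash map, and then perform some operations on it.
Effectively, I am trying to replicate some of the behaviour of MapReduce in Java.
This is what I have so far: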
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.Scanner;

public class mapper {
    public static void main(String[] args) {
        // Read the path of the CSV file from standard input.
        Scanner filePathInput = new Scanner(System.in);
        String filePath = filePathInput.nextLine();
        File file = new File(filePath);
        if (file.isFile()) {
            Scanner fileInput = null;
            try {
                fileInput = new Scanner(file);
            } catch (FileNotFoundException e) {
                e.printStackTrace();
                return;
            }
            // Buffer every line of the file; parsing happens later in mapper().
            ArrayList<String> lineBuffer = new ArrayList<>();
            while (fileInput.hasNextLine()) {
                String line = fileInput.nextLine();
                // char ch = line.charAt(0);
                lineBuffer.add(line);
                //String[] values = line.split(",");
                // Map<String, Integer> reducer = new HashMap<String, Integer>();
                // parse the line here
                //System.out.println(values);
            }
            fileInput.close();
            HashMap<String, ArrayList<FlightData>> test = mapper(lineBuffer);
        }
    }
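As an aside, the read loop above could be shortened with `java.nio.file.Files`. This is only a sketch, assuming Java 8+ and a UTF-8 encoded file; `LineReaderSketch` and `readLines` are hypothetical names, not part of my actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: reads a whole CSV file into memory, one String per line.
final class LineReaderSketch {
    static List<String> readLines(Path csv) throws IOException {
        // readAllLines opens the file, reads every line, and closes it again.
        return Files.readAllLines(csv, StandardCharsets.UTF_8);
    }
}
```

This reads the whole file at once, so it is only appropriate while the CSV fits comfortably in memory.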
Then I pass the buffered lines to a mapper that puts them into a hash map:
    public static HashMap<String, ArrayList<FlightData>> mapper(ArrayList<String> lineBuffer) {
        HashMap<String, ArrayList<FlightData>> mapdata = new HashMap<>();
        for (String flightData : lineBuffer) {
            String[] str = flightData.split(",");
            // The sample timestamp (1420563408) looks like epoch seconds, while Date expects milliseconds.
            FlightData flight = new FlightData(str[0], str[1], str[2].toCharArray(), str[3].toCharArray(),
                    new Date(Long.parseLong(str[4]) * 1000L), Long.parseLong(str[5]));
            // Append to the existing list for this flight ID, or start a new list.
            if (mapdata.containsKey(flight.getFlightID())) {
                mapdata.get(flight.getFlightID()).add(flight);
            } else {
                ArrayList<FlightData> noID = new ArrayList<>();
                noID.add(flight);
                mapdata.put(flight.getFlightID(), noID);
            }
        }
        System.out.println(mapdata);
        return mapdata;
    }
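The containsKey/get/put branching in the mapper could probably be collapsed with `Map.computeIfAbsent`. A minimal sketch, assuming Java 8+; to keep it self-contained it groups the raw CSV lines by their second column (the flight ID) instead of building FlightData objects, and `GroupingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original code.
final class GroupingSketch {
    // Groups CSV lines by flight ID, assumed to be the second field of each line.
    static Map<String, List<String>> groupByFlightId(List<String> lines) {
        Map<String, List<String>> byFlight = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip lines that have no flight ID column
            }
            // Creates the list the first time a key is seen, then appends to it.
            byFlight.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(line);
        }
        return byFlight;
    }
}
```

The same one-liner should work with FlightData values: `mapdata.computeIfAbsent(flight.getFlightID(), k -> new ArrayList<>()).add(flight);`.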
My flight data object is defined here, with getters and so on:
import java.util.Date;

public class FlightData {
    private String passengerID;
    private String flightID;
    private char[] fromID = new char[3];
    private char[] tooID = new char[3];
    public Date departTime;
    public long flightTimeMins;
    public Date arrivalTime;

    // Constructor
    public FlightData(String passengerID, String flightID, char[] fromID, char[] tooID, Date departTime, long flightTimeMins) {
        setPassengerID(passengerID);
        setFlightID(flightID);
        setFromID(fromID);
        setTooID(tooID);
        setFlightTimeMins(flightTimeMins);
        setDepartTime(departTime);
        // Note: arrivalTime here is the field, which has not been assigned yet (still null).
        setArrivalTime(arrivalTime);
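However, the problem I'm stuck on is how to do the validation.
Presumably I need to create a class that holds all my patterns and the related logic, and call it when needed?
I have set up a basic class for this: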
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Validation {
    public static void validate(String theReg, String str2Check) {
        // All four patterns are compiled from the same regex; only PtnPassenger is used so far.
        final Pattern PtnPassenger = Pattern.compile(theReg);
        final Pattern PtnFlight = Pattern.compile(theReg);
        final Pattern PtnFrom = Pattern.compile(theReg);
        final Pattern PtnToo = Pattern.compile(theReg);
        Matcher regexMatcher = PtnPassenger.matcher(str2Check);
        // Print every non-empty match found anywhere in the input.
        while (regexMatcher.find()) {
            if (regexMatcher.group().length() != 0) {
                System.out.println(regexMatcher.group().trim());
            }
        }
    }
}
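But how do I go about the following?
For example, each line should contain the following comma-separated data:
PID, FID, FromID, TooID, time (Linux epoch), minutes, for example: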
BWI0520BG0,MOO1786A,MAD,FRA,1420563408,184
So, for example, for the PID I would need a regex like this:
[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]{1}
But how would I check each element? Should I do this before passing them into the hash map, or somewhere else?
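To make the question concrete, this is roughly the shape I imagine a per-field check taking, run on each line before it goes into the hash map. It is only a sketch: `FlightLineValidator` is a hypothetical name, the passenger ID pattern is the one above, and the flight ID, airport code, and numeric patterns are assumptions inferred from the single sample line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator; only the passenger ID pattern comes from the question itself.
final class FlightLineValidator {
    private static final Pattern PASSENGER_ID = Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]");
    // Assumed formats, guessed from the sample line "…,MOO1786A,MAD,FRA,1420563408,184".
    private static final Pattern FLIGHT_ID = Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]");
    private static final Pattern AIRPORT = Pattern.compile("[A-Z]{3}");
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns a list of problems; an empty list means the line passed every check.
    static List<String> validate(String line) {
        List<String> errors = new ArrayList<>();
        String[] f = line.split(",", -1); // limit -1 keeps trailing empty fields
        if (f.length != 6) {
            errors.add("expected 6 fields, got " + f.length);
            return errors;
        }
        check(PASSENGER_ID, f[0], "passenger ID", errors);
        check(FLIGHT_ID, f[1], "flight ID", errors);
        check(AIRPORT, f[2], "from airport", errors);
        check(AIRPORT, f[3], "to airport", errors);
        check(DIGITS, f[4], "departure time", errors);
        check(DIGITS, f[5], "flight time", errors);
        return errors;
    }

    private static void check(Pattern p, String value, String name, List<String> errors) {
        // matches() requires the whole field to match, unlike find().
        if (!p.matcher(value.trim()).matches()) {
            errors.add("invalid " + name + ": " + value);
        }
    }
}
```

With something like this, the mapper could skip or log any line for which `validate` returns a non-empty list, and only build a FlightData object for the lines that pass.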
Any help would be great.
Cheers