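Basically, what I'm trying to do here is read a text file line by line and format each line as: last name, title, first name, middle name, followed by the birth/death dates as MM/DD/YYYY.
The dates I read in come in these forms: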
Month, day, year
Mon. day, year
Mon day, year
MMDDYY
M/D/year
M-D-year
and the names in these forms:
Last, Title First Middle (comma after name needed)
OR
Title First Middle Last
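I've been working on this for a long time but can't figure it out. Below is my very messy code, which has been through a lot of changes. I'd like to get this solved, so thanks in advance to anyone willing to take the time to help (I'm a student). Here is an example of the input being read in: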
Roger Veium MAY 12, 1908 JUNE 2, 1984
McDermott, James D. Jan. 4, 1914 Jul 1, 1970
Amy Chamberlain Sep. 28, 1975 09-06-95
Gross, Adam M. 01-03-77
Joseph Lisota April 9, 1964
Joseph W. Eisel Sep 3, 1990
Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public String[] readLines(String filename) throws IOException {
    List<String> lines = new ArrayList<String>();
    String[] monthsLong = {"JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"};
    String[] monthsShort = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"};
    String[] monthsNum = {"01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"};

    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filename))) {
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            line = line.replaceAll("null", "");  // remove any literal "null" text
            line = line.replaceAll("-", "/");    // treat M-D-year like M/D/year
            line = line.toUpperCase();

            // Full month names first, so "JANUARY" is not half-replaced by the "JAN" rule.
            // \b keeps the match from firing inside names such as "MARY" or "MAYER".
            for (int i = 0; i < 12; i++) {
                line = line.replaceAll("\\b" + monthsLong[i] + "\\b", monthsNum[i]);
            }
            // Abbreviations, with or without a trailing period ("JAN." or "JAN")
            for (int i = 0; i < 12; i++) {
                line = line.replaceAll("\\b" + monthsShort[i] + "\\b\\.?", monthsNum[i]);
            }

            // Collapse runs of whitespace; trim() is also safe on empty lines,
            // where the original charAt(0) check would throw
            line = line.replaceAll("\\s+", " ").trim();

            /* Abandoned attempt to cut the name off the front of the line:
               take everything up to the first "." if there is one, otherwise
               everything up to whichever of "0" or "1" appears later in the line.

            String name = line;
            int ind = name.indexOf(".");
            int indTemp = name.indexOf("0");
            int indTemp2 = name.indexOf("1");
            if (ind > -1) {
                ind = ind + 1;
            } else if (indTemp2 > indTemp) {
                ind = indTemp2 - 1;
            } else if (indTemp > indTemp2) {
                ind = indTemp - 1;
            }
            name = name.substring(0, ind);
            */
            lines.add(line);
        }
    }
    return lines.toArray(new String[0]);
}
Answer 0 (score: 0)
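OK, then the only other way is to go line by line and create a list of rules, one for each distinct line format. There is some repetition, but many lines are very different from the others. Then, as you process the file, loop over the lines and look up which rule matches each one, so that you can apply that rule to the line.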
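One way to sketch the rule list for the date part, using java.time rather than string replacement. This is only an illustration under some assumptions that are not in the question: the class name `DateRules` and its helpers are made up, two-digit years are taken to mean 1900-1999, and each date has already been cut out of the line as its own string.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; the name and layout are illustrative only.
final class DateRules {
    private static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // One rule per input format, tried in order. Dashes are turned into
    // slashes before parsing, so M-D-year is covered by the M/D/year rules.
    private static final List<DateTimeFormatter> RULES = Arrays.asList(
            rule("MMMM d, yyyy"),   // Month day, year
            rule("MMM. d, yyyy"),   // Mon. day, year
            rule("MMM d, yyyy"),    // Mon day, year
            rule("M/d/yyyy"),       // M/D/year with a four-digit year
            shortYearRule("M/d/"),  // M/D/year with a two-digit year
            shortYearRule("MMdd")); // MMDDYY

    // Case-insensitive so "MAY", "May" and "may" all match.
    private static DateTimeFormatter rule(String pattern) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(pattern)
                .toFormatter(Locale.ENGLISH);
    }

    // ASSUMPTION: a two-digit year belongs to 1900-1999.
    private static DateTimeFormatter shortYearRule(String prefixPattern) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(prefixPattern)
                .appendValueReduced(ChronoField.YEAR, 2, 2, 1900)
                .toFormatter(Locale.ENGLISH);
    }

    /** Returns the date as MM/DD/YYYY, or null if no rule matches. */
    static String normalize(String raw) {
        String text = raw.trim().replace('-', '/');
        for (DateTimeFormatter rule : RULES) {
            try {
                return LocalDate.parse(text, rule).format(OUTPUT);
            } catch (DateTimeParseException e) {
                // Not this format; fall through to the next rule.
            }
        }
        return null;
    }
}
```

Adding a new format then means adding one entry to `RULES`; the loop itself does not change. A `null` result is a cheap way to flag the lines that still need a rule.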
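As far as I know, this is the best approach. I have experience with files like these, and they can be a nightmare if not handled properly. While working through the rules you may well find a pattern you can use; that is usually the case.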
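As an example of such a pattern, the two name layouts in the question differ only in whether there is a comma: with a comma the last name comes first, without one it is the final word. A rough sketch along those lines follows. The class `NameRules` is made up, the set of titles is a placeholder because the question does not list them, and the name is assumed to have already been separated from the dates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name and layout are illustrative only.
final class NameRules {
    // ASSUMPTION: placeholder titles, the question does not say which ones occur.
    private static final Set<String> TITLES =
            new HashSet<>(Arrays.asList("MR.", "MRS.", "MS.", "DR."));

    /** Returns {last, title, first, middle}; missing parts are empty strings. */
    static String[] split(String name) {
        String trimmed = name.trim();
        String last;
        String rest;
        int comma = trimmed.indexOf(',');
        if (comma >= 0) {
            // Rule 1: "Last, Title First Middle"
            last = trimmed.substring(0, comma).trim();
            rest = trimmed.substring(comma + 1).trim();
        } else {
            // Rule 2: "Title First Middle Last", last name is the final word
            int space = trimmed.lastIndexOf(' ');
            last = trimmed.substring(space + 1);
            rest = space < 0 ? "" : trimmed.substring(0, space).trim();
        }

        String[] parts = rest.isEmpty() ? new String[0] : rest.split("\\s+");
        int i = 0;
        String title = "";
        if (i < parts.length && TITLES.contains(parts[i].toUpperCase())) {
            title = parts[i++];
        }
        String first = i < parts.length ? parts[i++] : "";
        // Whatever is left over is treated as the middle name(s) or initial.
        String middle = String.join(" ", Arrays.copyOfRange(parts, i, parts.length));
        return new String[] {last, title, first, middle};
    }
}
```

For instance, `NameRules.split("McDermott, James D.")` and `NameRules.split("James D. McDermott")` both yield last name `McDermott`, first name `James` and middle `D.`. Multi-word last names without a comma would need a rule of their own.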
I hope this helps.