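I have an assignment that requires me to use the data below from a txt file. There is no specified delimiter, which would have made it easier to sort the data into an array list. I can use the Scanner class to read the text file and sort it into an array like this: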
for (int rows = 0; rows < array.length; rows++) {
    array[rows][0] = fileIn.next();
    array[rows][1] = fileIn.next();
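And so on... The names are the tricky part, though, because they contain a varying number of spaces and may consist of a varying number of name parts. I would like a full name such as "Allison, Mrs. Hudson J C (Bessie Waldo Daniels)" to be a single element. I'm not sure where to start, but I think one solution is to have the program check for "male" || "female" so that it knows when to start a new element. Any help would be greatly appreciated.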
1 1 Allen, Miss. Elisabeth Walton female 29 211.3375
1 1 Allison, Master. Hudson Trevor male 0.9167 151.5500
1 0 Allison, Miss. Helen Loraine female 2 151.5500
1 0 Allison, Mr. Hudson Joshua Creighton male 30 151.5500
1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 151.5500
1 1 Anderson, Mr. Harry male 48 26.5500
1 1 Andrews, Miss. Kornelia Theodosia female 63 77.9583
1 0 Andrews, Mr. Thomas Jr male 39 0.0000
1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 51.4792
1 0 Artagaveytia, Mr. Ramon male 71 49.5042
1 0 Astor, Col. John Jacob male 47 227.5250
1 1 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 227.5250
1 1 Aubart, Mme. Leontine Pauline female 24 69.3000
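The idea described in the question (keep reading tokens into the name until "male" or "female" shows up) can be sketched with Scanner alone. This is only a minimal sketch: the class name TokenParser and the method parse are made up for illustration, it assumes every line has exactly the columns shown above, and it assumes no name part is itself the bare word "male" or "female". A malformed line would make next() throw NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original question.
class TokenParser {
    // Reads whitespace-separated tokens and joins the name tokens that sit
    // between the two leading numbers and the "male"/"female" column.
    static List<String[]> parse(Scanner fileIn) {
        List<String[]> rows = new ArrayList<>();
        while (fileIn.hasNext()) {
            String[] row = new String[6];
            row[0] = fileIn.next();
            row[1] = fileIn.next();
            StringBuilder name = new StringBuilder();
            String token = fileIn.next();
            // Assumption: the first bare "male"/"female" token ends the name.
            while (!token.equals("male") && !token.equals("female")) {
                if (name.length() > 0) {
                    name.append(' ');
                }
                name.append(token);
                token = fileIn.next();
            }
            row[2] = name.toString();
            row[3] = token;
            row[4] = fileIn.next();
            row[5] = fileIn.next();
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample =
            "1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 151.5500\n" +
            "1 1 Anderson, Mr. Harry male 48 26.5500\n";
        for (String[] row : parse(new Scanner(sample))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

To read from the real file, a Scanner built from a File would replace the one built from the sample string.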
Answer 0 (score: 2)
This is a good fit for a regular expression. The pattern below matches each line of the sample data:
([\d]) +([\d]) +(.+\S) +(female|male) +([\d.]+) +([\d.]+)
A complete example in Java (originally hosted on repl.it):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        String text =
            "1 1 Allen, Miss. Elisabeth Walton female 29 211.3375\n" +
            "1 1 Allison, Master. Hudson Trevor male 0.9167 151.5500\n" +
            "1 0 Allison, Miss. Helen Loraine female 2 151.5500\n" +
            "1 0 Allison, Mr. Hudson Joshua Creighton male 30 151.5500\n" +
            "1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 151.5500\n" +
            "1 1 Anderson, Mr. Harry male 48 26.5500\n" +
            "1 1 Andrews, Miss. Kornelia Theodosia female 63 77.9583\n" +
            "1 0 Andrews, Mr. Thomas Jr male 39 0.0000\n" +
            "1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 51.4792\n" +
            "1 0 Artagaveytia, Mr. Ramon male 71 49.5042\n" +
            "1 0 Astor, Col. John Jacob male 47 227.5250\n" +
            "1 1 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 227.5250\n" +
            "1 1 Aubart, Mme. Leontine Pauline female 24 69.3000\n";

        // Split the sample text into individual lines
        String[] lines = text.split("\\r?\\n");

        // Groups: 1 and 2 = leading digits, 3 = full name, 4 = sex, 5 and 6 = trailing numbers
        String pattern = "([\\d]) +([\\d]) +(.+\\S) +(female|male) +([\\d.]+) +([\\d.]+)";
        Pattern r = Pattern.compile(pattern);

        for (String l : lines) {
            Matcher m = r.matcher(l);
            if (m.find()) {
                System.out.println(" ------------------- New Text Line -------------------");
                System.out.println("Group 1: " + m.group(1));
                System.out.println("Group 2: " + m.group(2));
                System.out.println("Group 3: " + m.group(3));
                System.out.println("Group 4: " + m.group(4));
                System.out.println("Group 5: " + m.group(5));
                System.out.println("Group 6: " + m.group(6));
            } else {
                System.out.println("Line did not match");
            }
        }
    }
}
This produces output like the following:
------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allen, Miss. Elisabeth Walton
Group 4: female
Group 5: 29
Group 6: 211.3375
------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allison, Master. Hudson Trevor
Group 4: male
Group 5: 0.9167
Group 6: 151.5500
------------------- New Text Line -------------------
Group 1: 1
Group 2: 0
Group 3: Allison, Miss. Helen Loraine
Group 4: female
Group 5: 2
Group 6: 151.5500
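Since the question asks for the fields in an array list rather than printed, the same pattern can fill a List<String[]> instead. The following is a minimal sketch under a few assumptions: the class name RegexRows and the method parse are invented for illustration, lines that do not match are silently skipped, and matches() is used in place of find() so that a line with extra trailing columns is rejected rather than partially matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class RegexRows {
    // Same pattern as above, compiled once and reused for every line.
    private static final Pattern LINE = Pattern.compile(
        "([\\d]) +([\\d]) +(.+\\S) +(female|male) +([\\d.]+) +([\\d.]+)");

    // Returns one six-element array per matching line; other lines are skipped.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue;
            }
            String[] row = new String[m.groupCount()];
            for (int g = 1; g <= m.groupCount(); g++) {
                row[g - 1] = m.group(g);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text =
            "1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 51.4792\n" +
            "1 0 Andrews, Mr. Thomas Jr male 39 0.0000\n";
        for (String[] row : parse(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each row then holds the name at index 2, so rows.get(i)[2] gives the full name as a single element.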
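Answer 1 (score: 0)
I agree with your own suggestion. You can use a regular expression to help parse everything between the first two numbers and "male|female".
Your code might look something like this: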
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test {
    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");
    // The word boundary keeps "male" from matching inside a longer word
    private static final Pattern SEX = Pattern.compile("\\s+(male|female)\\b");

    // Splits one line into six fields: two numbers, full name, sex, and two trailing numbers
    private static String[] parseLine(String line) {
        String[] output = new String[6];
        Matcher m = SEX.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Couldn't find \"male\" or \"female\": " + line);
        }
        // Everything before the sex column holds the two numbers and the name
        String firstHalf = line.substring(0, m.start());
        String lastHalf = line.substring(m.start());

        Matcher firstHalfTokenizer = NON_WHITESPACE.matcher(firstHalf);
        if (!firstHalfTokenizer.find()) {
            throw new IllegalArgumentException("Couldn't find any non-whitespace characters: " + line);
        }
        output[0] = firstHalfTokenizer.group();
        if (!firstHalfTokenizer.find()) {
            throw new IllegalArgumentException("Couldn't find a second non-whitespace token: " + line);
        }
        output[1] = firstHalfTokenizer.group();
        // Whatever remains of the first half is the full name
        output[2] = firstHalf.substring(firstHalfTokenizer.end()).trim();

        Matcher lastHalfTokenizer = NON_WHITESPACE.matcher(lastHalf);
        int index = 3;
        // Stop at the array bound so that extra columns cannot overflow it
        while (index < output.length && lastHalfTokenizer.find()) {
            output[index] = lastHalfTokenizer.group();
            index++;
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> array = new ArrayList<String[]>();
        // Assumption: the path to the data file is passed as the first command-line argument
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            array.add(parseLine(line));
        }
        // Do whatever you want with it
    }
}
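The split-at-the-sex-column approach above can also be written more compactly with String.split and a limit argument, which keeps the whole name in one piece. This is a sketch of a variation, not the answer's own code: the class name SplitAtSex is made up, and it assumes exactly two numeric columns on each side of the sex column and no name part that is the bare word "male" or "female".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of parseLine, for illustration only.
class SplitAtSex {
    // The sex column must be surrounded by whitespace on both sides.
    private static final Pattern SEX = Pattern.compile("\\s+(male|female)\\s+");

    static String[] parseLine(String line) {
        Matcher m = SEX.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No sex column in: " + line);
        }
        // A limit of 3 leaves the name, spaces included, as the third piece.
        String[] head = line.substring(0, m.start()).trim().split("\\s+", 3);
        String[] tail = line.substring(m.end()).trim().split("\\s+");
        if (head.length != 3 || tail.length != 2) {
            throw new IllegalArgumentException("Unexpected column count in: " + line);
        }
        return new String[] { head[0], head[1], head[2], m.group(1), tail[0], tail[1] };
    }

    public static void main(String[] args) {
        String[] row = parseLine(
            "1 1 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 227.5250");
        System.out.println(String.join(" | ", row));
    }
}
```

Throwing on an unexpected column count is one possible design choice; skipping or logging the line would work equally well, depending on how clean the file is expected to be.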