Suppose my file is structured like this:
Line 0:
354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
Line 1:
543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
and so on...
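Now I want to extract the information marked in my example (the number and the text before AA). The sequence AA is always present (and can be used as a break point to skip to the next line), while the information string varies in length.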
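What is the best way to parse this information? A BufferedReader with if/then/else, or can I tell some kind of parser to read a number of length XYZ, then read everything into a string until it finds AA, and then skip the rest of the line?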
Answer 0 (score: 1)
I would read the file line by line and match each line against a regular expression. I hope the comments in my code below are detailed enough.
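import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
// Read file line by line (myFile is the file to parse);
// try-with-resources closes the reader, the enclosing method must handle IOException
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Match line against our pattern
        Matcher m = p.matcher(line);
        if (m.find()) {
            // Line is valid, process it however you want
            // m.group(1) contains the number
            // m.group(2) contains the text between number and AA
            // (including the space before AA; call trim() if you don't want it)
        } else {
            // Line has invalid format (pattern does not match)
        }
    }
}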
Explanation of the regular expression (Pattern) I used:
^([0-9]+)\s+(([^A]|A[^A])+)AA
^ matches the start of the line
([0-9]+) matches any integral number
\s+ matches one or more whitespace characters
(([^A]|A[^A])+) matches one or more pieces, each either a single character that is not A or an A followed by a character that is not A; in other words, text that does not contain AA
AA matches the terminating AA
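To check what the two groups actually capture, here is a small self-contained sketch that runs the same pattern against the first sample line. The class and method names (RegexGroupsDemo, parse) are hypothetical. Group 2 still ends with the space that precedes AA, so this sketch trims it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupsDemo {
    // Same pattern as in the answer: number, whitespace, text without "AA", then "AA"
    static final Pattern P = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match
    static String[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) ends with the space before "AA", so trim it
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String line = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        String[] parts = parse(line);
        System.out.println(parts[0]); // 354858
        System.out.println(parts[1]); // Some String That Is Important
    }
}
```

Because ^ is not in MULTILINE mode, find() can only succeed at the very start of the line, so a line that does not begin with digits yields null.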
Update in response to a comment:
If every line is prefixed with a | character, the expression looks like this:
^\|([0-9]+)\s+(([^A]|A[^A])+)AA
In Java you need to escape it like this:
"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"
The | character has a special meaning in regular expressions (alternation) and must be escaped.
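A quick way to see why the escape matters is to compile both variants and run them on a |-prefixed line. In this sketch (class and method names are hypothetical), the unescaped pattern is parsed as "^" OR "the rest", so the empty alternative "^" matches at position 0 and group 1 is never filled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipePrefixDemo {
    // "\\|" in Java source becomes \| in the regex: a literal pipe character
    static final Pattern ESCAPED = Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");
    // Unescaped, | is alternation: either "^" or "([0-9]+)\\s+(...)AA"
    static final Pattern UNESCAPED = Pattern.compile("^|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helpers: return group(1) of the first match, or null
    static String numberEscaped(String line) {
        Matcher m = ESCAPED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static String numberUnescaped(String line) {
        Matcher m = UNESCAPED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "|354858 Some String That Is Important AA OTHER STUFF";
        System.out.println(numberEscaped(line));   // 354858
        System.out.println(numberUnescaped(line)); // null ("^" matched the empty string)
    }
}
```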
Answer 1 (score: 1)
Without more information, it is hard to say which approach suits you best.
One solution could be:
import java.util.Arrays;

String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));
Output:
split = [354858, Some String That Is Important]
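One caveat with the snippet above: if a line has no " AA", indexOf returns -1 and substring(0, -1) throws StringIndexOutOfBoundsException. A minimal sketch that guards against that case, wrapped in a hypothetical splitLine helper, might look like this:

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: returns {number, text}, or null when the line has no " AA"
    static String[] splitLine(String s) {
        int end = s.indexOf(" AA");
        if (end < 0) {
            return null; // substring(0, -1) would throw here
        }
        // limit 2: split only at the first space, so the text keeps its inner spaces
        return s.substring(0, end).split(" ", 2);
    }

    public static void main(String[] args) {
        String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        System.out.println("split = " + Arrays.toString(splitLine(s)));
        // split = [354858, Some String That Is Important]
        System.out.println(splitLine("line without the marker")); // null
    }
}
```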
Answer 2 (score: 1)
You can read the file line by line and keep only the part of each line that comes before the AA charSequence:
final String charSequence = "AA";
// try-with-resources closes the reader, like the original finally block did
try (BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")))) {
    String line;
    while ((line = r.readLine()) != null) {
        int pos = line.indexOf(charSequence);
        if (pos > 0) {
            String myImportantStuff = line.substring(0, pos);
            // do something with your useful string
        }
    }
}
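The same loop can be tried without a file by reading from a StringReader; that substitution is an assumption made only so the sketch runs on its own, and the class and method names are hypothetical. Note that pos > 0 skips both lines without AA (pos == -1) and lines that start with AA (pos == 0), and that the kept part still ends with the space before AA.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class IndexOfDemo {
    // Same logic as the answer: keep everything before the first "AA" on each line
    static List<String> importantParts(BufferedReader r) throws IOException {
        List<String> result = new ArrayList<>();
        String line;
        while ((line = r.readLine()) != null) {
            int pos = line.indexOf("AA");
            if (pos > 0) { // skips "no AA" (-1) and "AA at start" (0)
                result.add(line.substring(0, pos));
            }
        }
        return result;
    }

    // Convenience overload: in-memory text stands in for the file here
    static List<String> importantParts(String text) {
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            return importantParts(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "354858 Some String That Is Important AA OTHER STUFF\n"
                + "this line has no marker\n"
                + "543788 Another String That Is Important AA OTHER STUFF";
        for (String part : importantParts(text)) {
            // Brackets make the trailing space before "AA" visible
            System.out.println("[" + part + "]");
        }
    }
}
```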
Answer 3 (score: 0)
Use the regular expression .+?(?=AA)
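Applied from Java, the lazy .+? takes as few characters as possible and the lookahead (?=AA) requires AA to follow without consuming it, so the match is everything before the first AA. A minimal sketch follows; the class and helper names are hypothetical, and using lookingAt() (which anchors the match at the start of the line) is one possible choice rather than part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // .+? is lazy; (?=AA) checks that "AA" follows without including it in the match
    static final Pattern BEFORE_AA = Pattern.compile(".+?(?=AA)");

    // Hypothetical helper: text before the first "AA", or null if there is none
    static String beforeAA(String line) {
        Matcher m = BEFORE_AA.matcher(line);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        System.out.println("[" + beforeAA(line) + "]");
        // [354858 Some String That Is Important ]
    }
}
```

The match includes both the number and the trailing space, so you would still need to split off the number and trim the text.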
Answer 4 (score: 0)
Here is a solution:
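import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    InputStream source; // select a text source (should be a FileInputStream)
    {
        // Demo data standing in for the file
        String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
                "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }
    // Decode with the same charset the bytes were encoded with
    try (BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        // number, space, lazily matched text, " AA ", rest of the line
        Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
        while (true) {
            String line = stream.readLine();
            if (line == null) {
                break;
            }
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                // do something with someNumber and someText
            } else {
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}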
Answer 5 (score: 0)
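You could use a regex, but if you know that every line contains AA and you want the content up to AA, you can simply use substring(int, int) to get the part of the line before AA: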
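import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open until the stream is closed
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine).collect(Collectors.toList());
    }
}
public String parseLine(String line) {
    int index = line.indexOf("AA");
    return line.substring(0, index);
}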
Here is a version of read without streams (it also works on Java 7):
public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>();
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            content.add(parseLine(line));
        }
    }
    return content;
}
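Both versions of read rely on parseLine, which assumes every line contains AA: for a line without it, indexOf returns -1 and substring(0, -1) throws. If that assumption might not hold, one option is to return an Optional and drop the empty results. This is only a sketch; SubstringDemo and parseLines are hypothetical names, and it takes a Stream<String> so it can be tried without a file (Files.lines(path) would supply the stream in real use).

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubstringDemo {
    // Text before the first "AA", or Optional.empty() if the line has no "AA"
    static Optional<String> parseLine(String line) {
        int index = line.indexOf("AA");
        return index < 0 ? Optional.empty() : Optional.of(line.substring(0, index));
    }

    // Keeps only the lines that actually contain "AA"
    static List<String> parseLines(Stream<String> lines) {
        return lines.map(SubstringDemo::parseLine)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = parseLines(Stream.of(
                "354858 Some String That Is Important AA OTHER STUFF",
                "a line without the marker"));
        System.out.println(parts); // [354858 Some String That Is Important ]
    }
}
```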