Lesson no 1
lesson name: Jack and Jill went to America
lesson contents: some XXXXX XXXXX contents
lesson Description: jack and jill lesson description
Lesson no 2
lesson name: Lorem ipsum dolor sit amet
lesson contents: consectetur adipisicing elit, sed do eiusmod tempor
lesson Description: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
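I have a file with content like the above, and I want to process it with a regular expression and convert it to a JSON object using Java. Can anyone suggest a regular expression to process and separate "lesson name", "lesson contents", "lesson Description", and so on?
I would like the output to look like this: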
[{"Lesson no":"1","lesson name":"xxx","lesson contents":"YYY","Lesson Desc":"zzzz"},{....}]
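Answer 0 (score: 1)
To use a regular expression, you have to make sure the file has a constant structure. Here I separated the lessons by 2 empty lines, including after the last lesson. You can append those lines programmatically after reading the file, or use only 1 empty line between lessons, and so on.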
Lesson no 1
lesson name: Jack and Jill went to America
lesson contents: some XXXXX XXXXX contents with new
lines
lesson Description: jack and Jill lesson description with new
lines
<this is an empty line>
<this is an empty line>
Lesson no 2
lesson name: Lorem ipsum dolor sit amet
lesson contents: consectetur adipisicing elit, sed do eiusmod tempor
lesson Description: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
<this is an empty line>
<this is an empty line>
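Appending the trailing empty lines programmatically could look roughly like the sketch below. The class and method names are hypothetical, and it assumes the whole file has already been read into a string. It strips any trailing whitespace and then appends exactly two line separators, which is what the trailing lb+lb in this answer's pattern expects after the last lesson.

```java
class TrailingBlankLineSketch {
    // Hypothetical helper: removes trailing whitespace, then appends
    // exactly two line separators so the last lesson ends like the others.
    static String padEnd(String text, String lb) {
        String trimmed = text.replaceAll("\\s+\\z", "");
        return trimmed + lb + lb;
    }

    public static void main(String[] args) {
        String lb = System.getProperty("line.separator");
        String padded = padEnd("lesson Description: last lesson", lb);
        // The padded text now ends with two line separators.
        System.out.println(padded.endsWith(lb + lb));
    }
}
```

This only normalizes the end of the file; the separators between lessons still have to be present in the file itself.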
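The code below repeatedly matches the structure of a single lesson and breaks it into its components. If the format of your input file changes, you need to change the pattern variable accordingly.
Note: on Java 8 you don't need the lb string at all; replace it with "\\R".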
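import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    // Platform line separator; the input file must use the same line endings.
    static String lb = System.getProperty("line.separator");
    static String path = "src/test/text.txt";
    // One lesson: four key/value pairs, terminated by an empty line.
    static String pattern = "(Lesson no) (.+?)"+lb+"(lesson name): (.+?)"+lb+"(lesson contents): (.+?)"+lb+"(lesson Description): (.+?)"+lb+lb;

    public static void main(String[] args) {
        String text;
        // Read the whole file as one token; "\\z" only matches at the end of input.
        try (Scanner scanner = new Scanner(new File(path))) {
            text = scanner.useDelimiter("\\z").next();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to parse
        }
        // DOTALL lets (.+?) span line breaks, so values may contain new lines.
        Pattern pat = Pattern.compile(pattern, Pattern.DOTALL);
        Matcher m = pat.matcher(text);
        StringBuilder sb = new StringBuilder("[");
        while (m.find()) {
            sb.append("{");
            // Odd groups are keys, even groups are values (appended as-is, without JSON escaping).
            for (int i = 1; i <= m.groupCount(); i++) {
                sb.append("\"").append(m.group(i));
                if (i % 2 == 0)
                    sb.append("\",");
                else
                    sb.append("\":");
            }
            // Drop the trailing comma after the last value.
            sb.deleteCharAt(sb.length() - 1).append("},");
        }
        // Drop the comma after the last object; with no matches there is nothing to drop.
        if (sb.length() > 1)
            sb.deleteCharAt(sb.length() - 1);
        sb.append("]");
        System.out.println(sb.toString());
    }
}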
Output
[{"Lesson no":"1","lesson name":"Jack and Jill went to America","lesson contents":"some XXXXX XXXXX contents with new
lines","lesson Description":"jack and Jill lesson description with new
lines"},{"Lesson no":"2","lesson name":"Lorem ipsum dolor sit amet","lesson contents":"consectetur adipisicing elit, sed do eiusmod tempor","lesson Description":" Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor"}]
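Note that the multi-line values in the output above contain raw line breaks inside the quoted strings, which strict JSON parsers reject. The following is a minimal sketch of one way to deal with that, not part of the original answer: it normalizes line endings to "\n", ends each lesson at an empty line or at the end of input, and escapes the values before printing. The class name, the KEYS array and the escape helper are all hypothetical; in a real project a JSON library would normally handle the escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LessonJsonSketch {
    // One lesson; the lookahead ends the description at an empty line or at end of input.
    static final Pattern LESSON = Pattern.compile(
        "Lesson no (.+?)\n"
        + "lesson name: (.+?)\n"
        + "lesson contents: (.+?)\n"
        + "lesson Description: (.+?)(?=\n\n|\n?\\z)",
        Pattern.DOTALL);

    // Keys in the same order as the capturing groups above.
    static final String[] KEYS = {
        "Lesson no", "lesson name", "lesson contents", "lesson Description"};

    // Escapes the characters that may not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Converts the whole file content into a JSON array of lesson objects.
    static String toJson(String text) {
        // Assumption: treating "\r\n" as "\n" is acceptable for this input.
        String normalized = text.replace("\r\n", "\n");
        Matcher m = LESSON.matcher(normalized);
        List<String> objects = new ArrayList<>();
        while (m.find()) {
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < KEYS.length; i++) {
                fields.add("\"" + KEYS[i] + "\":\"" + escape(m.group(i + 1)) + "\"");
            }
            objects.add("{" + String.join(",", fields) + "}");
        }
        return "[" + String.join(",", objects) + "]";
    }

    public static void main(String[] args) {
        String sample = "Lesson no 1\n"
            + "lesson name: Jack and Jill went to America\n"
            + "lesson contents: some contents with new\nlines\n"
            + "lesson Description: jack and Jill lesson description\n\n\n";
        System.out.println(toJson(sample));
    }
}
```

Building the objects with String.join also avoids the manual deleteCharAt bookkeeping for trailing commas.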
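Answer 1 (score: 0)
Let's start by searching for the lesson name. More specifically, we want to find whatever comes right after "lesson name:" but before the line break. Here is a regular expression for that: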
lesson name:\s*(.*)
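As you can see, it searches for "lesson name:". Then \s matches any whitespace character, and the star means zero or more times. This is useful because it still matches if you accidentally have no space, or maybe two spaces, after "lesson name:".
Finally, the dot matches any character except a newline. The star means zero or more times, so we match the rest of the line. Because that part is in parentheses, it is captured as a group that you can read back in Java.
I'm not very familiar with Java, but I think you have to get the match like this (if anyone sees a mistake, please let me know):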
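// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Java string literals use double quotes, and the backslash must be doubled.
Pattern regex = Pattern.compile("lesson name:\\s*(.*)");
Matcher m = regex.matcher(yourfilestring);
if (m.find()) {
    System.out.println(m.group(1)); // 1 refers to the first set of parentheses in the regex
}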
You can apply the same concept to "Lesson no", "lesson contents:" and "lesson Description:" too.
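Applied to all four labels, the idea might look like the sketch below. The class and method names are hypothetical. It uses [ \t]* instead of \s* so that an empty value does not pull in the following line, and it quotes the label so it is matched literally. Because the dot stops at a line break, only the first line of a multi-line value is captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldRegexSketch {
    // Hypothetical helper: returns the rest of the line after every
    // occurrence of the given label, in the order they appear.
    static List<String> valuesAfter(String label, String text) {
        Pattern p = Pattern.compile(Pattern.quote(label) + "[ \\t]*(.*)");
        Matcher m = p.matcher(text);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "Lesson no 1\n"
            + "lesson name: Jack and Jill went to America\n"
            + "lesson contents: some XXXXX XXXXX contents\n"
            + "lesson Description: jack and jill lesson description\n";
        String[] labels = {
            "Lesson no", "lesson name:", "lesson contents:", "lesson Description:"};
        for (String label : labels) {
            System.out.println(label + " -> " + valuesAfter(label, text));
        }
    }
}
```

Each label yields one list with an entry per lesson, so the i-th entries of the four lists together describe the i-th lesson, provided every lesson contains all four fields.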
Answer 2 (score: 0)
Try this code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class test {
    public static void main(String[] args) {
        // Read the file line by line; nextLine() strips both "\n" and "\r\n".
        try (Scanner scanner = new Scanner(new File("test.txt"))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.contains("Lesson no")) {
                    System.out.print(line);
                } else if (!line.trim().isEmpty()) {
                    int colon = line.indexOf(':');
                    if (colon >= 0) {
                        // Split at the first colon only, so values may contain ':'.
                        String key = line.substring(0, colon);
                        String value = line.substring(colon + 1);
                        System.out.print(" " + key + " : " + value);
                    } else {
                        // Line without a colon, e.g. the continuation of a multi-line value.
                        System.out.println(line);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
I hope this helps.