Extract docIDs and documents from a file and put them in a HashMap

Time: 2016-09-26 00:51:14

Tags: java regex file hashmap

I have text like this:



.I 1
.T
experimental investigation of the aerodynamics of a
wing in a slipstream .
.A
brenckman,m.
.B
j. ae. scs. 25, 1958, 324.
.W
experimental investigation of the aerodynamics of a
wing in a slipstream .
  an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
.I 2
.T
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .
.A
ting-yili
.B
department of aeronautical engineering, rensselaer polytechnic
institute
troy, n.y.
.W
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .the discussion here is restricted to two-dimensional incompressible steady flow .
.I 3
.T
the boundary layer in simple shear flow past a flat plate .
.A
m. b. glauert
.B
department of mathematics, university of manchester, manchester,
england
.W
the boundary layer in simple shear flow past a flat plate .
the boundary-layer equations are presented for steady
flow with no pressure gradient .




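I need a regular expression in Java that does the following: whenever it finds a ".I 1" line, it should return the text that starts after the following ".W" and ends just before ".I 2".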

1 Answer:

Answer 0 (score: 1):

I think the simplest approach is to find the first match with the following pattern:

(?<=\.I\s1\s)[\W\w]+(?=\.I\s2)

This gives you the first match:

.T
experimental investigation of the aerodynamics of a
wing in a slipstream .
.A
brenckman,m.
.B
j. ae. scs. 25, 1958, 324.
.W
experimental investigation of the aerodynamics of a
wing in a slipstream .
  an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
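
One detail of this pattern is worth checking on larger inputs: `[\W\w]+` is greedy, and the lookahead `\.I\s2` is also satisfied by a line such as ".I 20". The sketch below compares the greedy form with a reluctant variant that requires whitespace after the ID. The class name, the helper `firstMatch`, and the miniature input are made up for illustration, and the variant is a suggestion rather than part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical miniature input with docIDs 1, 2 and 20.
        String text = ".I 1\n.W\none\n.I 2\n.W\ntwo\n.I 20\n.W\ntwenty\n";

        // Greedy: runs up to the LAST place where ".I 2" follows, which is ".I 20".
        String greedy = firstMatch("(?<=\\.I\\s1\\s)[\\W\\w]+(?=\\.I\\s2)", text);

        // Reluctant, with \s after the ID: stops at the first ".I 2" line.
        String reluctant = firstMatch("(?<=\\.I\\s1\\s)[\\W\\w]+?(?=\\.I\\s2\\s)", text);

        System.out.println(greedy.contains("two"));    // true: document 2 leaked in
        System.out.println(reluctant.contains("two")); // false
    }
}
```

With only three documents, as in the sample text, both forms return the same match.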

Then apply the following pattern to that first match to get the second match:

(?<=\.W\s)[\W\w]+

You will get this result:

experimental investigation of the aerodynamics of a
wing in a slipstream .
  an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
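
The two steps above can be sketched as a small helper, using the two patterns exactly as given. The class name `TwoStepDemo`, the method `extractFirstBody`, and the shortened input are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepDemo {
    // Returns the text after ".W" for document 1, or null if nothing matches.
    static String extractFirstBody(String text) {
        // Step 1: everything between ".I 1" and ".I 2".
        Matcher doc = Pattern.compile("(?<=\\.I\\s1\\s)[\\W\\w]+(?=\\.I\\s2)").matcher(text);
        if (!doc.find()) {
            return null;
        }
        // Step 2: inside that block, everything after the ".W" marker.
        Matcher body = Pattern.compile("(?<=\\.W\\s)[\\W\\w]+").matcher(doc.group());
        return body.find() ? body.group() : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the text in the question.
        String text = ".I 1\n.T\ntitle one\n.A\nauthor\n.B\nsource\n.W\nbody one\n"
                    + ".I 2\n.T\ntitle two\n.W\nbody two\n";
        System.out.print(extractFirstBody(text)); // prints "body one" and a newline
    }
}
```

The returned string keeps the line break that precedes ".I 2", so call `trim()` on it if trailing whitespace matters.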

In your case, it could look like this:

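import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocExtractor {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();

        String text = " ... ";  // your text here
        int lastId = 3;         // highest docID in the text

        // Step 2 pattern: everything after the ".W" marker of one document.
        Pattern r2 = Pattern.compile("(?<=\\.W\\s)[\\W\\w]+");

        for (int i = 1; i <= lastId; i++) {
            // Step 1 pattern: everything between ".I <i>" and ".I <i+1>",
            // or the end of the text for the last document. The reluctant +?
            // and the trailing \s keep ".I 2" from also matching ".I 20".
            String end = (i == lastId) ? "\\z" : "\\.I\\s" + (i + 1) + "\\s";
            Pattern r1 = Pattern.compile("(?<=\\.I\\s" + i + "\\s)[\\W\\w]+?(?=" + end + ")");

            Matcher m1 = r1.matcher(text);
            if (m1.find()) {
                Matcher m2 = r2.matcher(m1.group(0));
                if (m2.find()) {
                    hashMap.put(".I " + i, m2.group(0));
                }
            }
        }

        // HashMap does not preserve insertion order, so entries may print in any order.
        for (Map.Entry<String, String> item : hashMap.entrySet()) {
            System.out.println(item.getKey());
            System.out.println(item.getValue());
            System.out.println();
        }
    }
}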

Result:

.I 2
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .the discussion here is restricted to two-dimensional incompressible steady flow .


.I 1
experimental investigation of the aerodynamics of a
wing in a slipstream .
  an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .


.I 3
the boundary layer in simple shear flow past a flat plate .
the boundary-layer equations are presented for steady
flow with no pressure gradient .
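
The loop above needs the number of documents up front and builds a separate pattern for each docID. As an alternative, here is a minimal single-pass sketch that uses one pattern with two capture groups. It assumes every record has ".I <n>" and ".W" each on a line of their own, and it needs Java 8 or later for `\R`. The names `SinglePassExtractor`, `RECORD`, and `extract` are made up for this example. Using `Integer` keys in a `LinkedHashMap` is a choice of the sketch, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassExtractor {
    // Group 1: the docID after ".I". Group 2: everything after the ".W" line,
    // up to the next ".I <n>" line or the end of the input.
    // MULTILINE makes ^ and $ match at line boundaries; DOTALL lets . cross lines.
    private static final Pattern RECORD = Pattern.compile(
            "^\\.I (\\d+)$.*?^\\.W$\\R?(.*?)(?=^\\.I \\d+$|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    static Map<Integer, String> extract(String text) {
        Map<Integer, String> docs = new LinkedHashMap<>(); // keeps file order
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            // trim() drops the line break left before the next ".I" line.
            docs.put(Integer.parseInt(m.group(1)), m.group(2).trim());
        }
        return docs;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the text in the question.
        String text = ".I 1\n.T\ntitle one\n.W\nbody one\n"
                    + ".I 2\n.T\ntitle two\n.W\nbody two\n";
        extract(text).forEach((id, body) -> System.out.println(id + " -> " + body));
        // prints:
        // 1 -> body one
        // 2 -> body two
    }
}
```

If the text lives in a file, it can be loaded first with `new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)`, where `path` stands for your own file name. A record without a ".W" line would break the assumption above, because the pattern would then run on into the next record.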