Extracting strings between tags in Java

Time: 2018-06-25 11:47:54

Tags: java string parsing text-extraction

I have a string like the following:

Msg_Begin
Some message1
Msg_End
Msg_Begin
Some message2
Msg_End
Msg_Begin
Some message3
Msg_End

and I would like to get the messages between Msg_Begin and Msg_End as a list, like this:

[Some message1, Some message2, Some message3]

What is the best way to do this in Java?

2 answers:

Answer 0: (score: 5)

String messages = originalString.replaceAll("Msg_Begin", "");
String[] array = messages.split("Msg_End");
return Arrays.asList(array);

Just make sure your messages do not themselves contain the marker strings (Msg_Begin, Msg_End) that this code removes and splits on.
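As a minimal sketch of the split approach above, the following wraps it in a runnable class. The class name SplitExtractor and the method extract are hypothetical names chosen for illustration. It also assumes you want the line breaks around each message stripped, which the plain replace-and-split would otherwise leave in the list elements.

```java
import java.util.ArrayList;
import java.util.List;

class SplitExtractor {
    // Hypothetical helper: collects the text between Msg_Begin and Msg_End markers.
    static List<String> extract(String input) {
        List<String> messages = new ArrayList<>();
        // Drop the opening markers, then cut the text at every closing marker.
        String withoutBegin = input.replace("Msg_Begin", "");
        for (String part : withoutBegin.split("Msg_End")) {
            // Assumption: surrounding whitespace and line breaks are not part of the message.
            String message = part.trim();
            if (!message.isEmpty()) {
                messages.add(message);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String input = "Msg_Begin\nSome message1\nMsg_End\n"
                     + "Msg_Begin\nSome message2\nMsg_End\n"
                     + "Msg_Begin\nSome message3\nMsg_End";
        System.out.println(extract(input)); // prints [Some message1, Some message2, Some message3]
    }
}
```

Note that trim() also removes leading and trailing spaces inside a message, so this may not suit input where that whitespace matters.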

Answer 1: (score: 2)

You can use a regular expression to achieve this:

The following code fills in your test case and prints the extracted messages:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Fill in your test case and print it
String entry = "Msg_Begin\r\n" +
               "Some message1\r\n" +
               "Msg_End\r\n" +
               "Msg_Begin\r\n" +
               "Some message2\r\n" +
               "Msg_End\r\n" +
               "Msg_Begin\r\n" +
               "Some message3\r\n" +
               "Msg_End";

System.out.println("IN : \r\n" + entry);

// Compile the regular expression pattern; the DOTALL flag lets '.' match line breaks,
// so a message may span several lines
Pattern p = Pattern.compile("Msg_Begin\r\n(.+?)\r\nMsg_End(\r\n)?", Pattern.DOTALL);
Matcher m = p.matcher(entry);

// Iterate over the matches (for example, add them to a list)
System.out.println("\r\nOUT :");
List<String> list = new ArrayList<>();
while (m.find()) {
    list.add(m.group(1));
    System.out.println(m.group(1));
}

More information on regular expression syntax: here
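If the tags or the line endings vary, the regular expression idea can be generalized. The sketch below is one possible variant, not part of the original answer: the class TagExtractor and the method between are hypothetical names, the tags are passed in as parameters, Pattern.quote makes them match literally, and \R (available since Java 8) accepts any line break instead of only \r\n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Hypothetical helper: returns the text found between each begin/end tag pair.
    static List<String> between(String input, String beginTag, String endTag) {
        // Pattern.quote treats the tags as literal text; \R matches any line break.
        // DOTALL lets '.' match line breaks, so a message may span several lines.
        Pattern pattern = Pattern.compile(
                Pattern.quote(beginTag) + "\\R(.+?)\\R" + Pattern.quote(endTag),
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1)); // group 1 is the text between the tags
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Msg_Begin\r\nSome message1\r\nMsg_End\r\n"
                     + "Msg_Begin\nSome message2\nMsg_End";
        System.out.println(between(input, "Msg_Begin", "Msg_End"));
    }
}
```

As written, the pattern requires at least one character between the tags, so an empty message (a begin tag immediately followed by an end tag) is skipped.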