Hi, I have a string that I want to break up using a regular expression or any other method.
My string is:
1 Agra Achhnera NIL
2 Agra Agra NIL
3 Agra Fatehabad NIL
4 Agra Fatehpur Sikri NIL
5 Aligarh Aligarh 1300.00
6 Aligarh Khair 1300.00
7 Ambedkar Nagar Akbarpur NIL
8 Ambedkar Nagar Tanda Akbarpur 1478.00
The result I want is a string like this:
1 Agra Achhnera NIL
2 Agra Agra NIL
3 Agra Fatehabad NIL
4 Agra FatehpurSikri NIL
5 Aligarh Aligarh 1300.00
6 Aligarh Khair 1300.00
7 AmbedkarNagar Akbarpur NIL
8 AmbedkarNagar TandaAkbarpur 1478.00
How can I achieve this?
My Java code:
<%@page import="com.gargoylesoftware.htmlunit.BrowserVersion"%>
<%@page import="org.openqa.selenium.By"%>
<%@page import="org.openqa.selenium.WebDriver"%>
<%@page import="org.openqa.selenium.WebElement"%>
<%@page import="org.openqa.selenium.firefox.FirefoxDriver"%>
<%@page import="org.openqa.selenium.htmlunit.HtmlUnitDriver"%>
<%@page import="org.openqa.selenium.support.ui.Select"%>
<%
WebDriver driver = new HtmlUnitDriver(BrowserVersion.getDefault());
String sDate = "27/03/2014";
String url="http://www.upmandiparishad.in/commodityWiseAll.aspx";
driver.get(url);
Thread.sleep(5000);
new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);
Thread.sleep(3000);
driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
Thread.sleep(5000);
WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
String htmlTableText = findElement.getText();
// do whatever you want now, This is raw table values.
htmlTableText=htmlTableText.replace("S.No.DistrictMarketPrice","");
htmlTableText= htmlTableText.replaceAll("\\s(\\d+\\s[A-Z])", "<br>$1");
//System.out.println(htmlTableText);
String data[]=htmlTableText.split("");
out.println(data[9]);
driver.close();
driver.quit();
%>
Thanks in advance.
Answer 0 (score: 0)
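This should work for your specific case, because the data you are scraping comes in this format, and the following regular expression will do in most cases. I'm not saying it is the perfect answer, but it should get you going:
Find
([a-z])\s([A-Z])
and replace with \1\2 (written $1$2 when using Java's String.replaceAll).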
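In Java, the find-and-replace above could be sketched roughly as follows, with capture groups placed around the two letters so that the back-references resolve. Be aware that the pattern matches every lowercase-letter, whitespace, uppercase-letter boundary, so on the sample rows it also removes the space between the district and the market, and the one before NIL. The second method is one possible way around that, under the assumption (not stated in the question) that the district names are known in advance. `RowNormalizer`, `DISTRICTS`, `joinAllWords` and `normalizeRow` are hypothetical names used only for illustration, and the row layout is inferred from the sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowNormalizer {
    // The answer's pattern with capture groups: lowercase letter, one whitespace, uppercase letter.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("([a-z])\\s([A-Z])");

    // Assumption: district names are known up front (hypothetical list covering the sample rows).
    private static final List<String> DISTRICTS =
            Arrays.asList("Agra", "Aligarh", "Ambedkar Nagar");

    // Row layout assumed from the sample: serial number, names, then NIL or a price.
    private static final Pattern ROW =
            Pattern.compile("^(\\d+)\\s+(.+?)\\s+(NIL|\\d+(?:\\.\\d+)?)$");

    /** Applies the answer's replacement as-is: joins at every lowercase/uppercase boundary. */
    static String joinAllWords(String row) {
        return WORD_BOUNDARY.matcher(row).replaceAll("$1$2");
    }

    /** Joins the words inside the district and inside the market, keeping the columns apart. */
    static String normalizeRow(String row) {
        Matcher m = ROW.matcher(row.trim());
        if (!m.matches()) {
            return row; // unrecognised layout: leave the row untouched
        }
        String names = m.group(2);
        for (String district : DISTRICTS) {
            if (names.startsWith(district + " ")) {
                String market = names.substring(district.length() + 1);
                return m.group(1) + " " + district.replace(" ", "") + " "
                        + market.replace(" ", "") + " " + m.group(3);
            }
        }
        return row; // unknown district: leave the row untouched
    }

    public static void main(String[] args) {
        // prints "4 AgraFatehpurSikriNIL" (every boundary is joined)
        System.out.println(joinAllWords("4 Agra Fatehpur Sikri NIL"));
        // prints "4 Agra FatehpurSikri NIL"
        System.out.println(normalizeRow("4 Agra Fatehpur Sikri NIL"));
        // prints "8 AmbedkarNagar TandaAkbarpur 1478.00"
        System.out.println(normalizeRow("8 Ambedkar Nagar Tanda Akbarpur 1478.00"));
    }
}
```

The scraped table text would first need to be split into rows (for example on line breaks) before each row is passed to `normalizeRow`. A district list is used here because the sample alone does not say where a multi-word district ends and a multi-word market begins: "Agra Fatehpur Sikri" and "Ambedkar Nagar Akbarpur" are both three words but split differently.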