I have these strings:
wordsExpanded="test | is | [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] | test | [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] | [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]"
interpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}"
The output I need is a string like this:
finalOutput="test | is | thirty four | test | 3 | 1 "
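Basically, the interpretation string carries the information needed to decide which group to use. For the first number we use <number_type_0 words>, so the correct string is "(thirty four)" rather than "( 3 4 )". The second is "( 3 )" and the third is "( 1 )".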
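To make that selection rule concrete, here is a minimal sketch that turns the interpretation string into a lookup from each number tag to the form it asks for. The class name InterpretationIndex and the method parse are hypothetical names used only for illustration, and the sketch assumes every tag has the shape <name form>.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the code in this thread.
class InterpretationIndex {
    // Matches one tag such as <number_type_0 words>:
    // group 1 is the tag name, group 2 is the requested form.
    private static final Pattern TAG = Pattern.compile("<(\\S+)\\s+(\\w+)>");

    // Returns tag name -> requested form, in the order the tags appear.
    static Map<String, String> parse(String interpretation) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Matcher m = TAG.matcher(interpretation);
        while (m.find()) {
            chosen.put(m.group(1), m.group(2));
        }
        return chosen;
    }

    public static void main(String[] args) {
        String interpretation =
            "{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        // prints {number_type_2=digits, number_type_1=digits, number_type_0=words}
        System.out.println(parse(interpretation));
    }
}
```

With such a map, deciding between "(thirty four)" and "( 3 4 )" becomes a lookup of number_type_0, which yields words.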
This is my code so far:
package com.test.prova;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Prova {
public static void main(String[] args) {
String nlInterpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
String inputText="this is 34 test 3 1";
String grammar="test is [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"'\\[]+|\\[([^\\]]*)\\]|'([^']*)'");
Matcher regexMatcher = regex.matcher(grammar);
while (regexMatcher.find()) {
if (regexMatcher.group(1) != null) {
matchList.add(regexMatcher.group(1));
} else if (regexMatcher.group(2) != null) {
matchList.add(regexMatcher.group(2));
} else {
matchList.add(regexMatcher.group());
}
}
String[] xx = matchList.toArray(new String[0]);
String[] yy = inputText.split(" ");
matchList = new ArrayList<String>();
regex = Pattern.compile("[^<]+|<([^>]*)>");
regexMatcher = regex.matcher(nlInterpretation);
while (regexMatcher.find()) {
if (regexMatcher.group(1) != null) {
matchList.add(regexMatcher.group(1));
}
}
String[] zz = matchList.toArray(new String[0]);
System.out.println(String.join(" | ",zz));
for (int i=0; i<xx.length; i++) {
if (xx[i].contains("number_type_")) {
matchList = new ArrayList<String>();
regex = Pattern.compile("[^\\(]+|<([^\\)]*)>.*[^<]+|<([^>]*)>");
regexMatcher = regex.matcher(xx[i]);
while (regexMatcher.find()) {
if (regexMatcher.group(1) != null) {
matchList.add(regexMatcher.group(1));
} else if (regexMatcher.group(2) != null) {
matchList.add(regexMatcher.group(2));
} else {
matchList.add(regexMatcher.group());
}
}
System.out.println(String.join(" | ",matchList.toArray(new String[0])));
}
System.out.printf("%02d\t%s\t->%s\n", i, yy[i], xx[i]);
}
}
}
It produces the following output:
number_type_2 digits | number_type_1 digits | number_type_0 words
00 this ->test
01 is ->is
thirty four) {<number_type_0 words>} | 3 4 ) {<number_type_0 digits>}
02 34 ->(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}
03 test ->test
three) {<number_type_1 words>} | 3 ) {<number_type_1 digits>}
04 3 ->(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}
one) {<number_type_2 words>} | 1 ) {<number_type_2 digits>}
05 1 ->(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}
What I want is more like this:
number_type_2 digits | number_type_1 digits | number_type_0 words
00 this ->test
01 is ->is
02 34 ->thirty four
03 test ->test
04 3 ->3
05 1 ->1
Answer 0 (score: 0)
I am writing a solution that assumes the format of your interpretation string stays the same, i.e. {<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}, and does not change.
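I will describe both a Java 7 and a Java 8 approach. To be clear, my algorithm is a straightforward, naive approach: it checks every interpretation against every word, so its running time grows with the product of the two counts. I could not come up with anything faster on short notice.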
Let's walk through the code:
Java 7 style
/*
* STEP 1: Create a method that accepts wordsExpanded and
* interpretation Strings
*/
public static void parseString(String wordsExpanded, String interoperation) {
/*
* STEP 2: Remove leading and trailing curly braces from
* interoperation String
*/
interoperation= interoperation.replaceAll("\\{", "");
interoperation = interoperation.replaceAll("\\}", "");
/*
* STEP 3: Split your interoperation String at '>'
* because we need individual interoperations like
* "<number_type_2 words" to compare.
*/
String[] allInterpretations = interoperation.split(">");
/*
* STEP 4: Split your wordsExpanded String at '|'
* to get each word.
*/
String[] allWordsExpanded = wordsExpanded.split("\\|");
/*
* STEP 5: Create a resultant StringBuilder
*/
StringBuilder resultBuilder = new StringBuilder();
/*
* STEP 6: Iterate over each word from wordsExpanded
* after splitting.
*/
for(String eachWordExpanded : allWordsExpanded){
/*
* STEP 7: Remove leading and trailing spaces
*/
eachWordExpanded = eachWordExpanded.trim();
/*
* STEP 8: Remove leading and trailing curly braces
*/
eachWordExpanded = eachWordExpanded.replaceAll("\\{", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\}", "");
/*
* STEP 9: Now, iterate over each interoperation.
*/
for(String eachInteroperation : allInterpretations){
/*
* STEP 10: Remove the leading and trailing spaces
* from each interoperation.
*/
eachInteroperation = eachInteroperation.trim();
/*
* STEP 11: Now append '>' to end of each interoperation
* because we'd split each of them at '>' previously.
*/
eachInteroperation = eachInteroperation + ">";
/*
* STEP 12: Check if each wordExpanded contains any of the
* interoperations.
*/
if(eachWordExpanded.contains(eachInteroperation)){
/*
* STEP 13: If each interoperation contains
* 'words', goto STEP 14.
* ELSE goto STEP 18.
*/
if(eachInteroperation.contains("words")){
/*
* STEP 14: Remove that interoperation from the
* each wordExpanded String.
*
* Ex: if the interoperation is <number_type_2 words>
* and it is found in the wordExpanded, remove it.
*/
eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
/*
* STEP 15: Now change the interoperation to digits.
* Ex: IF the interoperation is <number_type_2 words>,
* change that to <number_type_2 digits> and also remove them.
*/
eachInteroperation = eachInteroperation.replaceAll("words", "digits");
eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
/*
* STEP 16: Remove leading and trailing square braces
*/
eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
/*
* STEP 17: Remove any numbers in the form ( 3 ),
* since we are dealing with words.
*/
eachWordExpanded = eachWordExpanded.replaceAll("[(0-9)+]", "");
eachWordExpanded = eachWordExpanded.replaceAll("(\\s)+", " ");
}else{
/*
* STEP 18: Remove the interoperation just like STEP 14.
*/
eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
/*
* STEP 19: Now, change interoperations to words just like STEP 15,
* since we are dealing with digits here and then, remove it from the
* each wordExpanded String.
*/
eachInteroperation = eachInteroperation.replaceAll("digits", "words");
eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
/*
* STEP 20: Remove the leading and trailing square braces.
*/
eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
/*
* STEP 21: Remove the words in the form '(thirty four)'
*/
eachWordExpanded = eachWordExpanded.replaceAll("[(A-Za-z)+]", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\s", "");
}
}else{
continue;
}
}
/*
* STEP 22: Build your result object
*/
resultBuilder.append(eachWordExpanded + "|");
}
/*
* FINAL RESULT
*/
System.out.println(resultBuilder.toString());
}
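One detail in steps 17 and 21 is easy to misread: inside square brackets, "(", ")" and "+" are literal characters, so [(0-9)+] is a character class that deletes parentheses, plus signs and digits one character at a time. It is not a group that matches "( 3 )" as a unit. The sketch below isolates those two steps on a single group; CharClassDemo, keepWords and keepDigits are made-up names for illustration.

```java
// Hypothetical demo class; it only repeats the replaceAll calls from steps 17 and 21.
class CharClassDemo {
    // Same two calls as step 17: drop '(', ')', '+' and digits, then collapse whitespace runs.
    static String keepWords(String group) {
        return group.replaceAll("[(0-9)+]", "").replaceAll("(\\s)+", " ");
    }

    // Same two calls as step 21: drop '(', ')', '+' and ASCII letters, then drop all whitespace.
    static String keepDigits(String group) {
        return group.replaceAll("[(A-Za-z)+]", "").replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        // A group as it looks after the tags and square brackets have been removed.
        String group = "(thirty four) ( 3 4 ) ";
        System.out.println("[" + keepWords(group) + "]");  // prints [thirty four ]
        System.out.println("[" + keepDigits(group) + "]"); // prints [34]
    }
}
```

This accounts for two properties of the method: the words branch leaves a trailing space, and the digits branch turns "3 4" into "34" because all whitespace is removed.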
The equivalent Java 8 version of parseString is as follows:
// Requires: import java.util.Arrays; import java.util.Set; import java.util.stream.Collectors;
public static void parseString(String wordsExpanded, String interoperation) {
interoperation = interoperation.replaceAll("\\{", "");
interoperation = interoperation.replaceAll("\\}", "");
Set<String> allInterOperations = Arrays.asList(interoperation.split(">"))
.stream()
.map(eachInterOperation -> {
eachInterOperation = eachInterOperation.trim();
eachInterOperation = eachInterOperation + ">";
return eachInterOperation;
}).collect(Collectors.toSet());
String result = Arrays.asList(wordsExpanded.split("\\|"))
.stream()
.map(eachWordExpanded -> {
eachWordExpanded = eachWordExpanded.trim();
eachWordExpanded = eachWordExpanded.replaceAll("\\{", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\}", "");
for(String eachInterOperation : allInterOperations){
if(eachWordExpanded.contains(eachInterOperation)){
if(eachInterOperation.contains("words")){
eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
eachInterOperation = eachInterOperation.replaceAll("words", "digits");
eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
eachWordExpanded = eachWordExpanded.replaceAll("[(0-9)+]", "");
eachWordExpanded = eachWordExpanded.replaceAll("(\\s)+", " ");
}else{
eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
eachInterOperation = eachInterOperation.replaceAll("digits", "words");
eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
eachWordExpanded = eachWordExpanded.replaceAll("[(A-Za-z)+]", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\s", "");
}
}else{
continue;
}
}
return eachWordExpanded;
}).collect(Collectors.joining("|"));
System.out.println(result);
}
Running the following tests against the methods above, with different interpretation strings:
{<number_type_2 words> <number_type_1 words> <number_type_0 words>}
{<number_type_2 digits> <number_type_1 words> <number_type_0 words>}
{<number_type_2 digits> <number_type_1 digits> <number_type_0 digits>}
{<number_type_2 words> <number_type_1 digits> <number_type_0 digits>}
produces these results (Java 7 result):
test|is|thirty four |test|three |one |
test|is|thirty four |test|three |1|
test|is|34|test|3|1|
test|is|34|test|3|one |
(Java 8 result)
test|is|thirty four|test|three|one
test|is|thirty four|test|three|1
test|is|34|test|3|1
test|is|34|test|3|one
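The trailing "|" in the Java 7 results comes from step 22, which appends the delimiter after every word, whereas Collectors.joining in the Java 8 version places it only between elements. A minimal sketch of that difference, using the hypothetical names DelimiterDemo, appendEach and joinAll:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting the two ways the results are assembled.
class DelimiterDemo {
    // Java 7 style from step 22: the delimiter follows every element, including the last.
    static String appendEach(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append("|");
        }
        return sb.toString();
    }

    // Java 8 style: the delimiter appears only between elements.
    static String joinAll(List<String> words) {
        return words.stream().collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("test", "is", "34");
        System.out.println(appendEach(words)); // prints test|is|34|
        System.out.println(joinAll(words));    // prints test|is|34
    }
}
```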
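I hope this is what you were trying to achieve.
Answer 1 (score: 0)
Thanks, everyone. Based on Shyam's code, I made a few changes so that it returns exactly what I need.
Here is my new code: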
public static String parseString(String grammar, String interoperation) {
if (grammar==null || interoperation == null || interoperation.equals("{}"))
return null;
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"'\\[]+|\\[([^\\]]*)\\]|'([^']*)'");
Matcher regexMatcher = regex.matcher(grammar);
while (regexMatcher.find()) {
if (regexMatcher.group(1) != null) {
matchList.add(regexMatcher.group(1));
} else if (regexMatcher.group(2) != null) {
matchList.add(regexMatcher.group(2));
} else {
matchList.add(regexMatcher.group());
}
}
String[] xx = matchList.toArray(new String[0]);
String wordsExpanded = String.join(" | ",xx);
interoperation= interoperation.replaceAll("\\{", "")
.replaceAll("\\}", "");
Set<String> allInterOperations = Arrays.asList(interoperation.split(">"))
.stream()
.map(eachInterOperation -> {
eachInterOperation = eachInterOperation.trim();
eachInterOperation = eachInterOperation + ">";
return eachInterOperation;
}).collect(Collectors.toSet());
String result = Arrays.asList(wordsExpanded.split("\\|"))
.stream()
.map(eachWordExpanded -> {
eachWordExpanded = eachWordExpanded.trim();
eachWordExpanded = eachWordExpanded.replaceAll("\\{", "");
eachWordExpanded = eachWordExpanded.replaceAll("\\}", "");
for(String eachInterOperation : allInterOperations){
if(eachWordExpanded.contains(eachInterOperation)){
Pattern pattern = Pattern.compile("(\\(.*?\\))\\s*(<.*?>)");
Matcher matcher = pattern.matcher(eachWordExpanded);
while (matcher.find()) {
if (matcher.group(2).equals(eachInterOperation))
eachWordExpanded = matcher.group(1).replaceAll("[\\(\\)]", "").trim();
}
}else{
continue;
}
}
return eachWordExpanded;
}).collect(Collectors.joining("|"));
return result;
}
}
The output is as follows:
Input:
interoperation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
grammar="test is [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
test|is|thirty four|test|3|1
Input:
grammar="test is [(thirty four) {<number_type_0 words>}( three four ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
test|is|thirty four|test|3|1
Input:
interoperation="{<number_type_4 digits> <number_type_3 digits> <number_type_2 words> <number_type_1 words> <number_type_0 words>}";
grammar="test [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
test|thirty four|test|three|one
Input:
grammar = "this is my test [(three hundred forty one) {<number_type_0 words>}( 3 4 1 ) {<number_type_0 digits>}] for [(twenty one) {<number_type_1 words>}( 2 1 ) {<number_type_1 digits>}] issues";
interoperation= "{<number_type_1 digits> <number_type_0 words>}";
this|is|my|test|three hundred forty one|for|2 1|issues
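The step that does the real work in this answer is the pattern (\(.*?\))\s*(<.*?>), which pairs each parenthesised text with the tag that follows it. As a rough illustration, the sketch below runs that same pattern on one bracket group after the curly braces have been removed; PairPatternDemo and pairs are hypothetical names, not part of the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing what the answer's pattern captures.
class PairPatternDemo {
    // Same pattern as in the answer: group 1 is "(...)", group 2 is the "<...>" tag after it.
    private static final Pattern PAIR = Pattern.compile("(\\(.*?\\))\\s*(<.*?>)");

    // Maps each tag to the trimmed text inside its parentheses, in order of appearance.
    static Map<String, String> pairs(String strippedGroup) {
        Map<String, String> byTag = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(strippedGroup);
        while (m.find()) {
            String text = m.group(1).replaceAll("[\\(\\)]", "").trim();
            byTag.put(m.group(2), text);
        }
        return byTag;
    }

    public static void main(String[] args) {
        // One bracket group after the curly braces have been removed.
        String group = "(thirty four) <number_type_0 words>( 3 4 ) <number_type_0 digits>";
        System.out.println(pairs(group));
    }
}
```

Here <number_type_0 words> maps to "thirty four" and <number_type_0 digits> maps to "3 4". The reluctant quantifiers (.*?) keep each match from running past the first closing parenthesis or angle bracket, and the inner spaces of "3 4" survive because only the ends are trimmed.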