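There are word combinations such as "is", "is not" and "does not contain". We have to match these words in a sentence and split the sentence at each of them.
Input: if name is tom and age is not 45 or name does not contain tom then let me know.
Expected output: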
If name is
tom and age is not
45 or name does not contain
tom then let me know
I tried the code below to split and extract, but "is" also occurs inside "is not", and my code cannot tell the two apart:
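    // Requires: import java.util.ArrayList; import java.util.List;
    static List<String> operators = new ArrayList<>();
    static String str = "if name is tom and age is not 45 or name does not contain tom then let me know.";

    public static void loadOperators() {
        operators.add("is");
        operators.add("is not");
        operators.add("does not contain");
    }

    public static void main(String[] args) {
        loadOperators();
        for (String s : operators) {
            // Counts the pieces produced by splitting at each operator
            System.out.println(str.split(s).length - 1);
        }
    }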
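Answer 0 (score: 0)
split() will not solve your use case, because the words can occur more than once and can contain one another: for example, "is" and "is not" are different operators for you. Ideally, you would: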
Iterate:
1. Find the index of the operator.
2. Search for the next space or word after it.
3. Then update your string to the substring from that index to the end.
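The steps above can be sketched as a left-to-right scan. This is only an illustration under a few assumptions that are not in the answer: operators are surrounded by single spaces, the earliest match wins, and a tie at the same position goes to the longer operator. The class and method names (OperatorScanner, splitAfterOperators) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OperatorScanner {

    // Cuts the input just after each operator occurrence, scanning left to right.
    static List<String> splitAfterOperators(String input, List<String> operators) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (true) {
            int bestIndex = -1;
            String bestOperator = null;
            for (String op : operators) {
                // Assumption: pad with spaces so "is" does not match inside "this".
                int idx = input.indexOf(" " + op + " ", start);
                if (idx < 0) {
                    continue;
                }
                // Earliest match wins; at the same index the longer operator wins.
                if (bestIndex < 0 || idx < bestIndex
                        || (idx == bestIndex && op.length() > bestOperator.length())) {
                    bestIndex = idx;
                    bestOperator = op;
                }
            }
            if (bestIndex < 0) {
                break; // no operator left in the remaining text
            }
            int end = bestIndex + 1 + bestOperator.length(); // index just past the operator
            parts.add(input.substring(start, end).trim());
            start = end; // continue with the rest of the string
        }
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        for (String part : splitAfterOperators(input, Arrays.asList("is", "is not", "does not contain"))) {
            System.out.println(part);
        }
    }
}
```

For the sample sentence this prints the four expected parts. An operator at the very start or end of the input would be missed because of the space padding, so a real implementation would need to handle those edges.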
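Answer 1 (score: 0)
I am not entirely sure what you are trying to achieve, but let's give it a try.
For your case, a simple "workaround" might work just fine:
Sort the operators by their length, so that the "largest" match is found first. You can define "largest" either as literally the longest string or, preferably, as the number of words (the number of spaces contained), so that "is a" takes precedence over "contains".
You need to make sure that no matches overlap. This can be done by comparing the start and end indices of all matches and discarding overlaps by some criterion, e.g. the first match wins.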
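A minimal sketch of this idea, assuming "largest" means the longest string and that operators are surrounded by single spaces. Because the longest operators are searched first, "first match wins" means a shorter operator is dropped whenever it overlaps a longer one that was already accepted. All names here (LongestOperatorFirst, findOperators, overlaps) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LongestOperatorFirst {

    // Returns {start, end} index pairs of the accepted operator matches, in text order.
    static List<int[]> findOperators(String input, List<String> operators) {
        List<String> sorted = new ArrayList<>(operators);
        // Longest operator first, so "is not" is searched before "is".
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        List<int[]> accepted = new ArrayList<>();
        for (String op : sorted) {
            String needle = " " + op + " ";
            int from = 0;
            int idx;
            while ((idx = input.indexOf(needle, from)) >= 0) {
                int start = idx + 1;            // skip the leading space
                int end = start + op.length();  // exclusive end of the operator
                if (!overlaps(accepted, start, end)) {
                    accepted.add(new int[] {start, end});
                }
                from = idx + 1;
            }
        }
        accepted.sort((a, b) -> Integer.compare(a[0], b[0]));
        return accepted;
    }

    // True if [start, end) intersects any span that was already accepted.
    static boolean overlaps(List<int[]> spans, int start, int end) {
        for (int[] span : spans) {
            if (start < span[1] && span[0] < end) {
                return true;
            }
        }
        return false;
    }

    // Cuts the input just after every accepted operator.
    static List<String> split(String input, List<String> operators) {
        List<String> parts = new ArrayList<>();
        int last = 0;
        for (int[] span : findOperators(input, operators)) {
            parts.add(input.substring(last, span[1]).trim());
            last = span[1];
        }
        parts.add(input.substring(last).trim());
        return parts;
    }

    public static void main(String[] args) {
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        for (String part : split(input, Arrays.asList("is", "is not", "does not contain"))) {
            System.out.println(part);
        }
    }
}
```

Sorting by word count instead of string length, as the answer prefers, would only change the comparator passed to sort.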
Answer 2 (score: 0)
This code does what you seem to want (or what I guess you want):
    public static void main(String[] args) {
        List<String> operators = new ArrayList<>();
        operators.add("is");
        operators.add("is not");
        operators.add("does not contain");
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        List<String> output = new ArrayList<>();
        int lastFoundOperatorsEndIndex = 0; // First start at the beginning of input
        for (String operator : operators) {
            int indexOfOperator = input.indexOf(operator); // Find current operator's position
            if (indexOfOperator > -1) { // If operator was found
                int thisOperatorsEndIndex = indexOfOperator + operator.length(); // Get length of operator and add it to the index to include operator
                output.add(input.substring(lastFoundOperatorsEndIndex, thisOperatorsEndIndex).trim()); // Add operator to output (and remove trailing space)
                lastFoundOperatorsEndIndex = thisOperatorsEndIndex; // Update startindex for next operator
            }
        }
        output.add(input.substring(lastFoundOperatorsEndIndex, input.length()).trim()); // Add rest of input as last entry to output
        for (String part : output) { // Output to console
            System.out.println(part);
        }
    }
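But it depends heavily on the order of the words in the sentence and on the order of the operators. If we are talking about user input, the task becomes much more complex.
A better approach uses regular expressions (regExp):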
    public static void main(String... args) {
        // Define inputs
        String input1 = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        String input2 = "the name is tom and he is 22 years old but the name does not contain jack, but merry is 24 year old.";
        // Output split strings
        for (String part : split(input1)) {
            System.out.println(part.trim());
        }
        System.out.println();
        for (String part : split(input2)) {
            System.out.println(part.trim());
        }
    }

    private static String[] split(String input) {
        // Define list of operators - 'is not' has to precede 'is'!!
        String[] operators = { "\\sis not\\s", "\\sis\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
        // Concatenate operators to regExp-String for search
        StringBuilder searchString = new StringBuilder();
        for (String operator : operators) {
            if (searchString.length() > 0) {
                searchString.append("|");
            }
            searchString.append(operator);
        }
        // Replace all operators by operator+\n and split resulting string at \n-character
        return input.replaceAll("(" + searchString.toString() + ")", "$1\n").split("\n");
    }
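Mind the order of the operators! 'is' has to come after 'is not', or 'is not' will always be split.
You can prevent this by using a negative lookahead for the operator 'is', so that "\\sis\\s" becomes "\\sis(?! not)\\s" (read as: "is", not followed by " not").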
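To see the effect of the lookahead in isolation, here is a small sketch using java.util.regex. The class name LookaheadDemo and the helper firstMatch are made up for this illustration. In both patterns 'is' is deliberately listed before 'is not'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {

    // Returns the first operator matched in the input (whitespace trimmed), or null if none matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        String input = "age is not 45";
        // Plain "is" listed first: the alternation takes the first branch that matches.
        System.out.println(firstMatch("\\sis\\s|\\sis not\\s", input));          // prints "is"
        // With the negative lookahead, "is" is rejected when " not" follows it.
        System.out.println(firstMatch("\\sis(?! not)\\s|\\sis not\\s", input));  // prints "is not"
    }
}
```

With the lookahead in place, the order of the alternatives no longer matters for this pair of operators.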
A minimalist version (JDK 1.8+, since it relies on String.join) might look like this:
    private static String[] split(String input) {
        String[] operators = { "\\sis(?! not)\\s", "\\sis not\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
        return input.replaceAll("(" + String.join("|", operators) + ")", "$1\n").split("\n");
    }