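Example:
Given the sentence:
My name is not eugene. my pet name is not eugene.
we have to find the shortest part of the sentence that contains the given words
my and eugene
The answer is then
eugene. my
There is no need to take upper or lower case, special characters or digits into account.
I have pasted my code below, but it gives the wrong answer for some test cases.
Can anyone tell me what is wrong with the code? I do not have the failing test cases.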
import java.io.*;
import java.util.*;

public class ShortestSegment
{
    static String[] pas;
    static String[] words;
    static int k,st,en,fst,fen,match,d;
    static boolean found=false;
    static int[] loc;
    static boolean[] matches;

    public static void main(String s[]) throws IOException
    {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        pas = in.readLine().replaceAll("[^A-Za-z ]", "").split(" ");
        k = Integer.parseInt(in.readLine());
        words = new String[k];
        matches = new boolean[k];
        loc = new int[k];
        for(int i=0;i<k;i++)
        {
            words[i] = in.readLine();
        }
        en = fen = pas.length;
        find(0);
        if(found==false)
            System.out.println("NO SUBSEGMENT FOUND");
        else
        {
            for(int j=fst;j<=fen;j++)
                System.out.print(pas[j]+" ");
        }
    }

    private static void find(int min)
    {
        if(min==pas.length)
            return;
        for(int i=0;i<k;i++)
        {
            if(pas[min].equalsIgnoreCase(words[i]))
            {
                if(matches[i]==false)
                {
                    loc[i]=min;
                    matches[i] =true;
                    match++;
                }
                else
                {
                    loc[i]=min;
                }
                if(match==k)
                {
                    en=min;
                    st = min();
                    found=true;
                    if((fen-fst)>(en-st))
                    {
                        fen=en;
                        fst=st;
                    }
                    match--;
                    matches[getIdx()]=false;
                }
            }
        }
        find(min+1);
    }

    private static int getIdx()
    {
        for(int i=0;i<k;i++)
        {
            if(words[i].equalsIgnoreCase(pas[st]))
                return i;
        }
        return -1;
    }

    private static int min()
    {
        int min=loc[0];
        for(int i=1;i<loc.length;i++)
            if(min>loc[i])
                min=loc[i];
        return min;
    }
}
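Answer 0: (score: 0)
The code you provided produces the wrong output for the input below. I assume that when you want to "find the shortest part of the sentence containing the given words", the length of the words matters as well.
String: 'My firstname is eugene. My fn is eugene.'
Number of search strings: 2
string1: 'my' string2: 'is'
Your solution gives: 'My firstname is'. The correct answer is: 'My fn is'.
The problem in the code is that it treats 'firstname' and 'fn' as having the same length. In the comparison (fen-fst)>(en-st)
you only check whether the number of words has been reduced, not whether the length in characters has.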
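To make the point about character length concrete, here is a minimal sketch of a search that compares candidate segments by their length in characters rather than by their word count. It is not the asker's code: the class name `ShortestByChars` and the method `shortest` are hypothetical, and it assumes that words are runs of ASCII letters, that matching is case-insensitive, and that the segment runs from the first letter of its first word to the last letter of its last word.

```java
import java.util.*;
import java.util.regex.*;

class ShortestByChars {
    /** Returns the shortest substring (in characters) containing every word, or null if there is none. */
    static String shortest(String text, List<String> words) {
        // Give each distinct search word (lower-cased) its own slot number.
        Map<String, Integer> slot = new HashMap<>();
        for (String w : words) slot.putIfAbsent(w.toLowerCase(), slot.size());
        int k = slot.size();

        // Collect every token that equals a search word as {slot, startOffset, endOffset}.
        List<int[]> hits = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        while (m.find()) {
            Integer s = slot.get(m.group().toLowerCase());
            if (s != null) hits.add(new int[] {s, m.start(), m.end()});
        }

        // Sliding window over the hits: grow on the right, shrink on the left while all words are covered.
        int[] count = new int[k];
        int covered = 0, left = 0, bestStart = -1, bestEnd = -1;
        for (int right = 0; right < hits.size(); right++) {
            if (count[hits.get(right)[0]]++ == 0) covered++;
            while (covered == k) {
                int start = hits.get(left)[1], end = hits.get(right)[2];
                // Compare character lengths, not word counts.
                if (bestStart < 0 || end - start < bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = end;
                }
                if (--count[hits.get(left)[0]] == 0) covered--;
                left++;
            }
        }
        return bestStart < 0 ? null : text.substring(bestStart, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(shortest("My firstname is eugene. My fn is eugene.", Arrays.asList("my", "is")));
        System.out.println(shortest("My name is not eugene. my pet name is not eugene.", Arrays.asList("my", "eugene")));
    }
}
```

With the two inputs from this thread the sketch prints 'My fn is' and 'eugene. my'. Because it works on character offsets into the original text, punctuation between the words is kept in the result.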
Answer 1: (score: 0)
The following code (JUnit):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

@Test
public void testIt() {
    final String s = "My name is not eugene. my pet name is not eugene.";
    // Replace every non-letter with a blank so that offsets in tmp stay aligned with offsets in s.
    final String tmp = s.toLowerCase().replaceAll("[^a-zA-Z]", " ");
    final String w1 = "my "; // the trailing blank avoids matching words such as "myself" or "myth"
    final String w2 = "eugene "; // same as above
    final List<Integer> l1 = getList(tmp, w1); // start indexes of w1
    final List<Integer> l2 = getList(tmp, w2); // start indexes of w2
    int min = Integer.MAX_VALUE;
    final int[] idx = new int[] { 0, 0 };
    // Try every pair of occurrences and keep the pair whose start indexes are closest.
    for (final int i : l1) {
        for (final int j : l2) {
            if (Math.abs(j - i) < min) {
                min = Math.abs(j - i);
                idx[0] = j - i > 0 ? i : j;
                // The segment ends right after the later word, without its trailing blank.
                idx[1] = j - i > 0 ? j + w2.trim().length() : i + w1.trim().length();
            }
        }
    }
    System.out.println("indexes: " + Arrays.toString(idx));
    System.out.println("result: " + s.substring(idx[0], idx[1]));
}

// Returns the start index of every occurrence of search in input.
private List<Integer> getList(final String input, final String search) {
    final List<Integer> list = new ArrayList<Integer>();
    int from = 0;
    while (true) {
        final int x = input.indexOf(search, from);
        if (x < 0) {
            break;
        }
        list.add(x);
        from = x + search.length(); // continue after this occurrence
    }
    return list;
}
produces the output:
indexes: [15, 25]
result: eugene. my
I think the code with its inline comments is fairly easy to follow. Basically, it plays with index + word length.
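Answer 2: (score: 0)
I think this can be handled in another way: first find a match, then narrow the bounds of that match by searching for further matches inside it. It can be coded as follows: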
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**This method intends to check the shortest interval between two words
 * @param s : the string to be processed at
 * @param first : one of the words
 * @param second : one of the words
 */
public static void getShortestInterval(String s , String first , String second)
{
    String situationOne = first + "(.*?)" + second;
    String situationTwo = second + "(.*?)" + first;
    Pattern patternOne = Pattern.compile(situationOne,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
    Pattern patternTwo = Pattern.compile(situationTwo,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
    // result holds {length of the gap between the words, start index, end index}
    List<Integer> result = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
    /**first , test the first choice*/
    Matcher matcherOne = patternOne.matcher(s);
    findTheMax(first.length(),matcherOne, result);
    /**then , test the second choice*/
    Matcher matcherTwo = patternTwo.matcher(s);
    findTheMax(second.length(),matcherTwo,result);
    if(result.get(0)!=Integer.MAX_VALUE)
    {
        System.out.println("The shortest length is " + result.get(0));
        System.out.println("Which start @ " + result.get(1));
        System.out.println("And end @ " + result.get(2));
    }else
        System.out.println("No matching result is found!");
}

// Despite its name, this keeps the shortest match seen so far in result.
private static void findTheMax(int headLength , Matcher matcher , List<Integer> result)
{
    int length = result.get(0);
    int startIndex = result.get(1);
    int endIndex = result.get(2);
    while(matcher.find())
    {
        int temp = matcher.group(1).length();
        int start = matcher.start();
        List<Integer> minimize = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
        System.out.println(matcher.group().substring(headLength));
        // Search again inside the current match (minus its leading word) for a tighter match.
        findTheMax(headLength, matcher.pattern().matcher(matcher.group().substring(headLength)), minimize);
        if(minimize.get(0) != Integer.MAX_VALUE)
        {
            start = start + minimize.get(1) + headLength;
            temp = minimize.get(0);
        }
        if(temp<length)
        {
            length = temp;
            startIndex = start;
            endIndex = matcher.end();
        }
    }
    result.set(0, length);
    result.set(1, startIndex);
    result.set(2, endIndex);
}
Note that this handles both cases, whichever order the two words appear in!
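A small sketch may help show why both word orders are tried and why each match is searched again from the inside. A reluctant `(.*?)` only pulls the end of a match as far left as possible; the start is still the leftmost occurrence of the first word. The class `LazyMatchDemo` and the method `matches` are hypothetical names, and the use of `Pattern.quote` is an addition that is not in the answer's code.

```java
import java.util.*;
import java.util.regex.*;

class LazyMatchDemo {
    /** Returns every match of first(.*?)second in s, ignoring case. */
    static List<String> matches(String s, String first, String second) {
        // Pattern.quote keeps regex metacharacters in the words from being interpreted.
        Pattern p = Pattern.compile(
                Pattern.quote(first) + "(.*?)" + Pattern.quote(second),
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "My name is not eugene. my pet name is not eugene.";
        System.out.println(matches(s, "my", "eugene"));  // the "my ... eugene" order
        System.out.println(matches(s, "eugene", "my"));  // the "eugene ... my" order
        System.out.println(matches("my my eugene", "my", "eugene")); // start is leftmost, not tightest
    }
}
```

On the question's sentence the first order yields 'My name is not eugene' and 'my pet name is not eugene', while the second order yields the single, shorter match 'eugene. my'. For the input 'my my eugene' the only match is the whole string, so the tighter 'my eugene' is only found by searching again inside the match.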
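Answer 3: (score: 0)
You can use the Knuth-Morris-Pratt (KMP) algorithm to find the indexes of all occurrences of each given word in the text. Suppose you have a text of length N and M words (w1 ... wM). With the KMP algorithm you can build the array: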
occur = string[N];
occur[i] = 1, if w1 starts at position i
...
occur[i] = M, if wM starts at position i
occur[i] = 0, if no word from w1...wM starts at position i
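Loop over this array and, from each non-zero position, search forward for the other M-1 words.
This is approximate pseudocode, just to convey the idea. It will definitely not work if you simply re-code it in Java: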
for i = 0 to N-1 {
    if occur[i] != 0 {
        foundWords.clear();                  // start a fresh search at position i
        for j = i to N-1 {                   // searching forward
            if occur[j] != 0 and !foundWords.contains(occur[j]) {
                foundWords.add(occur[j]);
                lastWordInd = j;
                if foundWords.containAllWords() break;
            }
        }
        if foundWords.containAllWords() {
            foundTextPieceLen = lastWordInd + w[occur[lastWordInd] - 1].length - i;
            if foundTextPieceLen < minTextPieceLen {
                minTextPieceLen = foundTextPieceLen;
                // also remember start and end indexes of the text piece
            }
        }
    }
}
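The idea can be sketched in Java roughly as follows. This is only an illustration under several assumptions: `String.indexOf` stands in for KMP to keep the sketch short, the search words are distinct and non-empty, no two words start at the same position, lower-casing does not change the length of the text, and words are matched as plain substrings (so 'my' also matches inside 'myth'). The names `OccurrenceScan` and `shortest` are hypothetical.

```java
import java.util.*;

class OccurrenceScan {
    /** Returns the shortest substring of text containing every word in w, or null if there is none. */
    static String shortest(String text, String[] w) {
        String lower = text.toLowerCase(Locale.ROOT);
        int n = lower.length();

        // occur[i] = m + 1 if w[m] starts at position i, otherwise 0.
        int[] occur = new int[n];
        for (int m = 0; m < w.length; m++) {
            String word = w[m].toLowerCase(Locale.ROOT);
            if (word.isEmpty()) throw new IllegalArgumentException("empty search word");
            // indexOf is used here in place of KMP.
            for (int p = lower.indexOf(word); p >= 0; p = lower.indexOf(word, p + 1)) {
                occur[p] = m + 1;
            }
        }

        int bestStart = -1, bestEnd = -1;
        for (int i = 0; i < n; i++) {
            if (occur[i] == 0) continue;
            // From each word start, scan forward until every word has been seen once.
            boolean[] found = new boolean[w.length];
            int remaining = w.length;
            int end = i;
            for (int j = i; j < n && remaining > 0; j++) {
                int m = occur[j] - 1;
                if (m >= 0 && !found[m]) {
                    found[m] = true;
                    remaining--;
                    end = Math.max(end, j + w[m].length()); // furthest end of any found word
                }
            }
            if (remaining == 0 && (bestStart < 0 || end - i < bestEnd - bestStart)) {
                bestStart = i;
                bestEnd = end;
            }
        }
        return bestStart < 0 ? null : text.substring(bestStart, bestEnd);
    }

    public static void main(String[] args) {
        String s = "My name is not eugene. my pet name is not eugene.";
        System.out.println(shortest(s, new String[] {"my", "eugene"}));
    }
}
```

For the sentence from the question this prints 'eugene. my'. The forward scan restarts at every word start, so the sketch is quadratic in the worst case; it is meant to mirror the pseudocode, not to be the fastest option.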