Counting words with indexOf

Time: 2015-07-13 01:31:59

Tags: java indexof

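I have to extract the first ten words of a blog entry that is being read in, but my code won't let that happen. I can't use .split, String.isEmpty, or arrays, which leaves me with indexOf and substring. Right now my code only gets the first 3 words. Any help would be appreciated.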

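This is what I have to work with:

String getSummary() method
1. Returns at most the first ten words of the entry as a summary of the entry. If the entry has 10 words or fewer, the method returns the whole entry.
2. Possible logic: the String class's indexOf method can find the position of a space. Use it together with a loop structure to find the first 10 words.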

public class BlogEntry 
{
    private String username;
    private Date dateOfBlog;
    private String blog;

    public BlogEntry() 
    {
        username = "";
        dateOfBlog = new Date();
        blog = "";
    }

    public BlogEntry(String sName, Date dBlogDate, String sBlog)
    {
        username = sName;
        dateOfBlog = dBlogDate;
        blog = sBlog;
    }

    public String getUsername()
    {
        return username;
    }

    public Date getDateOfBlog()
    {
        return dateOfBlog;
    }

    public String getBlog()
    {
        return blog;
    }

    public void setUsername(String sName)
    {
        username = sName;
    }

    public void setDateOfBlog(Date dBlogDate)
    {
        dateOfBlog.setDate(dBlogDate.getMonth(), dBlogDate.getDay(), dBlogDate.getYear());
    }

    public void setBlog(String sBlog)
    {
        blog = sBlog;
    }

    public String getSummary()
    {
        String summary = "";
        int position;
        int wordCount = 0;
        int start = 0;
        int last;

        position = blog.indexOf(" ");
        while (position != -1 && wordCount < 10)
        {
            summary += blog.substring(start, position) + " ";
            start = position + 1;
            position = blog.indexOf(" ", position + 1);
            wordCount++;
        }

        return summary;
    }

    public String toString()
    {
        return "Author: " + this.getUsername() + "\n\n" + "Date posted: " + this.getDateOfBlog() + "\n\n" + "Text body: " + this.getBlog();
    }
}

5 answers:

Answer 0: (score: 2)

Add this to your code:

public static void main(String[] args) 
{
    BlogEntry be = new BlogEntry("" , new Date(), "this program is pissing me off!");
    System.out.println( be.getSummary() );        
}

It produces this output:

this program is pissing me

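That's not 3 words, it's 5. You should have 6. This makes your bug easier to understand: you are experiencing a classic off-by-one error. You only append and count words that come before a space. That leaves out the last word, because it never appears before a space, only after the last one.

Here is some code, close to what you started with, that sees all 6 words: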

public String getSummary()
{
    if (blog == null) 
    {
        return "<was null>";
    }

    String summary = "";
    int position;
    int wordCount = 0;
    int start = 0;
    int last;

    position = blog.indexOf(" ");
    while (position != -1 && wordCount < 10)
    {
        summary += blog.substring(start, position) + " ";
        start = position + 1;
        position = blog.indexOf(" ", position + 1);
        wordCount++;
    }
    if (wordCount < 10) 
    {
        summary += blog.substring(start, blog.length());
    }

    return summary;
}

When tested with this:

public static void main(String[] args) 
{
    String[] testStrings = {
          null //0
        , ""
        , " "
        , "  "
        , " hi"
        , "hi "//5
        , " hi "
        , "this program is pissing me off!"
        , "1 2 3 4 5 6 7 8 9"
        , "1 2 3 4 5 6 7 8 9 "
        , "1 2 3 4 5 6 7 8 9 10"//10
        , "1 2 3 4 5 6 7 8 9 10 "
        , "1 2 3 4 5 6 7 8 9 10 11"
        , "1 2 3 4 5 6 7 8 9 10 11 "
        , "1 2 3 4 5 6 7 8 9 10 11 12"
        , "1 2 3 4 5 6 7 8 9 10 11 12 "//15
    };

    ArrayList<BlogEntry> albe = new ArrayList<>();

    for (String test : testStrings) {
        albe.add(new BlogEntry("" , new Date(), test));
    }

    testStrings[0] = "<was null>";

    for (int i = 0; i < albe.size(); i++ ) {
        assert(albe.get(i).getSummary().equals(testStrings[Math.min(i,11)]));
    }

    for (BlogEntry be : albe)
    {
        System.out.println( be.getSummary() );        
    }
}

it produces this:

<was null>



 hi
hi 
 hi 
this program is pissing me off!
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10 
1 2 3 4 5 6 7 8 9 10 
1 2 3 4 5 6 7 8 9 10 
1 2 3 4 5 6 7 8 9 10 
1 2 3 4 5 6 7 8 9 10 

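Also, I don't know where you are importing Date from, but neither import java.util.Date; nor import java.sql.Date; makes your code compile without errors. I had to comment out your setDate code.

If your instructor allows it, you can certainly try the ideas in the other answers, but I figured you would want to know what was going on.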
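For what it's worth, java.util.Date only has a one-argument setDate(int) (the day of the month, and deprecated), so the three-argument call in setDateOfBlog cannot compile against it. A minimal sketch of a setter that copies the incoming date instead, assuming java.util.Date is the class you meant; DateHolder is a made-up stand-in for the date-related part of BlogEntry:

```java
import java.util.Date;

// Hypothetical stand-in for the date-related part of BlogEntry.
class DateHolder {
    private Date dateOfBlog = new Date();

    public void setDateOfBlog(Date dBlogDate) {
        // Copy the timestamp so later changes to the caller's Date
        // object do not affect this entry.
        dateOfBlog = new Date(dBlogDate.getTime());
    }

    public Date getDateOfBlog() {
        return dateOfBlog;
    }
}
```

Copying rather than storing the reference is a design choice, not a requirement; plain assignment would also compile.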

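Answer 1: (score: 0)

I'm not sure how efficient it would be, but could you trim the front off the string each time you take the index? For example: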

Contents of tempBlog:
This is a test
is a test
a test
test

Contents of summary:
This
is
a
test

public String getSummary()
{
    String summary = "";
    int wordCount = 0;
    // Work on a copy so the original blog text is not overwritten
    String tempBlog = blog;

    while (wordCount < 10)
    {
        int space = tempBlog.indexOf(" ");
        if (space == -1)
        {
            // No space left: whatever remains is the last word
            summary += tempBlog;
            break;
        }
        // Move the first word into the summary, then drop it from the copy
        summary += tempBlog.substring(0, space) + " ";
        tempBlog = tempBlog.substring(space + 1);
        wordCount++;
    }

    return summary;
}

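Answer 2: (score: 0)

String.indexOf also provides an overload that starts the search from a given index (see the String API documentation). With it, this becomes quite simple: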

public int countWort(String in , String word){
    int count = 0;

    int index = in.indexOf(word);

    while(index != -1){
        ++count;

        index = in.indexOf(word , index + 1);
    }

    return count;
}
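As a rough sketch of how that overload applies to the summary problem: walk from space to space and stop at the tenth one. The names SummaryByIndexOf and firstWords are made up for illustration, and the sketch assumes words are separated by exactly one space.

```java
class SummaryByIndexOf {
    // Returns at most the first `limit` words of `text`, assuming words
    // are separated by single spaces (an assumption of this sketch).
    static String firstWords(String text, int limit) {
        if (limit <= 0) {
            return "";
        }
        int spacesSeen = 0;
        int index = text.indexOf(' ');
        while (index != -1) {
            spacesSeen++;
            if (spacesSeen == limit) {
                // The limit-th space ends the limit-th word.
                return text.substring(0, index);
            }
            // Resume the search just past the space we found.
            index = text.indexOf(' ', index + 1);
        }
        // Fewer than `limit` spaces: the whole text fits.
        return text;
    }

    public static void main(String[] args) {
        System.out.println(firstWords("1 2 3 4 5 6 7 8 9 10 11 12", 10));
        System.out.println(firstWords("only three words", 10));
    }
}
```

With single-spaced input, the number of words is the number of spaces plus one, which is why the loop can return the whole text when it runs out of spaces early.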

Answer 3: (score: 0)

Try this logic...

public static void main(String[] args) throws Exception {
    String data = "This one sentence has exactly 10 words in it ok";

    int wordIndex = 0;
    int spaceIndex = 0;
    int wordCount = 0;
    while (wordCount < 10 && spaceIndex != -1) {
        spaceIndex = data.indexOf(" ", wordIndex);
        System.out.println(spaceIndex > -1 
                ? data.substring(wordIndex, spaceIndex)
                : data.substring(wordIndex));

        // The next word "should" be right after the space
        wordIndex = spaceIndex + 1;
        wordCount++;
    }
}

Result:

This
one
sentence
has
exactly
10
words
in
it
ok

Update

Is regex not an option? With regex, you could try the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws Exception {
    String data = "The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog";
    Matcher matcher = Pattern.compile("\\w+").matcher(data);

    int wordCount = 0;
    while (matcher.find() && wordCount < 10) {
        System.out.println(matcher.group());
        wordCount++;
    }
}

Result:

The
quick
brown
fox
jumps
over
the
lazy
dog
The

The regex returns words made up of the following characters: [a-zA-Z_0-9]
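One consequence of \w+ is that punctuation shortens or splits words: "off!" matches as "off", and "don't" as "don" and "t". If a word should mean any run of non-whitespace characters, \S+ may be a closer fit. A small sketch, with class and method names made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSummary {
    // Joins the first `limit` whitespace-delimited tokens with single spaces.
    static String firstWords(String text, int limit) {
        Matcher matcher = Pattern.compile("\\S+").matcher(text);
        StringBuilder summary = new StringBuilder();
        int wordCount = 0;
        // Check the count first so find() is not called once too often.
        while (wordCount < limit && matcher.find()) {
            if (wordCount > 0) {
                summary.append(' ');
            }
            summary.append(matcher.group());
            wordCount++;
        }
        return summary.toString();
    }

    public static void main(String[] args) {
        // Prints: I don't know,
        System.out.println(firstWords("I don't know, it's off!", 3));
    }
}
```

Note that this collapses repeated whitespace into single spaces, which may or may not be what the assignment expects.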

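Answer 4: (score: 0)

I think we can find the index where the first 10 words end by checking whether each character is a space character. Here is an example: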

public class FirstTenWords
{
    public static void main( String[] args )
    {
        String sentence = "There are ten words in this sentence, I want them to be extracted";
        String summary = firstOf( sentence, 10 );
        System.out.println( summary );
    }

    public static String firstOf( String line, int limit )
    {
        boolean isWordMode = false;
        int count = 0;
        int i;
        for( i = 0; i < line.length(); i++ )
        {
            char character = line.charAt( i );
            if( Character.isSpaceChar( character ) )
            {
                if( isWordMode )
                {
                    isWordMode = false;
                }
            }
            else
            {
                if( !isWordMode )
                {
                    isWordMode = true;
                    count++;
                }
            }
            if( count >= limit )
            {
                break;
            }
        }
        return line.substring( 0, i );
    }
}

Output on my laptop:

There are ten words in this sentence, I want
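Note that this output holds nine words, not ten: the loop breaks as soon as it reaches the first character of the tenth word, so substring( 0, i ) stops just before it. One possible adjustment, sketched below, is to keep scanning until the space that follows the limit-th word. FirstWordsFixed is a made-up name; the rest follows the same state-machine idea.

```java
class FirstWordsFixed {
    static String firstOf(String line, int limit) {
        if (limit <= 0) {
            return "";
        }
        boolean isWordMode = false;
        int count = 0;
        int i;
        for (i = 0; i < line.length(); i++) {
            char character = line.charAt(i);
            if (Character.isSpaceChar(character)) {
                // A space right after the limit-th word ends the summary.
                if (isWordMode && count >= limit) {
                    break;
                }
                isWordMode = false;
            } else if (!isWordMode) {
                // First character of a new word.
                isWordMode = true;
                count++;
            }
        }
        // i is either the cut-off space or line.length().
        return line.substring(0, i);
    }

    public static void main(String[] args) {
        String sentence = "There are ten words in this sentence, I want them to be extracted";
        System.out.println(firstOf(sentence, 10));
    }
}
```

With the same sentence, this variant should keep "them" as the tenth word and drop the trailing space.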