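How do I get the words that contain a single dot (period) from a file?

Asked: 2013-03-04 14:24:25

Tags: java regex java.util.scanner

I want to find the table names and column names in the following file, which is named 'query'.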

var query = "  SELECT accounts.name, SUM((COALESCE((jan_val_c),0)+  ";
query += "  COALESCE((feb_val_c),0)+ COALESCE((march_val_c),0)+ COALESCE((apr_val_c),0)+ ";
query += "  COALESCE((may_val_c),0)+ COALESCE((june_val_c),0)+ COALESCE((july_val_c),0)+   ";
query += "  COALESCE((aug_val_c),0)+ COALESCE((sept_val_c),0)+ COALESCE((oct_val_c),0)+   ";
query += "  COALESCE((nov_val_c),0)+ COALESCE((dec_val_c),0))) AS sales_plan,SUM((COALESCE((jan_actual_val_c),0)+   ";
query += "  COALESCE( (feb_actual_val_c),0)+ COALESCE( (march_actual_val_c),0)+ COALESCE( (apr_actual_val_c),0)+   ";
query += "  COALESCE( (may_actual_val_c),0)+ COALESCE( (june_actual_val_c),0)+ COALESCE( (july_actual_val_c),0)+   ";
query += "  COALESCE( (aug_actual_val_c),0)+ COALESCE( (sept_actual_val_c),0)+ COALESCE( (oct_actual_val_c),0)+   ";
query += "  COALESCE( (nov_actual_val_c),0)+ COALESCE( (dec_actual_val_c),0))) AS Actual_plan ,month_name_c,  ";
query += "   cl_sales_planning_month.year_c, cl_products.volume,cl_brands.name AS brand ,cl_therapies.name   ";
query += "   AS therapy,cl_products.name AS product, accounts.created_by,accounts.assigned_user_id ,   ";
query += "   DATE_FORMAT(STR_TO_DATE(CONCAT_WS('-',cl_sales_planning_month.month_name_c,  ";
query += "   cl_sales_planning_month.year_c),'%M-%Y'),'%b-%y' ) AS monthyear FROM cl_sales_planning_month   ";
query += "   LEFT JOIN accounts ON cl_sales_planning_month.account_id_c =accounts.id LEFT JOIN cl_products   ";
query += "   ON cl_sales_planning_month.cl_products_id_c = cl_products.id LEFT JOIN cl_brands ON   ";
query += "   cl_products.cl_brands_id_c=cl_brands.id LEFT JOIN cl_therapies ON   ";
query += "   cl_products.cl_therapies_id_c=cl_therapies.id WHERE   ";
 query += "            cl_sales_planning_month.month_name_c = MONTHNAME(CURRENT_DATE - INTERVAL 2 MONTH) AND  ";
      query += "            cl_sales_planning_month.year_c = YEAR(CURRENT_DATE - INTERVAL 2 MONTH)  AND";

query += "   cl_sales_planning_month.user_id_c IN ("+ params["childs"].value +") ";
query += "   GROUP BY therapy,monthyear   ";
query += "   ORDER BY STR_TO_DATE(cl_sales_planning_month.year_c,'%Y') ASC,   ";
query += "  STR_TO_DATE(cl_sales_planning_month.month_name_c,'%M') ASC, Actual_plan DESC   "; 

To do this, I wrote a Java program:

package com.waprau;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

public class SeparateTableNamesColumnNames {
    public static void main(String[] args) {
        File file = new File("/home/waprau/Desktop/query");
        //Pattern = new Pattern("([^\\s]+(\\.(?i))$)");

        try {
            Scanner scanner = new Scanner(file);
            scanner.useDelimiter("\\s|=|,|\\)|\\(|this.|\\].");

            while(scanner.hasNext()){
                if(scanner.next().matches("(?<!\\.)\\b[a-zA-Z]\\w*\\.[a-zA-Z]\\w*\\b(?!\\.)"))
                 System.out.println(scanner.next());;
               }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

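With the program above I can split the file into individual words. But I want only the words that contain a single dot (period), such as accounts.name, cl_sales_planning_month.year_c, cl_products.volume, cl_brands.name, cl_therapies.name, and so on. I have not been able to find a pattern, or anything else, that separates these words from the rest of the file.

The pattern in the program above does not work.

This is the result I get:

[screenshot of the actual output; image not available]

This is what I want:

[screenshot of the expected output; image not available]

Any help is appreciated.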

3 answers:

Answer 0 (score: 1)

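To match words that contain a dot, you can use: "\\w+\\.\\w+"

\w matches letters, digits, and underscores.

However, this also matches things that contain more than one period. You can improve it with lookarounds, which make sure there is no other period immediately before or after the word you match:

"(?<!\\.)\\b\\w+\\.\\w+\\b(?!\\.)"

This matches a word that contains a dot and is not immediately preceded or followed by another dot. \b is a word boundary.

However, this still matches decimal numbers such as 123.45. Assuming table and column names can contain digits but do not start with one, we can also require that each part start with a letter:

"(?<!\\.)\\b[a-zA-Z]\\w*\\.[a-zA-Z]\\w*\\b(?!\\.)"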
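
To see how the three patterns differ, here is a minimal sketch that runs each of them over one made-up sample string. The class name DottedWordPatterns, the helper findAll, and the sample text are illustrative assumptions, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedWordPatterns {
    // Hypothetical helper: collect every substring of input that the regex finds.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample: two real names, a multi-dot word, and a decimal number.
        String sample = "accounts.name a.b.c 123.45 cl_products.volume";

        // Plain version: also picks up part of a.b.c and the decimal number.
        System.out.println(findAll("\\w+\\.\\w+", sample));
        // prints [accounts.name, a.b, 123.45, cl_products.volume]

        // With lookarounds: a.b.c is rejected, 123.45 still matches.
        System.out.println(findAll("(?<!\\.)\\b\\w+\\.\\w+\\b(?!\\.)", sample));
        // prints [accounts.name, 123.45, cl_products.volume]

        // Requiring a leading letter on both sides drops the number too.
        System.out.println(findAll(
                "(?<!\\.)\\b[a-zA-Z]\\w*\\.[a-zA-Z]\\w*\\b(?!\\.)", sample));
        // prints [accounts.name, cl_products.volume]
    }
}
```

Because Matcher.find() searches inside the text, this approach does not depend on how the input was split into tokens, which is where the lookarounds and word boundaries earn their keep.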

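Answer 1 (score: 1)

The period . must be escaped, because in a regular expression it means "any character". Since this is not a normal string escape (like \n), it takes two backslashes in a Java string literal: \\.

Also \\s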
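
The effect of the escaping is easy to check with a small sketch. The class name EscapedDotDemo, the helper fullMatch, and the test strings are made up for illustration. Note that the delimiter in the question ("this." and "\\].") also contains unescaped dots, which may or may not be intentional.

```java
import java.util.regex.Pattern;

public class EscapedDotDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dot: matches any character, so 'X' is accepted.
        System.out.println(fullMatch("accounts.name", "accountsXname"));   // true

        // Escaped dot: only a literal period is accepted.
        System.out.println(fullMatch("accounts\\.name", "accountsXname")); // false
        System.out.println(fullMatch("accounts\\.name", "accounts.name")); // true

        // Pattern.quote escapes a whole literal, which avoids counting backslashes.
        System.out.println(fullMatch(Pattern.quote("accounts.name"), "accountsXname")); // false
    }
}
```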

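Answer 2 (score: 1)

Regardless of the regular expression (dan1111's answer seems to cover that), your Java code has a flaw: scanner.next() consumes the next token, and because you call it twice, you never print the token you matched. Instead, you print the token that follows each match. If the matching token happens to be the last one, the second call throws a NoSuchElementException.

If you change the loop as follows, it appears to print what you want: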

String tmp;
while (scanner.hasNext()) {
    // Store next item so we can match AND print it.
    tmp = scanner.next();
    if (tmp.matches("(?<!\\.)\\b[a-zA-Z]\\w*\\.[a-zA-Z]\\w*\\b(?!\\.)"))
        System.out.println(tmp);
}
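
For completeness, here is a self-contained sketch of the same fix that can be run without the query file. It reads from a String instead of a File, compiles the pattern once, and uses a simplified delimiter; the class name TableColumnExtractor, the method extract, the delimiter, and the sample SQL are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class TableColumnExtractor {
    // Same pattern as in the loop above, compiled once instead of per token.
    private static final Pattern TABLE_DOT_COLUMN =
            Pattern.compile("(?<!\\.)\\b[a-zA-Z]\\w*\\.[a-zA-Z]\\w*\\b(?!\\.)");

    // Hypothetical helper: return every token of the form table.column.
    static List<String> extract(String text) {
        List<String> names = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Assumed delimiter: runs of whitespace, '=', ',', '(' and ')'.
            scanner.useDelimiter("[\\s=,()]+");
            while (scanner.hasNext()) {
                // Read the token exactly once, then test and store that same token.
                String token = scanner.next();
                if (TABLE_DOT_COLUMN.matcher(token).matches()) {
                    names.add(token);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sql = "SELECT accounts.name, SUM(x) FROM t "
                + "LEFT JOIN accounts ON t.account_id_c =accounts.id";
        System.out.println(extract(sql));
        // prints [accounts.name, t.account_id_c, accounts.id]
    }
}
```

To run it against the real file, the Scanner could be built from the File as in the question; collecting into a LinkedHashSet instead of a List would drop repeated names while keeping their order.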