Parsing text by keywords in Java

Time: 2012-10-23 12:11:54

Tags: java algorithm parsing

Basically, I am given a file containing details about people, with each person on a separate line, for example:

name Marioka address 97 Garderners Road birthday 12-11-1982 \n
name Ada Lovelace gender woman\n
name James address 65 Watcher Avenue

and so on.

I want to parse them into arrays of [Keyword: Value] pairs, for example:

{[Name, Marioka], [Address, 97 Gardeners Road], [Birthday, 12-11-1982]},
{[Name, Ada Lovelace], [Gender, Woman]}, and so on....

The keywords will be a defined set of words, in the example above: name, address, birthday, gender, etc.

What is the best way to do this?

This is how I did it. It works, but I would like to know whether there is a better solution.

    private Map<String, String> readRecord(String record) {
        Map<String, String> attributeValuePairs = new HashMap<String, String>();
        Scanner scanner = new Scanner(record);
        String attribute = "", value = ""; 

        /* 
         * 1. Scan each word. 
         * 2. Find an attribute keyword and store it at "attribute".
         * 3. Following words will be stored as "value" until the next keyword is found.
         * 4. Return value-attribute pairs as HashMap
         */

        while(scanner.hasNext()) {
            String word = scanner.next();
            if (this.isAttribute(word)) {
                if (!value.trim().isEmpty()) { // compare content, not references
                    attributeValuePairs.put(attribute.trim(), value.trim());
                    value = "";
                }
                attribute = word;
            } else {
                value += word + " ";
            }
        }
        if (!value.trim().isEmpty()) attributeValuePairs.put(attribute.trim(), value.trim());

        scanner.close();
        return attributeValuePairs;
    }

    private boolean isAttribute(String word) {
        String[] attributes = {"name", "patientId", 
            "birthday", "phone", "email", "medicalHistory", "address"};
        for (String attribute: attributes) {
            if (word.equalsIgnoreCase(attribute)) return true;
        }
        return false;
    }

6 answers:

Answer 0 (score: 1)

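To extract the values from a string, use a regular expression. I hope you know how to read each line from the file and how to build the array from the results.

This is still not a good solution, because it does not work if a name or an address contains any of the keywords, and the pattern below only matches lines that contain name, address and birthday in exactly that order. But it is what you asked for.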

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("name (.+) address (.+) birthday (.+)");

        String text = "name Marioka address 97 Garderners Road birthday 12-11-1982";

        Matcher m = p.matcher(text);

        if (m.matches()) {
            System.out.println(m.group(1) + "\n" + m.group(2) + "\n"
                    + m.group(3));
        } else {
            System.out.println("String does not match");
        }
    }
}
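
The fixed pattern above only matches lines that contain all three keywords in that order. As a rough sketch of how the same regex idea could handle any subset of keywords in any order, the pattern below matches one keyword at a time and uses a lookahead to stop each value just before the next keyword. The class name `KeywordRegexSketch` and the keyword list are assumptions made for illustration, and the limitation already mentioned still applies: a value that contains a keyword as a separate word is split there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordRegexSketch {

    // Assumed keyword set, taken from the examples in the question.
    private static final String KEYWORDS = "name|address|birthday|gender";

    // A keyword as a whole word, then a lazily matched value that ends
    // just before the next keyword or at the end of the line.
    private static final Pattern PAIR = Pattern.compile(
            "\\b(" + KEYWORDS + ")\\b\\s+(.*?)\\s*(?=\\b(?:" + KEYWORDS + ")\\b|$)",
            Pattern.CASE_INSENSITIVE);

    public static Map<String, String> parse(String line) {
        // LinkedHashMap keeps the pairs in the order they appear in the line.
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.put(m.group(1).toLowerCase(), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("name Marioka address 97 Garderners Road birthday 12-11-1982"));
        System.out.println(parse("name Ada Lovelace gender woman"));
    }
}
```

Supporting another keyword then only means extending the `KEYWORDS` alternation.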

Answer 1 (score: 1)

Try this:

// Requires: import java.util.ArrayList; import java.util.HashMap; import java.util.Map;
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("name");
keywords.add("address");
keywords.add("birthday");
keywords.add("gender");
String[] s = "name James address 65 Watcher Avenue".trim().split(" ");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++) {
    if (keywords.contains(s[i])) {
        String key = s[i];
        StringBuilder b = new StringBuilder();
        // Collect the words that follow until the next keyword or the end of the input.
        while (i + 1 < s.length && !keywords.contains(s[i + 1])) {
            i++;
            if (b.length() > 0) {
                b.append(" ");
            }
            b.append(s[i]);
        }
        m.put(key, b.toString());
    }
}
System.out.println(m);

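Just add the keywords you want to recognize to the ArrayList named keywords.

Edited: note that if a person's name or address contains one of the keywords, this will not produce the expected output.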

Answer 2 (score: 0)

The best way is to put the data into a map, so that you can set key-value pairs ("name": "Marioka").

Map<String,String> mp=new HashMap<String, String>();
    // add or set elements in the Map with put(key, value)
    mp.put("name", "nameData");
    mp.put("address", "addressData");
    // ...and so on for the remaining keywords
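
Since the file holds one person per line, a natural extension of this idea is one map per line, collected in a list. The following is only a sketch under assumptions: the class name `RecordListSketch`, the keyword set and the choice of `LinkedHashMap` (to keep the order of appearance) are not from the answer, and the input is passed in as a string rather than read from a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RecordListSketch {

    // Assumed keyword set, based on the examples in the question.
    private static final Set<String> KEYWORDS =
            new HashSet<String>(Arrays.asList("name", "address", "birthday", "gender"));

    // Turns one line into keyword -> value pairs, in order of appearance.
    // Words that come before the first keyword are ignored.
    public static Map<String, String> parseLine(String line) {
        Map<String, String> record = new LinkedHashMap<String, String>();
        String key = null;
        StringBuilder value = new StringBuilder();
        for (String word : line.trim().split("\\s+")) {
            if (KEYWORDS.contains(word.toLowerCase())) {
                if (key != null) record.put(key, value.toString());
                key = word.toLowerCase();
                value.setLength(0);
            } else if (key != null) {
                if (value.length() > 0) value.append(' ');
                value.append(word);
            }
        }
        if (key != null) record.put(key, value.toString());
        return record;
    }

    // Builds one map per non-empty line of the input text.
    public static List<Map<String, String>> parseAll(String text) {
        List<Map<String, String>> people = new ArrayList<Map<String, String>>();
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) people.add(parseLine(line));
        }
        return people;
    }

    public static void main(String[] args) {
        String data = "name Marioka address 97 Garderners Road birthday 12-11-1982\n"
                + "name Ada Lovelace gender woman\n"
                + "name James address 65 Watcher Avenue";
        System.out.println(parseAll(data));
    }
}
```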

Answer 3 (score: 0)

This requires you to do the following (pseudocode):

1.  >Read a line
2.  >Split it by a delimiter(' ' in your case)
2.5 >Map<String,String> mp = new HashMap<String,String>();
3.  >for(int i = 0; i < splitArray.length; i += 2){
      try{
        mp.put(splitArray[i],splitArray[i+1]);
      }catch(Exception e){ System.err.println("Syntax Error"); }
    }
4.  >Bob's your uncle, Fanny's your aunt. 

You would have to modify the data file, though, so that ';' stands for a space inside a value, such as:

name Ada;Lovelace
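
The pseudocode above could be turned into runnable Java roughly as follows. This is a sketch that assumes the modified file format just described (tokens alternate strictly between keyword and value, and ';' replaces every space inside a value); the class name `DelimitedPairsSketch` is made up for illustration, and an explicit length check replaces the catch-all exception handler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimitedPairsSketch {

    // Expects alternating "keyword value" tokens separated by spaces,
    // with ';' standing in for any space inside a value.
    public static Map<String, String> parse(String line) {
        String[] tokens = line.trim().split(" +");
        if (tokens.length % 2 != 0) {
            // An odd token count means a keyword without a value (or an empty line).
            throw new IllegalArgumentException("Syntax error in line: " + line);
        }
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (int i = 0; i < tokens.length; i += 2) {
            // Restore the spaces that were encoded as ';' in the data file.
            pairs.put(tokens[i], tokens[i + 1].replace(';', ' '));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("name Ada;Lovelace gender woman"));
    }
}
```

The trade-off of this design is that no keyword list is needed at all, at the cost of changing the input format.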

Answer 4 (score: 0)

Read the file line by line and call the getKeywordValuePairs() method on each line.

import java.util.ArrayList;

public class S{

    public static void main(String[] args) {
        System.out.println(getKeywordValuePairs("name Marioka address 97 Garderners Road birthday 12-11-1982",
                new String[]{
                    "name", "address", "birthday", "gghghhjgghjhj"
                }));
    }

    public static String getKeywordValuePairs(String text, String keywords[]) {

        ArrayList<String> keyWordsPresent = new ArrayList<>();
        ArrayList<Integer> indicesOfKeywordsPresent = new ArrayList<>();

        // finding the indices of all the keywords and adding them to the array
        // lists only if the keyword is present
        for (int i = 0; i < keywords.length; i++) {
            int index = text.indexOf(keywords[i]);
            if (index >= 0) {
                keyWordsPresent.add(keywords[i]);
                indicesOfKeywordsPresent.add(index);
            }
        }

        // Creating arrays from Array Lists
        String[] keywordsArray = new String[keyWordsPresent.size()];
        int[] indicesArray = new int[indicesOfKeywordsPresent.size()];
        for (int i = 0; i < keywordsArray.length; i++) {
            keywordsArray[i] = keyWordsPresent.get(i);
            indicesArray[i] = indicesOfKeywordsPresent.get(i);
        }


        // Sorting the keywords and indices arrays based on the position where the keyword appears
        // Bubble sort: compare adjacent elements at j and j + 1 (not i and i + 1)
        for (int i = 0; i < indicesArray.length; i++) {
            for (int j = 0; j < indicesArray.length - 1 - i; j++) {
                if (indicesArray[j] > indicesArray[j + 1]) {
                    int temp = indicesArray[j];
                    indicesArray[j] = indicesArray[j + 1];
                    indicesArray[j + 1] = temp;
                    String tempString = keywordsArray[j];
                    keywordsArray[j] = keywordsArray[j + 1];
                    keywordsArray[j + 1] = tempString;
                }
            }
        }

        // Creating the result String
        String result = "{";
        for (int i = 0; i < keywordsArray.length; i++) {
            result = result + "[" + keywordsArray[i] + ",";
            if (i == keywordsArray.length - 1) {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length()) + "]";
            } else {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length(), indicesArray[i + 1]) + "],";
            }
        }
        result = result + "}";
        return result;
    }
}
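
The parallel arrays and the hand-written sort could probably be replaced by a sorted map from position to keyword. The sketch below is one possible variant, not the answer's own code: it uses `java.util.TreeMap`, returns a map instead of a formatted string, and the class name `KeywordPositionsSketch` is an assumption. Like the code above, it relies on `indexOf`, so it only sees the first occurrence of each keyword and also matches keywords that appear inside other words.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class KeywordPositionsSketch {

    public static Map<String, String> parse(String text, String[] keywords) {
        // Position of each keyword's first occurrence -> keyword.
        // The TreeMap keeps the entries sorted by position.
        TreeMap<Integer, String> positions = new TreeMap<Integer, String>();
        for (String keyword : keywords) {
            int index = text.indexOf(keyword);
            if (index >= 0) {
                positions.put(index, keyword);
            }
        }

        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (Map.Entry<Integer, String> entry : positions.entrySet()) {
            int valueStart = entry.getKey() + entry.getValue().length();
            // The value runs up to the next keyword, or to the end of the text.
            Integer nextKeyword = positions.higherKey(entry.getKey());
            int valueEnd = (nextKeyword == null) ? text.length() : nextKeyword;
            // Guard against overlapping keywords, which would give an empty range.
            valueEnd = Math.max(valueStart, valueEnd);
            pairs.put(entry.getValue(), text.substring(valueStart, valueEnd).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("name Marioka address 97 Garderners Road birthday 12-11-1982",
                new String[]{"name", "address", "birthday", "gghghhjgghjhj"}));
    }
}
```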

Answer 5 (score: 0)

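I have a completely different solution that uses the power of Java regular expressions and Enum to read the text and parse it into a POJO. It is a solution that is easy to extend in the future.

Step 1: Define your enum (you can extend the enum to add all the required keys)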

public enum PersonEnum {
  name { public void set(Person d,String name) {  d.setName(name) ;} },
  address { public void set(Person d,String address) {  d.setAddress(address); } },
  gender { public void set(Person d,String gender) {  d.setOthers(gender); } };
  public void set(Person d,String others) { d.setOthers(others);  }
}

Step 2: Define your POJO class (if you don't need a POJO, you can change the enum to use a HashMap)

public class Person {

    private String name;
    private String address;
    private String others;

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getAddress() {
        return address;
    }
    public void setAddress(String address) {
        this.address = address;
    }
    public String getOthers() {
        return others;
    }
    public void setOthers(String others) {
        this.others = others;
    }
    @Override
    public String toString() {
        return name+"==>"+address+"==>"+others;
    }
}

Step 3: Here is the parser

public static void main(String[] args) {

    try {
        String inputs ="name Marioka address 97 Garderners Road birthday 12-11-1982\n name Ada Lovelace gender" +
                " woman address London\n name James address 65 Watcher Avenue";
        Scanner scanner = new Scanner(inputs);
        List<Person> personList = new ArrayList<Person>();
        while(scanner.hasNextLine()){
            String line = scanner.nextLine();
            List<String> filtereList=splitLines(line, "name|address|gender");
            Iterator< String> lineIterator  = filtereList.iterator();
            Person p = new Person();
            while(lineIterator.hasNext()){
                PersonEnum pEnum = PersonEnum.valueOf(lineIterator.next());
                pEnum.set(p, lineIterator.next());
            }
            personList.add(p);
            System.out.println(p);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
public static List<String> splitLines(String inputText, String pString) {
    Pattern pattern =Pattern.compile(pString);
    Matcher m = pattern.matcher(inputText);
    List<String> filteredList = new ArrayList<String>();
    int start = 0;
    while (m.find()) {
        add(inputText.substring(start, m.start()),filteredList);
        add(m.group(),filteredList);
        start = m.end();
    }
    add(inputText.substring(start),filteredList);
    return filteredList;
}
public static void add(String text, List<String> list){
    if(text!=null && !text.trim().isEmpty()){
        list.add(text.trim()); // trim so stored values carry no surrounding spaces
    }
}

Note: you need to define every possible enum constant in PersonEnum; otherwise you need to take measures to prevent an IllegalArgumentException

eg: java.lang.IllegalArgumentException: No enum const class com.sa.PersonEnum.address
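
One way to guard against that exception, and to drop the POJO in favour of a map as mentioned in step 2, might look like the sketch below. It is not the answer's code: the `Field` enum is a hypothetical stand-in for PersonEnum without the setters, `lookup` is an invented helper that returns null for unknown words instead of throwing, and `EnumMap` is used in place of the suggested HashMap because the keys are enum constants.

```java
import java.util.EnumMap;
import java.util.Map;

public class EnumMapSketch {

    // Hypothetical stand-in for PersonEnum, without the Person setters.
    public enum Field { name, address, gender, birthday }

    // Returns null instead of throwing when the word has no matching constant.
    public static Field lookup(String word) {
        try {
            return Field.valueOf(word.trim().toLowerCase());
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static Map<Field, String> parse(String line) {
        Map<Field, String> values = new EnumMap<Field, String>(Field.class);
        Field current = null;
        StringBuilder value = new StringBuilder();
        for (String word : line.trim().split("\\s+")) {
            Field field = lookup(word);
            if (field != null) {
                // A new keyword starts: store the value collected so far.
                if (current != null) values.put(current, value.toString());
                current = field;
                value.setLength(0);
            } else if (current != null) {
                if (value.length() > 0) value.append(' ');
                value.append(word);
            }
        }
        if (current != null) values.put(current, value.toString());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("name Ada Lovelace gender woman address London"));
    }
}
```

With this variant, adding a keyword only requires a new constant in the enum, and unknown words simply become part of the current value.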

Otherwise, this is probably one of the best Java (OOP) solutions I can suggest. Cheers!