Java library or algorithm for extracting person information from a String

Time: 2014-03-18 10:02:18

Tags: java algorithm parsing split

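I am looking for a Java library or algorithm that extracts person information from a String. How can I extract the attributes of a person from a String?

Example: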

String s = "Mr. Dr. Tom Jones";

Person person = new Person(s);
person.getSurname(); // Jones
person.getFirstname(); // Tom
person.getSalutation(); // Mr.
person.getTitle(); // Dr.

I am looking for a library based on fuzzy, Levenshtein, or phonetic algorithms. I have lists of titles, names, and salutations to compare against.
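To make the idea of comparing against such lists concrete, here is a minimal sketch of Levenshtein-based matching using only the JDK. The class `FuzzyTitleMatcher`, the method `closest`, the `maxDistance` threshold, and the sample title list are all hypothetical names and values chosen for illustration, not part of any existing library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not from any library): matches a token against a
// list of known titles while tolerating small typos.
final class FuzzyTitleMatcher {

    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest known entry within maxDistance edits
    // (case-insensitive), or null if no entry is close enough.
    static String closest(String token, List<String> known, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        String needle = token.toLowerCase(Locale.ROOT);
        for (String candidate : known) {
            int d = levenshtein(needle, candidate.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed sample list; a real list would come from the asker's data.
        List<String> titles = Arrays.asList("Dr.", "Prof.", "Ing.");
        System.out.println(closest("Dr", titles, 1));    // Dr.
        System.out.println(closest("Porf.", titles, 2)); // Prof.
        System.out.println(closest("Tom", titles, 1));   // null
    }
}
```

The threshold matters: short tokens such as "Dr." sit only a few edits away from many unrelated words, so a small `maxDistance` (or one scaled to the token length) is probably safer. For production use, a maintained implementation such as `LevenshteinDistance` in Apache Commons Text or `Soundex` in Apache Commons Codec may be preferable to a hand-written one.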

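I am sure there is no perfect way to do this. There are of course many exceptions to naming conventions (multiple middle names, "Jr.", abbreviations, ...). Perhaps someone has already tackled this?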

1 answer:

Answer 0: (score: 0)

Would this be enough?

import java.util.ArrayList;
import java.util.List;

public class NamesConverter {

    private List<String> titlesBefore = new ArrayList<>();
    private List<String> titlesAfter = new ArrayList<>();
    private String firstName = "";
    private String lastName = "";
    private List<String> middleNames = new ArrayList<>();

    public NamesConverter(String name) {
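        // Split on any run of whitespace so repeated spaces do not yield empty tokens.
        String[] words = name.trim().split("\\s+");
        boolean isTitleAfter = false;
        boolean isFirstName = false;

        for (String word : words) {
            // A token ending in '.' is treated as a title; endsWith is safe for an empty token.
            if (word.endsWith(".")) {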
                if (isTitleAfter) {
                    titlesAfter.add(word);
                } else {
                    titlesBefore.add(word);
                }
            } else {
                // Once a name token has been seen, later titles count as "after" titles.
                isTitleAfter = true;
                if (!isFirstName) {
                    firstName = word;
                    isFirstName = true;
                } else {
                    middleNames.add(word);
                }
            }
        }
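        // The last non-title word is the last name. Remove it by index so that
        // an identical earlier middle name is not removed in its place.
        if (!middleNames.isEmpty()) {
            lastName = middleNames.remove(middleNames.size() - 1);
        }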
    }

    public List<String> getTitlesBefore() {
        return titlesBefore;
    }

    public List<String> getTitlesAfter() {
        return titlesAfter;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public List<String> getMiddleNames() {
        return middleNames;
    }

    @Override
    public String toString() {
        String text = "Titles before :" + titlesBefore.toString() + "\n"
                + "First name :" + firstName + "\n"
                + "Middle names :" + middleNames.toString() + "\n"
                + "Last name :" + lastName + "\n"
                + "Titles after :" + titlesAfter.toString() + "\n";

        return text;
    }
}

For example, this input:

    NamesConverter ns = new NamesConverter("Mr. Dr. Tom Jones");
    NamesConverter ns1 = new NamesConverter("Ing. Tom Ridley Bridley Furthly Murthly Jones CsC.");
    System.out.println(ns);
    System.out.println(ns1);

produces this output:

Titles before :[Mr., Dr.]
First name :Tom
Middle names :[]
Last name :Jones
Titles after :[]

Titles before :[Ing.]
First name :Tom
Middle names :[Ridley, Bridley, Furthly, Murthly]
Last name :Jones
Titles after :[CsC.]
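
The output above keeps "Mr." and "Dr." together in one list, whereas the question asks for a salutation and a title separately. One possible way to separate them, assuming a known list of salutations is available, is sketched below. The class `SalutationSplitter` and its salutation set are hypothetical and only illustrate the idea; the input list stands in for the result of `getTitlesBefore()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing step: splits the "titles before" list into
// salutations and academic/professional titles using a known salutation list.
final class SalutationSplitter {

    // Assumed, illustrative salutation list; extend it for real data.
    private static final Set<String> SALUTATIONS =
            new HashSet<>(Arrays.asList("Mr.", "Mrs.", "Ms."));

    // Returns the entries that are known salutations, in their original order.
    static List<String> salutations(List<String> titlesBefore) {
        List<String> result = new ArrayList<>();
        for (String entry : titlesBefore) {
            if (SALUTATIONS.contains(entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    // Returns the entries that are not known salutations, in their original order.
    static List<String> titles(List<String> titlesBefore) {
        List<String> result = new ArrayList<>();
        for (String entry : titlesBefore) {
            if (!SALUTATIONS.contains(entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> titlesBefore = Arrays.asList("Mr.", "Dr.");
        System.out.println(salutations(titlesBefore)); // [Mr.]
        System.out.println(titles(titlesBefore));      // [Dr.]
    }
}
```

This uses exact matching; a tolerant comparison such as an edit-distance check could replace `contains` if the input is expected to contain typos.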