String comparison for normalization

Time: 2015-03-01 16:33:09

Tags: java arrays char hashtable

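My problem is that I have a list of jobs in an array named N, e.g. "Accountant", "Quantity Surveyor". I want to accept input such as "Chief Accountant" and convert it to "Accountant".

The approach I came up with is: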

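  1. Lowercase both the input and the array.
  2. Remove whitespace.
  3. Compare each character of the input with each character of every job stored in N.
  4. Compute q, where q = sameChar / length of the current job.
    1. Store the normalized job name and its corresponding q value in a hashtable.

My problem is that I am having trouble comparing the characters of two strings. Can anyone point out what I am doing wrong? Thanks in advance.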
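
The steps above can be sketched roughly as follows. This is only one reading of the description, assuming that "same characters" means equal characters at equal indices; the class `PositionalMatch` and its method names are hypothetical and do not come from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name and methods are illustrative only.
final class PositionalMatch {

    // Steps 1 and 2: lowercase the string and strip all whitespace.
    static String clean(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    // Steps 3 and 4: count equal characters at equal indices, then divide
    // by the length of the cleaned job title to obtain q.
    static double q(String input, String job) {
        String a = clean(input);
        String b = clean(job);
        if (b.isEmpty()) {
            return 0.0; // avoid dividing by zero for an empty job title
        }
        int same = 0;
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            // char is a primitive type, so == compares the values directly
            if (a.charAt(i) == b.charAt(i)) {
                same++;
            }
        }
        return (double) same / b.length();
    }

    // Step 4.1: store each job title together with its q value.
    static Map<String, Double> score(String input, String[] jobs) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String job : jobs) {
            result.put(job, q(input, job));
        }
        return result;
    }
}
```

With this reading, `PositionalMatch.q("Chief Accountant", "Accountant")` is 0.0, because the leading "Chief" shifts every later character out of position. That suggests a purely positional comparison may be too strict for this task.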

Edit: I tried the approach suggested by tucuxi, but I get an error when I run it.

    Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: java.util.HashMap.add
        at Normaliser.normalise(Normaliser.java:41)
        at Normaliser.main(Normaliser.java:49)
    Java Result: 1
    
    
    
    import java.lang.*;
    import java.util.HashMap;
    
    public class Normaliser {
    
        public static int distance(String a, String b) {
            a = a.toLowerCase();
            b = b.toLowerCase();
            // i == 0
            int [] costs = new int [b.length() + 1];
            for (int j = 0; j < costs.length; j++)
                costs[j] = j;
            for (int i = 1; i <= a.length(); i++) {
                // j == 0; nw = lev(i - 1, j)
                costs[0] = i;
                int nw = i - 1;
                for (int j = 1; j <= b.length(); j++) {
                    int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                    nw = costs[j];
                    costs[j] = cj;
                }
            }
            return costs[b.length()];
        }
    
        public static HashMap<String, Integer> normalise(String jobTitle, String[] normalTitles) {
    
            HashMap<String, Integer> normalized = new HashMap<String, Integer>();
            for (String n : normalTitles) {
                normalized.add(n, n.length() - distance(normalTitles, n));
            }
            return normalized;
         }
    
        public static void main(String[] args){
    
            String[] normalTitles = new String[]{"Lawyer", "Engineer", "Accountant"};
            HashMap<String, Integer> qs = normalise("Process Engineer", normalTitles);
            for (String n : normalTitles) {
                System.out.println("job: " + n + " q: " + qs.get(n));
            }    
        }
    
    }
    

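2 answers:

Answer 0: (score: 0)

I'm not entirely sure what you are looking for from your description. Do you want a string like "Front-end engineer" to come out as "Software engineer"? Assuming not, the following roughly works: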

public static String normalise(String jobTitle) {
    if (jobTitle == null) {
        return null;
    }
    String[] normalTitles = {"Architect", "Software engineer", "Quantity surveyor", "Accountant"};
    for (String normal : normalTitles) {
        if (jobTitle.toLowerCase().contains(normal.toLowerCase())) {
            return normal;
        }
    }
    return jobTitle;
}

At least,

System.out.println(normalise("Chief accountant"));

prints

Accountant

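Answer 1: (score: 0)

Based on the comments, I understand that you want to take a job title as input and find the closest "normalized" title. I suggest using a distance metric other than "characters in the same position", for example the Levenshtein distance: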

String a = "Coloring Specialist";
String b = "Colouring Specialist";
charsInSamePosition(a, b); // = 4, even though they are really close
a.length() - levenshteinDistance(a, b); // = 18 (19 - 1), as expected
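
The two helpers in that snippet are not defined in the answer. A minimal sketch of what they might look like follows; `MetricDemo` is a hypothetical class name, and the Levenshtein routine is a textbook full-table version rather than the Rosetta Code implementation.

```java
// Hypothetical demo class; the method names mirror the snippet above.
final class MetricDemo {

    // Counts the indices at which both strings hold the same character.
    static int charsInSamePosition(String a, String b) {
        int same = 0;
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    // Textbook dynamic-programming Levenshtein distance using a full table:
    // d[i][j] is the distance between the first i chars of a and first j of b.
    static int levenshteinDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = d[i - 1][j - 1] + cost;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String a = "Coloring Specialist";
        String b = "Colouring Specialist";
        System.out.println(charsInSamePosition(a, b));              // 4
        System.out.println(a.length() - levenshteinDistance(a, b)); // 18
    }
}
```

The single inserted "u" shifts every later character, so only the first four positions line up, while the edit distance between the two strings is just 1.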

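Using the implementation of levenshteinDistance from http://rosettacode.org/wiki/Levenshtein_distance#Java (called distance in the code below), the final code could be as follows. Note that HashMap stores entries with put (it has no add method), and that distance is called with the input title, not with the array of titles: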

    public static HashMap<String, Integer> normalize(String jobTitle,
            String[] normalTitles) {
        HashMap<String, Integer> normalized = new HashMap<String, Integer>();
        for (String n : normalTitles) {
                normalized.put(n, jobTitle.length() - distance(jobTitle, n));
        }
        return normalized;
    }

Example call:

String[] normalTitles = new String[]{"Lawyer", "Engineer", "Accountant"};
HashMap<String, Integer> qs = normalize("Process Engineer", normalTitles);
for (String n : normalTitles) {
    System.out.println("job: " + n + " q: " + qs.get(n));
}

Example output:

job: Lawyer q: 2
job: Engineer q: 8
job: Accountant q: 3
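
To reduce these q values to a single normalized title, one option is to pick the entry with the highest score. A minimal sketch, assuming a map shaped like the one returned by normalize above; `TitlePicker` and `best` are hypothetical names, not part of the original answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; picks the title with the highest q value.
final class TitlePicker {

    // Returns the key with the largest value, or null if the map is empty.
    // With a HashMap, ties are broken by iteration order, which is unspecified.
    static String best(Map<String, Integer> scores) {
        String bestTitle = null;
        int bestScore = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> entry : scores.entrySet()) {
            if (bestTitle == null || entry.getValue() > bestScore) {
                bestTitle = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return bestTitle;
    }

    public static void main(String[] args) {
        // Scores copied from the example output above.
        Map<String, Integer> qs = new HashMap<>();
        qs.put("Lawyer", 2);
        qs.put("Engineer", 8);
        qs.put("Accountant", 3);
        System.out.println(best(qs)); // Engineer
    }
}
```

In practice a minimum score threshold would probably be needed as well, so that an input unrelated to every known title is not forced onto the least bad match.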