Java: remove special Arabic characters

Date: 2014-06-12 07:56:05

Tags: java internationalization

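I need to write a utility that removes some special characters from a given String input, and I can't figure out how to approach the task. I was given a DB procedure that does the same thing, and I need to replicate its algorithm in Java. Here is the procedure: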

create or replace procedure dbimm.check_arabic_letters (name_a in out varchar2) as
      pos      number(3);
      strlen   number(3);
      nxtchar  char(1);
      ascval   number(3);
begin
      replace_mult_spaces(name_a);
      strlen := length(name_a);
      pos := 1;
      while pos <= strlen loop
         nxtchar := substr(name_a, pos, 1);
         ascval  := ascii(nxtchar);
      --   dbms_output.put_line(to_char(ascval));
         if (ascval between 193 and 218) or
            (ascval between 225 and 234) or
            (ascval in  (32,38,40,41,47,247, 248, 249, 250))
         then
             pos := pos + 1;
         else
            raise_application_error(-20000,display_message(9));
         end if;
      end loop;
      name_a := replace(name_a, 'ي ','ى ');
      if substr(name_a, strlen) = 'ي' then
          name_a := substr(name_a, 1, strlen - 1) || 'ى';
      end if;
      name_a := replace(name_a, 'ة ', 'ه ');
      if substr(name_a, strlen) = 'ة' then
          name_a := substr(name_a, 1, strlen - 1) || 'ه';
      end if;

      /*   Old code commented by Mobeen
      name_a := replace(name_a, ' عبد ',' عبد');
      if instr(name_a,'عبد ') = 1 and length(name_a) > 4 then
          name_a := substr(name_a, 1, 3) || substr(name_a,5);
      end if;
      */
      -------

      name_a := replace(name_a,'أ','ا');
      name_a := replace(name_a,'إ','ا');
      name_a := replace(name_a,'آ','ا');
      --m name_a := replace(name_a,'لا','?');
      name_a := replace(name_a,chr(250),'لا');
      name_a := replace(name_a,chr(247),'لا');
      name_a := replace(name_a,chr(248),'لا');
      name_a := replace(name_a,chr(249),'لا');
      name_a := replace(name_a,chr(63),'لا');

      --- New Code added by Patrick
      name_a := replace(name_a,   ' عبد ال', ' عبدال');
      if substr(name_a,1,6)= 'عبد ال' then  --start
         name_a:= 'عبدال'||substr(name_a,7);
      end if;
      ----

      name_a := replace(name_a, ' ابن ',' بن '); --middle
      if substr(name_a,1,4)='ابن ' then  --start
         name_a:='بن '||substr(name_a,5);
      end if;
      if substr(name_a,-4)=' ابن' then --end
         name_a:=substr(name_a,1,length(name_a)-4)||' بن';
      end if;
      -------
end check_arabic_letters;

I started replicating it in a Java class:

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class ReplaceSpecialArabicCharacUtil {

  /**
   * Replaces special Arabic characters in the given input. The algorithm
   * is taken from the database procedure already used for the blacklist.
   * @param nameInArabic the applicant's name in Arabic, e.g. first name or last name
   * @return the name with the special characters replaced
   */
  public static String removeSpecialArabicCharacters(String nameInArabic){

    // Step 1: remove multiple spaces (take the replace_mult_spaces replica from Naveed).
    // Step 2: port of replace(name_a, 'ي ', 'ى ').
    nameInArabic = nameInArabic.replaceAll(" ې" ,"ی ");

    return nameInArabic;
  }

  /**
   * Driver method for testing the algorithm replicated from the database procedure.
   * @param args unused
   */
  public static void main(String[] args) throws UnsupportedEncodingException {

    String s ="ې ";
    // Print the UTF-8 bytes; printing the byte[] itself would only show an identity hash.
    System.out.println(Arrays.toString(removeSpecialArabicCharacters(s).getBytes("UTF-8")));
  }

}

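replaceAll does not seem to handle the space. I'm not sure whether I'm approaching the problem the right way. Can someone help? I want to write this utility properly.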
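`replaceAll` takes a regex, but a plain space is an ordinary character in a regex, so the space itself should not be the problem (and `String.replace` matches literally anyway). A more likely culprit, judging from how the literals render, is that the letters in the snippet are not the ones the procedure uses: `ې` appears to be U+06D0 ARABIC LETTER E and `ی` U+06CC ARABIC LETTER FARSI YEH, while the procedure uses U+064A ARABIC LETTER YEH and U+0649 ARABIC LETTER ALEF MAKSURA. The pattern also appears to put the space before the letter, whereas the procedure's `'ي '` has it after. A minimal sketch of the procedure's first rule with the letters written as Unicode escapes (the class name and sample name are illustrative only):

```java
// Sketch: the procedure's rule replace(name_a, 'ي ', 'ى ') with explicit code points.
final class YehRuleDemo {
    static final String YEH = "\u064A";          // ي ARABIC LETTER YEH (used by the procedure)
    static final String ALEF_MAKSURA = "\u0649"; // ى ARABIC LETTER ALEF MAKSURA (used by the procedure)

    // String.replace matches the literal text, space included; no regex involved.
    static String yehBeforeSpace(String name) {
        return name.replace(YEH + " ", ALEF_MAKSURA + " ");
    }

    public static void main(String[] args) {
        System.out.println("\u06D0".equals(YEH));          // false: ې is a different letter
        System.out.println("\u06CC".equals(ALEF_MAKSURA)); // false: ی is a different letter
        String name = "\u0639\u0644\u064A \u062D\u0633\u0646"; // "علي حسن"
        System.out.println(yehBeforeSpace(name));
    }
}
```

If the rule still does not fire on real data, compare the code points of the input with these constants rather than the rendered glyphs.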

Thanks, Ben

1 Answer

Answer (score: 1)

I did my best to mimic your procedure in Java, except for replace_mult_spaces, because I don't know what it does.
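If replace_mult_spaces does what its name suggests, a stand-in is a one-line regex. This is a hypothetical sketch: the procedure's source was not posted, so it assumes the function only collapses runs of two or more spaces into one, and `MultSpaces`/`replaceMultSpaces` are illustrative names:

```java
// HYPOTHETICAL stand-in for replace_mult_spaces (its source was not posted).
// Assumption: every run of two or more ASCII spaces becomes a single space.
final class MultSpaces {

    static String replaceMultSpaces(String s) {
        return s.replaceAll(" {2,}", " "); // regex: a space repeated two or more times
    }

    public static void main(String[] args) {
        System.out.println(replaceMultSpaces("a  b   c")); // a b c
    }
}
```

Leading and trailing spaces are deliberately left alone here; if the real procedure also trims, add a `trim()` call.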

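Note: my IDE and Stack Overflow do not handle Arabic characters well, so after copy-pasting, the Arabic literals may not compile or may not contain the characters you expect (right-to-left rendering can also make a space appear on the wrong side of a letter). Check each literal against the procedure and adjust the code until you get the expected result.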
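Since right-to-left rendering makes Arabic literals hard to verify by eye, one option is to list a literal's code points in logical (memory) order together with their Unicode names, using `Character.getName` (Java 7+). A small sketch; `LiteralInspector` and `describe` are illustrative names:

```java
// Lists each code point of a string in logical order with its Unicode name,
// so a copy-pasted Arabic literal can be checked against the procedure.
final class LiteralInspector {

    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.append(String.format("U+%04X %s", cp, Character.getName(cp))).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("\u064A ")); // U+064A ARABIC LETTER YEH, then U+0020 SPACE
    }
}
```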

Here is the Java equivalent of your procedure:

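public class ReplaceSpecialArabicCharacUtil {

    // The procedure's ascii() codes are bytes in the database's single-byte Arabic
    // character set. Java strings are UTF-16, so the check must use Unicode values.
    // ASSUMPTION: that character set has the ISO-8859-6 layout, where 193-218 are
    // U+0621..U+063A and 225-234 are U+0641..U+064A.
    // ASSUMPTION: codes 247-250 are lam-alef ligatures; the Unicode presentation
    // forms U+FEF5..U+FEFC stand in for them here. Verify against your database.
    private static final String LAM_ALEF_LIGATURES =
            "\uFEF5\uFEF6\uFEF7\uFEF8\uFEF9\uFEFA\uFEFB\uFEFC";

    private static final String LAM_ALEF = "\u0644\u0627";                          // لا
    private static final String ABD_AL_SPACED = "\u0639\u0628\u062F \u0627\u0644";  // عبد ال
    private static final String ABD_AL_JOINED = "\u0639\u0628\u062F\u0627\u0644";   // عبدال
    private static final String IBN = "\u0627\u0628\u0646";                         // ابن
    private static final String BIN = "\u0628\u0646";                               // بن

    private static boolean isValidCharacter(char c) {
        return (c >= '\u0621' && c <= '\u063A')        // ascii 193-218
                || (c >= '\u0641' && c <= '\u064A')    // ascii 225-234
                || " &()/".indexOf(c) >= 0             // ascii 32, 38, 40, 41, 47
                || LAM_ALEF_LIGATURES.indexOf(c) >= 0; // ascii 247-250 (assumed)
    }

    public static String removeSpecialArabicCharacters(String nameA) {
        if (nameA == null) {
            return null; // avoid a NullPointerException; returning null is a choice made here
        }
        // replace_mult_spaces(name_a) is not ported: its source was not posted.

        for (int pos = 0; pos < nameA.length(); pos++) { // Java indexes are 0-based
            if (!isValidCharacter(nameA.charAt(pos))) {
                // stands in for raise_application_error(-20000, display_message(9))
                throw new IllegalArgumentException("Invalid character at index " + pos);
            }
        }

        // ي followed by a space, or at the end, becomes ى
        nameA = nameA.replace("\u064A ", "\u0649 ");
        if (nameA.endsWith("\u064A")) {
            nameA = nameA.substring(0, nameA.length() - 1) + "\u0649";
        }
        // ة followed by a space, or at the end, becomes ه
        nameA = nameA.replace("\u0629 ", "\u0647 ");
        if (nameA.endsWith("\u0629")) {
            nameA = nameA.substring(0, nameA.length() - 1) + "\u0647";
        }

        // أ, إ and آ become ا (String.replace takes the old char first)
        nameA = nameA.replace('\u0623', '\u0627')
                .replace('\u0625', '\u0627')
                .replace('\u0622', '\u0627');

        // each ligature becomes the two letters لا, so a String replacement is needed
        for (char ligature : LAM_ALEF_LIGATURES.toCharArray()) {
            nameA = nameA.replace(String.valueOf(ligature), LAM_ALEF);
        }
        // kept for parity with chr(63); '?' is already rejected by the loop above
        nameA = nameA.replace("?", LAM_ALEF);

        // عبد ال becomes عبدال, in the middle and at the start
        nameA = nameA.replace(" " + ABD_AL_SPACED, " " + ABD_AL_JOINED);
        if (nameA.startsWith(ABD_AL_SPACED)) {
            nameA = ABD_AL_JOINED + nameA.substring(ABD_AL_SPACED.length());
        }

        // ابن becomes بن, in the middle, at the start and at the end
        nameA = nameA.replace(" " + IBN + " ", " " + BIN + " ");
        if (nameA.startsWith(IBN + " ")) {
            nameA = BIN + " " + nameA.substring(IBN.length() + 1);
        }
        if (nameA.endsWith(" " + IBN)) {
            nameA = nameA.substring(0, nameA.length() - IBN.length() - 1) + " " + BIN;
        }
        return nameA;
    }
}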
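One detail worth spelling out when comparing the procedure with a Java port: Oracle's ascii() returns a character's code in the database character set, while a Java char is a UTF-16 code unit, so ي is 1610 (U+064A) in Java and `(int) nextChar` can never fall in 193..234 for an Arabic letter. The procedure's ranges line up exactly with the Arabic letters of ISO-8859-6, which suggests (but does not prove) that the database uses that layout; check NLS_CHARACTERSET to be sure. Under that assumption the mapping is a fixed offset, sketched here with a hypothetical `dbCode` helper:

```java
// Sketch only. ASSUMPTION: the database character set has the ISO-8859-6 layout,
// where Arabic letters U+0621..U+063A and U+0641..U+064A are stored as bytes
// 0xC1..0xDA (193..218) and 0xE1..0xEA (225..234), i.e. byte = code point - 0x560.
final class ArabicDbCodes {

    /** Code the procedure's ascii() would report, or -1 outside the two letter ranges. */
    static int dbCode(char c) {
        boolean letter = (c >= '\u0621' && c <= '\u063A')
                || (c >= '\u0641' && c <= '\u064A');
        return letter ? c - 0x560 : -1;
    }

    public static void main(String[] args) {
        char yeh = '\u064A';             // ي
        System.out.println((int) yeh);   // 1610: the UTF-16 value a Java char holds
        System.out.println(dbCode(yeh)); // 234: the value ascii() reports under the assumption
    }
}
```

With this correspondence, a Java check can test the Unicode ranges U+0621..U+063A and U+0641..U+064A directly instead of a list of byte values. Codes 247-250 are not assigned in ISO-8859-6, so their meaning has to come from the actual database character set.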

You can compare the two side by side to get a better feel for how each step maps.
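When comparing side by side, the easiest place to slip is SUBSTR: Oracle positions are 1-based, position 0 is treated as 1, a negative position counts back from the end, and the length is optional, whereas `String.substring` takes 0-based begin/end indexes and throws when they are out of range. A minimal sketch of a helper with SUBSTR-like semantics, returning "" where Oracle would return NULL (`PlsqlSubstr` is a hypothetical name):

```java
// Sketch of Oracle SUBSTR(s, position[, length]) semantics on a Java String.
// Returns "" where Oracle returns NULL (Oracle treats '' as NULL anyway).
final class PlsqlSubstr {

    static String substr(String s, int position, int length) {
        int len = s.length();
        int start;
        if (position > 0) {
            start = position - 1;   // 1-based to 0-based
        } else if (position == 0) {
            start = 0;              // Oracle treats 0 as 1
        } else {
            start = len + position; // -4 means "the last four characters"
        }
        if (start < 0 || start >= len || length < 1) {
            return "";
        }
        int end = (length > len - start) ? len : start + length; // avoids int overflow
        return s.substring(start, end);
    }

    static String substr(String s, int position) {
        return substr(s, position, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        String name = "abcdef";
        System.out.println(substr(name, name.length())); // f, like substr(name_a, strlen)
        System.out.println(substr(name, -4));            // cdef, like substr(name_a, -4)
        System.out.println(substr(name, 1, 4));          // abcd, like substr(name_a, 1, 4)
    }
}
```

In the port itself, `endsWith`/`startsWith` express most of these checks more directly; the helper is mainly useful for reading the two versions against each other.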