Removing titles from a full contact name

Asked: 2013-10-21 14:44:28

Tags: c#

I am writing a small function to strip common titles from a full contact name field. This is what I have so far:

string[] CommonTitles = new string[] { "MR ", "MRS ", "MS ", "MISS ", "DR ", "HERR ", "MONSIEUR ", "HR ", "FRAU ", "A V M ", "ADMIRAAL ", 
                "ADMIRAL ", "ALDERMAN ", "ALHAJI ", "AMBASSADOR ", "BARON ", "BARONES ", "BRIG ", "BRIGADIER ", "BROTHER ", "CANON ", "CAPT ", "CAPTAIN ", 
                "CARDINAL ", "CDR ", "CHIEF ", "CIK ", "CMDR ", "COL ", "COLONEL ", "COMMANDANT ", "COMMANDER ", "COMMISSIONER ", "COMMODORE ", "COMTE ", 
                "COMTESSA ", "CONGRESSMAN ", "CONSEILLER ", "CONSUL ", "CONTE ", "CONTESSA ", "CORPORAL ", "COUNCILLOR ", "COUNT ", "COUNTESS ", "AIR CDRE ", 
                "AIR COMMODORE ", "AIR MARSHAL ", "AIR VICE MARSHAL ", "BRIG GEN ", "BRIG GENERAL ", "BRIGADIER GENERAL ", "CROWN PRINCE ", "CROWN PRINCESS ", 
                "DAME ", "DATIN ", "DATO ", "DATUK ", "DATUK SERI ", "DEACON ", "DEACONESS ", "DEAN ", "DHR ", "DIPL ING ", "DOCTOR ", "DOTT ", "DOTT SA ", 
                "DR ", "DR ING ", "DRA ", "DRS ", "EMBAJADOR ", "EMBAJADORA ", "EN ", "ENCIK ", "ENG ", "EUR ING ", "EXMA SRA ", "EXMO SR ", "F O ", 
                "FATHER ", "FIRST LIEUTIENT ", "FIRST OFFICER ", "FLT LIEUT ", "FLYING OFFICER ", "FR ", "FRAU ", "FRAULEIN ", "FRU ", "GEN ", "GENERAAL ", 
                "GENERAL ", "GOVERNOR ", "GRAAF ", "GRAVIN ", "GROUP CAPTAIN ", "GRP CAPT ", "H E DR ", "H H ", "H M ", "H R H ", "HAJAH ", "HAJI ", 
                "HAJIM ", "HER HIGHNESS ", "HER MAJESTY ", "HERR ", "HIGH CHIEF ", "HIS HIGHNESS ", "HIS HOLINESS ", "HIS MAJESTY ", "HON ", "HR ", 
                "HRA ", "ING ", "IR ", "JONKHEER ", "JUDGE ", "JUSTICE ", "KHUN YING ", "KOLONEL ", "LADY ", "LCDA ", "LIC ", "LIEUT ", "LIEUT CDR ", 
                "LIEUT COL ", "LIEUT GEN ", "LORD ", "MADAME ", "MADEMOISELLE ", "MAJ GEN ", "MAJOR ", "MASTER ", "MEVROUW ", "MISS ", "MLLE ", "MME ", 
                "MONSIEUR ", "MONSIGNOR ", "MSTR ", "NTI ", "PASTOR ", "PRESIDENT ", "PRINCE ", "PRINCESS ", "PRINCESSE ", "PRINSES ", "PROF ", 
                "PROF DR ", "PROF SIR ", "PROFESSOR ", "PUAN ", "PUAN SRI ", "RABBI ", "REAR ADMIRAL ", "REV ", "REV CANON ", "REV DR ", "REV MOTHER ", 
                "REVEREND ", "RVA ", "SENATOR ", "SERGEANT ", "SHEIKH ", "SHEIKHA ", "SIG ", "SIG NA ", "SIG RA ", "SIR ", "SISTER ", "SQN LDR ", "SR ", 
                "SR D ", "SRA ", "SRTA ", "SULTAN ", "TAN SRI ", "TAN SRI DATO ", "TENGKU ", "TEUKU ", "THAN PUYING ", "THE HON DR ", "THE HON JUSTICE ", 
                "THE HON MISS ", "THE HON MR ", "THE HON MRS ", "THE HON MS ", "THE HON SIR ", "THE VERY REV ", "TOH PUAN ", "TUN ", "VICE ADMIRAL ", 
                "VISCOUNT ", "VISCOUNTESS ", "WG CDR " };



            string returnName = textBox1.Text.ToUpper();

            foreach (string title in CommonTitles)
            {
                returnName = returnName.Replace(title, "");
            }

            MessageBox.Show(returnName);

However, I just tested this with the following input: KHUN YING Abu Dina Mr MRS TOH MAJOR, but I got back: KHUN YABU DINA TOH MAJOR

Is there something better than using the Replace function?

Thanks in advance for your help.

1 answer:

Answer 0 (score: 4)

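You can use a regular expression. First, remove the trailing space from every title. Then use the \b anchor to match word boundaries, so that a title such as ING no longer matches inside YING and a title at the very end of the string (MAJOR, with no space after it) is still matched. To avoid leaving extra spaces behind, you also need to match the whitespace before or after the title (I use \s* after it). You may still end up with a trailing space, so you also need to Trim() the string: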

var regex = new Regex(@"\b(" + string.Join("|", CommonTitles) + @")\b\s*");
var result = regex.Replace("KHUN YING ABU DINA MR MRS TOH MAJOR", String.Empty).Trim();

This results in:

ABU DINA TOH
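
The snippet above assumes the trailing spaces have already been removed from CommonTitles. As a rough sketch of that preparation step, the program below trims each title before joining it into the pattern and also passes it through Regex.Escape, which is not strictly needed for the titles in the question but keeps the pattern safe if a title ever contains a regex metacharacter. The class name TitleRegexSketch, the RawTitles array (a shortened stand-in for the full list), and BuildRegex are illustrative names, not part of the original code. It also runs the original Replace loop next to the regex so the two results can be compared.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleRegexSketch
{
    // Assumption: a shortened stand-in for the question's CommonTitles array,
    // with the trailing spaces left in place on purpose.
    public static readonly string[] RawTitles =
        { "MR ", "MRS ", "ING ", "KHUN YING ", "MAJOR ", "TOH PUAN " };

    // Hypothetical helper: trims each title and joins them into one alternation.
    public static Regex BuildRegex()
    {
        // Regex.Escape treats every title as literal text (it also escapes spaces,
        // which still match a plain space).
        var alternatives = RawTitles.Select(t => Regex.Escape(t.Trim()));
        return new Regex(@"\b(" + string.Join("|", alternatives) + @")\b\s*");
    }

    // The approach from the question: plain substring replacement.
    public static string NaiveReplace(string input)
    {
        string result = input;
        foreach (string title in RawTitles)
        {
            result = result.Replace(title, "");
        }
        return result;
    }

    public static void Main()
    {
        const string input = "KHUN YING ABU DINA MR MRS TOH MAJOR";

        // "ING " matches inside "YING ", and the final "MAJOR" has no trailing space.
        Console.WriteLine(NaiveReplace(input));   // KHUN YABU DINA TOH MAJOR

        // Word boundaries avoid both problems.
        Console.WriteLine(BuildRegex().Replace(input, string.Empty).Trim());   // ABU DINA TOH
    }
}
```

The first line of output reproduces the result reported in the question; the second matches the result shown above.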

You can also let the regular expression handle casing, so you do not have to convert everything to upper case. Just use RegexOptions.IgnoreCase:

var regex = new Regex(
  @"\b(" + string.Join("|", CommonTitles) + @")\b\s*",
  RegexOptions.IgnoreCase
);
var result = regex.Replace("Khun Ying Abu Dina Mr Mrs Toh Major", String.Empty).Trim();

The result is now:

Abu Dina Toh
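
One caveat that may matter with the full list: .NET tries the alternatives in a pattern from left to right, so when a shorter title appears before a longer one that starts with it (DATUK before DATUK SERI, or PUAN before PUAN SRI), only the shorter one is removed and the rest of the title stays in the name. A possible way around this, sketched below under the assumption that the longest title should always win, is to sort the titles by length before joining them. TitleStripper, BuildTitleRegex, and StripTitles are hypothetical names introduced for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleStripper
{
    // Hypothetical helper: builds one case-insensitive regex from a title list.
    public static Regex BuildTitleRegex(string[] titles)
    {
        var alternatives = titles
            .Select(t => t.Trim())                 // drop trailing spaces
            .Where(t => t.Length > 0)
            .Distinct()                            // the question's list contains repeats
            .OrderByDescending(t => t.Length)      // longest titles are tried first
            .Select(t => Regex.Escape(t));         // treat each title as literal text

        return new Regex(
            @"\b(?:" + string.Join("|", alternatives) + @")\b\s*",
            RegexOptions.IgnoreCase);
    }

    // Hypothetical helper: removes every matched title and trims the leftover space.
    public static string StripTitles(Regex titleRegex, string fullName)
    {
        return titleRegex.Replace(fullName, string.Empty).Trim();
    }

    public static void Main()
    {
        // Assumption: a small sample instead of the full CommonTitles array.
        string[] sample = { "DATUK ", "DATUK SERI ", "DR ", "MR " };
        Regex regex = BuildTitleRegex(sample);

        Console.WriteLine(StripTitles(regex, "Datuk Seri Abu Dina"));   // Abu Dina
    }
}
```

With the titles joined in their original order, the same input would come back as Seri Abu Dina, because DATUK would match before DATUK SERI is tried. Building the regex once and reusing it also avoids recompiling a large pattern for every name.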