Question

I'm doing this in C#. I start with an email-like string in this format:

employee[any characters]@company[any characters].com

I want to remove the non-alphanumeric characters from the [any characters] segments.

For example, I want "employee1@2 r&a*d.m32@@company98 ';99..com"

to become "employee12radm32@company9899.com"

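The expression below simply strips all the special characters, but I want to keep a single @ before company and a single . before com. So I need the expression to ignore or mask off the employee, @company, and .com pieces... I just don't know how to do that.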

var regex = new Regex("[^0-9a-zA-Z]"); //whitelist the acceptables, remove all else.

Answer 1

You can use the following regex:

(?:\W)(?!company|com)

It will replace any special character unless it is followed by company (so @company is kept) or by com (so .com is kept):

employee1@2 r&a*d.m32@@company98 ';99..com

will become

employee12radm32@company9899.com

See: http://regex101.com/r/fY8jD7/2

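Note that you need the g modifier to replace all occurrences of such unwanted characters. Replacing every match is the default in C#, so you can just use a plain Regex.Replace():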

https://dotnetfiddle.net/iTeZ4F
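For readers who cannot open the fiddle, here is a minimal sketch of how the pattern might be used with Regex.Replace(). The class and method names (LookaheadCleaner, Clean) are made up for illustration and are not taken from the linked example. The non-capturing group around \W is dropped because it does not change what the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadCleaner
{
    // Matches a single non-word character unless "company" or "com"
    // comes directly after it. Note that \W does not match '_',
    // so underscores are left in place.
    private static readonly Regex Unwanted = new Regex(@"\W(?!company|com)");

    // Hypothetical helper: removes every match of the pattern above.
    public static string Clean(string input)
    {
        return Unwanted.Replace(input, string.Empty);
    }

    public static void Main()
    {
        var input = "employee1@2 r&a*d.m32@@company98 ';99..com";
        Console.WriteLine(Clean(input)); // employee12radm32@company9899.com
    }
}
```

Regex.Replace() replaces every match, so no extra "global" option is required.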

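Update

Of course, the regex (?:\W)(?!com) would be enough, but it would still keep parts such as #com or ~companion, because the lookahead matches there as well. So there is still no guarantee that the input, or rather the transformed result, is 100% valid. You should consider simply throwing a validation error instead of trying to sanitize the input to fit your needs.

And even if you manage to handle that case: what happens if @company or .com appears twice?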
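A rough sketch of the "validate instead of sanitize" idea follows. The accepted shape (employee, optional ASCII letters or digits, @company, optional ASCII letters or digits, .com) is an assumption based on the format in the question, and the names EmailValidator and RequireValid are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailValidator
{
    // Assumed format: employee<alnum>@company<alnum>.com
    // \A and \z anchor the pattern to the whole string.
    private static readonly Regex Expected =
        new Regex(@"\Aemployee[0-9a-zA-Z]*@company[0-9a-zA-Z]*\.com\z");

    // Returns the input unchanged if it is valid, otherwise throws.
    public static string RequireValid(string input)
    {
        if (!Expected.IsMatch(input))
        {
            throw new ArgumentException("Invalid email: " + input);
        }
        return input;
    }

    public static void Main()
    {
        Console.WriteLine(RequireValid("employee12radm32@company9899.com"));
        try
        {
            RequireValid("employee1@2 r&a*d.m32@@company98 ';99..com");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

With this approach, a string that contains @company or .com twice is rejected outright rather than silently repaired.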

Answer 2

You can simplify your regex and replace it with

tmp = Regex.Replace(n, @"\W+", "");

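where \w stands for all letters, digits, and the underscore, and \W is the negated version of \w. In general, it is better to create a whitelist of allowed characters than to try to predict every disallowed symbol.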
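To make the difference between \W and the question's explicit character class concrete, here is a small comparison sketch. The class and method names are invented for this example. Both variants also strip the @ and the ., so those separators would have to be put back afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Removes runs of non-word characters; '_' counts as a word character.
    public static string StripNonWord(string input)
    {
        return Regex.Replace(input, @"\W+", "");
    }

    // Removes everything outside the explicit ASCII whitelist from the question.
    public static string StripNonAlphanumeric(string input)
    {
        return Regex.Replace(input, "[^0-9a-zA-Z]+", "");
    }

    public static void Main()
    {
        var input = "employee_1@company.com";
        Console.WriteLine(StripNonWord(input));         // employee_1companycom
        Console.WriteLine(StripNonAlphanumeric(input)); // employee1companycom
    }
}
```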

Answer 3

I would probably write something like this:

(Case is ignored here; leave a comment if you need it to be case-sensitive.)

DotNetFiddle Example

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var email = "employee1@2 r&a*d.m32@@company98 ';99..com";

        var result = GetValidEmail(email);

        Console.WriteLine(result);
    }


    public static string GetValidEmail(string email)
    {
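      var result = email.ToLower();

      // Does it contain everything we need? Check the lowercased copy
      // so that the comparison really ignores case.
      if (result.StartsWith("employee")
          && result.EndsWith(".com")
          && result.Contains("@company"))
      {
        // remove "employee" (8 chars) from the start and ".com" (4 chars) from the end.
        result = result.Substring(8, result.Length - 12);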
        // remove @company
        var split = result.Split(new string[] { "@company" },
          StringSplitOptions.RemoveEmptyEntries);

        // validate we have exactly two parts (you may not need this)
        if (split.Length != 2)
        {
          throw new ArgumentException("Invalid Email.");
        }

        // recreate valid email
        result = "employee"
          + new string (split[0].Where(c => char.IsLetterOrDigit(c)).ToArray())
          + "@company"
          + new string (split[1].Where(c => char.IsLetterOrDigit(c)).ToArray())
          + ".com";

      }
      else
      {
        throw new ArgumentException("Invalid Email.");
      }

      return result;
    }
}

Result

employee12radm32@company9899.com

Answer 4

@dognose provided a great regex solution. I'll leave my answer here for reference, but I would go with his, since it's shorter and cleaner.

var companyName = "company";
var extension = "com";
var email = "employee1@2 r&a*d.m32@@company98 ';99..com";

var tempEmail = Regex.Replace(email, @"\W+", "");

var companyIndex = tempEmail.IndexOf(companyName);
var extIndex = tempEmail.LastIndexOf(extension);

var fullEmployeeName = tempEmail.Substring(0, companyIndex);
var fullCompanyName = tempEmail.Substring(companyIndex, extIndex - companyIndex);

var validEmail = fullEmployeeName + "@" + fullCompanyName + "." + extension;

Answer 5

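Although it is possible, what you are trying to do is a bit complicated for a single regex pattern. You can break the problem down into smaller steps. One way is to extract the Username and Domain groups (basically the [any characters] parts you described), "fix" each group, and substitute it back into the original. Something like this: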

// Original input to transform.
string input = @"employee1@2 r&a*d.m32@@company98 ';99..com";

// Regular expression to find and extract "Username" and "Domain" groups, if any.
var matchGroups = Regex.Match(input, @"employee(?<UsernameGroup>(.*))@company(?<DomainGroup>(.*))\.com");

string validInput = input;

// Get the username group from the list of matches.
var usernameGroup = matchGroups.Groups["UsernameGroup"];

if (!string.IsNullOrEmpty(usernameGroup.Value))
{
    // Replace non-alphanumeric values with empty string.
    string validUsername = Regex.Replace(usernameGroup.Value, "[^a-zA-Z0-9]", string.Empty);

    // Replace the invalid instance with the valid one.
    validInput = validInput.Replace(usernameGroup.Value, validUsername);
}

// Get the domain group from the list of matches.
var domainGroup = matchGroups.Groups["DomainGroup"];

if (!string.IsNullOrEmpty(domainGroup.Value))
{
    // Replace non-alphanumeric values with empty string.
    string validDomain = Regex.Replace(domainGroup.Value, "[^a-zA-Z0-9]", string.Empty);

    // Replace the invalid instance with the valid one.
    validInput = validInput.Replace(domainGroup.Value, validDomain);
}

Console.WriteLine(validInput);

This will output employee12radm32@company9899.com.

Regex to remove special characters while keeping a valid email format
