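Converting a regular expression to account for international characters

Time: 2015-05-07 22:45:57

Tags: php regex internationalization

I currently use the following regular expression to validate the company name entered in a form: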

$regexpRange = $min.','.$max;
$regexpPattern = '/^(?=[A-Za-z\d\'\s\,\.]{'.$regexpRange.'}$)(?=.*[a-z\d])[a-zA-Z\d]+[A-Za-z\d\'\s\,\.]+$/m';

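I need to update it so that it accepts international characters. I have no experience with this.

Can someone help me understand how to solve this?

1 answer:

Answer 0: (score: 2)

Here are the required steps: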

  • Use the u pattern option. This turns on both PCRE_UTF8 and PCRE_UCP (the PHP docs forget to mention the latter):

      

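    PCRE_UTF8

    This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte strings. However, it is available only when PCRE is built to include UTF support. If not, the use of this option provokes an error. Details of how this option changes the behaviour of PCRE are given in the pcreunicode page.

    PCRE_UCP

    This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE_UCP is set, Unicode properties are used instead to classify characters. More details are given in the section on generic character types in the pcrepattern page. If you set PCRE_UCP, matching one of the items it affects takes much longer. The option is available only if PCRE has been compiled with Unicode property support.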

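  • With PCRE_UCP, \d works as is (it then matches any Unicode decimal digit, \p{Nd}), but you have to replace the a-z ranges to account for accented characters:

    • Replace a-zA-Z with \p{L}
    • Replace a-z with \p{Ll}
    • Replace A-Z with \p{Lu}

    \p{X} means: a character from Unicode category X, where L means letter, Ll means lowercase letter, and Lu means uppercase letter. You can get the full list from the docs.

    Note that you can use \p{X} inside a character class: for example, [\p{L}\d].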

  • Finally, make sure your strings are UTF-8 encoded in PHP, and that you handle them with Unicode-aware functions.
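
Applied to the pattern from the question, the steps above might look like the following sketch. The function names buildCompanyNamePattern and isValidCompanyName are hypothetical helpers introduced only for illustration, and the substitution is purely mechanical: the letter ranges become \p{L} and \p{Ll}, and the u modifier is added next to the existing m.

```php
<?php
// Hypothetical helper (not from the original post): rebuilds the question's
// pattern with Unicode categories in place of the ASCII letter ranges.
function buildCompanyNamePattern(int $min, int $max): string
{
    $regexpRange = $min . ',' . $max;

    // a-zA-Z -> \p{L}, a-z -> \p{Ll}; the trailing "u" enables UTF-8 mode.
    return '/^(?=[\p{L}\d\'\s,.]{' . $regexpRange . '}$)'
         . '(?=.*[\p{Ll}\d])[\p{L}\d]+[\p{L}\d\'\s,.]+$/mu';
}

// Hypothetical helper: true only when the name matches the pattern.
// With the u modifier, preg_match() returns false for a subject that is
// not valid UTF-8, so such input is rejected as well.
function isValidCompanyName(string $name, int $min, int $max): bool
{
    return preg_match(buildCompanyNamePattern($min, $max), $name) === 1;
}

var_dump(isValidCompanyName('Acme, Inc.', 3, 50));       // bool(true)
var_dump(isValidCompanyName('Société Générale', 3, 50)); // bool(true), given precomposed (NFC) input
var_dump(isValidCompanyName('!!!', 3, 50));              // bool(false)
```

One consequence of the mechanical substitution is worth checking against your requirements: the second lookahead still demands at least one lowercase letter or digit, so a name written entirely in a script without letter case (CJK ideographs, for instance, which fall under \p{Lo}) would be rejected. Replacing \p{Ll} with \p{L} in that lookahead would relax this, at the cost of changing the original rule.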