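Let's say I have the following text:
BBC - Here is the text
How can I use a regular expression to test whether the string starts with "* - "?
And then remove the "* - ", leaving only "Here is the text"? (I am using Python.)
I write "*" because the string obviously won't start with "BBC - " every time; it could be some other substring.
Would this work?
"^.* - "
Thanks a lot.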
Answer:
m = re.search(r'^(.*? [-\xe2\u2014] )?(.*)', text)
This works. Thanks @xanatos!
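Answer 0 (score: 2)
Here is a pattern that matches everything before the first hyphen, plus the hyphen itself:
/^[^-]*-\s*/
It reads as follows:
^ - starting from the beginning of the string...
[^-]* - match any number (including zero) of non-hyphens, then...
- - match the hyphen itself, then...
\s* - match any number (including zero) of whitespace characters
Then you can replace the text matched by the pattern with an empty string; the result of that replacement is probably all you need.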
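The replacement step described above can be sketched in Python roughly as follows. This is only an illustration: the names PREFIX and strip_prefix are made up here and are not part of the answer.

```python
import re

# Everything up to and including the first hyphen, plus any whitespace after it
PREFIX = re.compile(r'^[^-]*-\s*')

def strip_prefix(text):
    """Return text with the leading '<anything>-' part removed, if present."""
    return PREFIX.sub('', text, count=1)

print(strip_prefix("BBC - Here is the text"))  # Here is the text
print(strip_prefix("No separator here"))       # No separator here
```

Note that the pattern cuts at the first hyphen, so a prefix that itself contains a hyphen (for example "Al-Jazeera - News") would only lose "Al-".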
Answer 1 (score: 1)
Try this code:
import re

text = "BBC \xe2 abc - Here is the text"
m = re.search(r'^(.*? [-\xe2] )?(.*)', text, re.UNICODE)
# or, equivalently (re.match always anchors at the start of the string):
# m = re.match(r'(.*? [-\xe2] )?(.*)', text, re.UNICODE)
# You don't really need re.UNICODE (in Python 3 it is the default for str
# patterns), but it makes explicit that characters like à count as letters
# group(1) contains the part before the hyphen, including the separator
if m.group(1) is not None:
    print(m.group(1))
# group(2) contains the part after the hyphen, or the whole string
# if there is no hyphen
print(m.group(2))
Explanation of the regular expression:
^ is the beginning of the string (the match method always starts at the
beginning of the string)
(...) creates a capturing group (something that will go in group(...))
(...)? is an optional group
[-\xe2] one character, either - or \xe2 (you can put any number of characters
in the [], e.g. [abc] means a or b or c)
.*? [-\xe2] (there is a space after the ]) any characters followed by a space, a hyphen (or \xe2) and a space
the *? means that the * is "lazy", so it will try to catch only the
minimum possible number of characters; with ABC - DEF - GHI,
.* - would catch ABC - DEF -, while .*? - will catch ABC -
so
(.*? [-\xe2] )? the string could start with any characters followed by a hyphen;
if yes, put them in group(1), if not, group(1) will be None
(.*) and it will be followed by any characters. You don't need the
$ (the end of the string, the opposite of ^) because * will
always eat all the characters it can (it's a greedy operator)
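A minimal sketch of the lazy versus greedy difference described above, using the same sample string. The variable names are illustrative only.

```python
import re

sample = "ABC - DEF - GHI"

# Greedy: .* grabs as much as possible, then backtracks to the LAST " - "
greedy = re.match(r'.* - ', sample).group()

# Lazy: .*? grabs as little as possible, so it stops at the FIRST " - "
lazy = re.match(r'.*? - ', sample).group()

print(repr(greedy))  # 'ABC - DEF - '
print(repr(lazy))    # 'ABC - '
```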
Answer 2 (score: 0)
Use the ? operator:
'^(.+ [-] )?(.+)$'
Maybe you want to implement it with more flexibility...
A quick-and-dirty test script (using PHP instead of Python, sorry!):
<?php
$string = "BBC - This is the text.";
$pattern = '/^(.+ [-] )?(.+)$/';
preg_match($pattern, $string, $tokens);
var_dump($tokens);
?>
Output of the test script:
array(3) {
[0] =>
string(23) "BBC - This is the text."
[1] =>
string(6) "BBC - "
[2] =>
string(17) "This is the text."
}
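The first parenthesized group matches any text at the beginning of the string that consists of one or more arbitrary characters followed by a space, a literal hyphen and another space. This sequence may or may not be present. The second group matches the rest of the string up to the end.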
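Since the question is about Python, here is a rough Python equivalent of the PHP test, assuming the same pattern is used unchanged. The variable names are chosen for illustration.

```python
import re

pattern = re.compile(r'^(.+ [-] )?(.+)$')

for text in ("BBC - This is the text.", "This is the text."):
    # groups() returns (prefix, rest); prefix is None when there is no " - "
    prefix, rest = pattern.match(text).groups()
    print(repr(prefix), repr(rest))
# 'BBC - ' 'This is the text.'
# None 'This is the text.'
```

Because .+ is greedy, a string containing several " - " separators would be split at the last one rather than the first.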
Answer 3 (score: 0)
/^.+-/
should work.
Here are test cases based on your requirements:
Pass: foo -
Pass: bar-
Pass: -baz-
Pass: *qux-
Pass: -------------
Fail: ****
Fail: -foobar
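The test cases above can be checked in Python with a short sketch like this one. The dictionary of expected results is copied from the list; the names are illustrative.

```python
import re

pattern = re.compile(r'^.+-')

# Expected outcome for each test string (True = pass, False = fail)
cases = {
    "foo -": True,
    "bar-": True,
    "-baz-": True,
    "*qux-": True,
    "-------------": True,
    "****": False,
    "-foobar": False,
}

results = {s: pattern.search(s) is not None for s in cases}
for s, matched in results.items():
    print("Pass:" if matched else "Fail:", s)
```

This pattern only tests for a hyphen preceded by at least one character; being greedy, it matches up to the last hyphen, so it is probably better suited to testing than to stripping the prefix.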