I want to return True or False depending on whether a string contains one of several Unicode sequences (short sequences, such as phrases).
words = ['你好朋友','我吃饭'] # equivalent to 'hello friend', 'I had lunch'
uwords = []
for word in words:
    uwords.append(unicode(word, 'utf8'))
uwords # [u'\u4f60\u597d\u670b\u53cb', u'\u6211\u5403\u996d']
import re
string = '他吃饭,不是我' # 'he had lunch, not me'
usample = unicode(string, 'utf-8')
pattern = re.compile(u'[\b\u4f60\u597d\u670b\u53cb\b | \b\u6211\u5403\u996d\b]')
# pattern = re.compile(u'\u793e\u533a.*\u670d\u52a1') # [u'\u793e\u533a', u'\u670d\u52a1']
match = pattern.search(usample)
if match:
    print True
else:
    print False
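I should get False from this code, but I get True. I think something is wrong with the pattern I pass to re.compile: the code seems to match the Unicode characters individually rather than as a sequence.
I assumed it would work the same way as the English case: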
import re
string = 'rotten tomatoes are good'
pattern = re.compile('tomatoes are good | apples are good')
match = pattern.search(string)
if match:
    print True
else:
    print False
This one returns False, as I wanted.
Answer 0: (score: 0)
Removing the spaces and the brackets solved the problem:
words = ['你好朋友','我吃饭'] # equivalent to 'hello friend', 'I had lunch'
uwords = []
for word in words:
    uwords.append(unicode(word, 'utf8'))
uwords # [u'\u4f60\u597d\u670b\u53cb', u'\u6211\u5403\u996d']
import re
string = '他吃饭,不是我' # 'he had lunch, not me'
usample = unicode(string, 'utf-8')
pattern = re.compile(u'\u4f60\u597d\u670b\u53cb|\u6211\u5403\u996d')
match = pattern.search(usample)
if match:
    print True
else:
    print False
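The effect of the spaces can be checked with the English example from the question. The following is a minimal sketch, assuming Python 3 (where every str is already Unicode); the variable names `spaced` and `tight` are only illustrative. Outside of re.VERBOSE mode, a space in a pattern is a literal character, so the spaces around | become part of each alternative.

```python
import re

text = 'rotten tomatoes are good'

# The spaces around | belong to the alternatives, so this pattern looks for
# 'tomatoes are good ' (trailing space) or ' apples are good' (leading space).
spaced = re.compile('tomatoes are good | apples are good')

# Without the spaces, each alternative is exactly the phrase itself.
tight = re.compile('tomatoes are good|apples are good')

spaced_found = bool(spaced.search(text))
tight_found = bool(tight.search(text))

print(spaced_found)  # False: the text ends right after 'good', with no trailing space
print(tight_found)   # True: 'tomatoes are good' occurs in the text
```

This suggests the English pattern in the question returned False only because of the stray spaces, not because the phrase was absent.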
Answer 1: (score: 0)
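The [abc] syntax means "match one of a, b, or c". In the question's pattern, the brackets therefore turn the two phrases into a set of single characters, which is why a lone 吃 or 饭 is enough to produce a match. Also, \b does not work for Chinese, so it is fine to remove it as well.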
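The difference between a character class and alternation can be sketched as follows. This is an illustration only, assuming Python 3; the names `char_class` and `alternation` are made up for the example.

```python
import re

# Shares the characters 吃 and 饭 with the phrase 我吃饭, but not the whole phrase.
sample = '他吃饭,不是我'

# A character class matches any ONE of the listed characters.
char_class = re.compile('[你好朋友我吃饭]')

# Alternation matches one of the whole phrases.
alternation = re.compile('你好朋友|我吃饭')

class_match = char_class.search(sample)
phrase_match = alternation.search(sample)

print(class_match.group())  # 吃 (the first character of the sample that is in the class)
print(phrase_match)         # None (neither complete phrase occurs in the sample)
```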
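Note that the code is shorter and more readable if you use Unicode string literals. The #coding statement declares the encoding of the source file, so the Unicode literals are decoded correctly. Make sure to save the source file in the declared encoding.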
#coding:utf8
import re
uwords = [u'你好朋友',u'我吃饭'] # equivalent to 'hello friend', 'I had lunch'
usample = u'他吃饭,不是我' # 'he had lunch, not me'
usample2 = u'你好朋友, 你吃了吗?'
pattern = re.compile(u'你好朋友|我吃饭')
print bool(pattern.search(usample))
print bool(pattern.search(usample2))
Output:
False
True
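If the phrases live in a list such as uwords, the pattern can also be built from that list instead of being typed out by hand. The following is a rough sketch, assuming Python 3; the helper name `contains_any` is hypothetical and not part of the code above. re.escape is used so that any regex metacharacters inside a phrase are treated literally.

```python
import re

def contains_any(phrases, text):
    """Return True if text contains at least one of the phrases."""
    if not phrases:
        # An empty alternation would match every string, so handle it explicitly.
        return False
    pattern = re.compile('|'.join(re.escape(p) for p in phrases))
    return bool(pattern.search(text))

uwords = ['你好朋友', '我吃饭']

first = contains_any(uwords, '他吃饭,不是我')
second = contains_any(uwords, '你好朋友, 你吃了吗?')

print(first)   # False
print(second)  # True
```

For plain substring checks like these, `any(p in text for p in phrases)` would give the same answers without a regular expression; the regex form is mainly useful when the phrases later need real pattern syntax.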