Question

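So I'm trying to check whether a bullet point is part of an item in a list by iterating over the list with a for loop. I know that, at least in regex, a bullet point is written as \u2022, but I don't know how to use it. What I currently have, which obviously doesn't work, is this: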

list = ['changing. • 5.0 oz.', 'hello', 'dfd','df', 'changing. • 5.0 oz.']
for items in list:
    if "\u2022" in items:
        print('yay')

Thanks in advance!

Answer 1

It's best to use the re (regular expression) library, like this:

# import regex library
import re

# compile the pattern; with a raw string, the re module itself
# interprets the \u2022 escape (supported since Python 3.3)
bullet_point = re.compile(r"\u2022")
list = ['changing. • 5.0 oz.', 'hello', 'dfd','df', 'changing. • 5.0 oz.']

# search each item in the list
for item in list:
    # search for bullet_point in item
    result = bullet_point.search(item)
    if result:
        print('yay')

Answer 2

In Python 3 your code will work fine, because UTF-8 is the default source code encoding and string literals are Unicode. If you work with Unicode a lot, consider switching to Python 3.
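As a minimal sketch of the Python 3 case, assuming the same sample data as in the question: the plain `in` test is enough, because "\u2022" in a normal string literal is the single character •. The names BULLET, items and has_bullet are illustrative only and do not come from the question.

```python
# Python 3 sketch: str literals are Unicode, so "\u2022" is one character.
BULLET = "\u2022"  # illustrative constant name

items = ['changing. • 5.0 oz.', 'hello', 'dfd', 'df', 'changing. • 5.0 oz.']


def has_bullet(text):
    """Return True if text contains the bullet character (hypothetical helper)."""
    return BULLET in text


# Collect the items that contain a bullet, then report each one.
matches = [item for item in items if has_bullet(item)]
for item in matches:
    print('yay')
```

Naming the list items rather than list also avoids shadowing the built-in list type.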

In Python 2, the default is to treat literal strings as sequences of bytes, so you have to explicitly declare which strings are Unicode by prefixing them with u.

First, set the source code encoding to UTF-8.

# -*- coding: utf-8 -*-

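Then tell Python to treat these strings as Unicode. Otherwise they are treated as individual bytes, which leads to odd results, such as Python thinking the first string has length 21 instead of 19.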

print len(u'changing. • 5.0 oz.')    # 19 characters
print len('changing. • 5.0 oz.')     # 21 bytes

This is because the Unicode code point U+2022 BULLET is UTF-8 encoded as three bytes, e2 80 a2. The first line treats it as a single character, the second as three bytes.
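The same arithmetic can be checked in Python 3, where the split between characters and bytes is explicit. This is only a sketch, and the variable names are illustrative: encoding the string to UTF-8 turns the one bullet character into three bytes, which accounts for the difference between 19 and 21.

```python
# Python 3 sketch: compare the character count with the UTF-8 byte count.
text = 'changing. \u2022 5.0 oz.'
encoded = text.encode('utf-8')  # bytes object

char_count = len(text)      # counts Unicode code points
byte_count = len(encoded)   # counts UTF-8 bytes
bullet_bytes = '\u2022'.encode('utf-8')

print(char_count, byte_count, bullet_bytes.hex())
```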

Finally, write the character you are searching for as a Unicode string. This could be u'\u2022' or u'•'.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

list = [u'changing. • 5.0 oz.', u'hello', u'dfd', u'df', u'changing. • 5.0 oz.']
for item in list:
    if u'•' in item:
        print('yay')

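Your actual code probably won't use constant strings, so you'll have to make sure that whatever ends up in list has been decoded to Unicode (for example, from UTF-8 bytes).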
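A hedged sketch of that last point, written for Python 3 and assuming the data arrives as UTF-8 encoded byte strings (for example from a file opened in binary mode or from a socket). The names raw_items, decoded and flags are hypothetical.

```python
# Python 3 sketch: decode incoming UTF-8 bytes before testing for the bullet.
raw_items = [b'changing. \xe2\x80\xa2 5.0 oz.', b'hello']  # assumed input

# bytes.decode turns each byte string into a Unicode str.
decoded = [raw.decode('utf-8') for raw in raw_items]

# Now the membership test compares characters, not bytes.
flags = ['\u2022' in item for item in decoded]
print(flags)
```

In Python 2 the equivalent step is calling .decode('utf-8') on each byte string to get a unicode object before comparing it with u'\u2022'.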

Check whether a bullet point is in a list
