Question

Given a string such as:

"Today is a bright sunny day in New York"

I want my list to be:

['Today','is','a','bright','sunny','day','in','New York']

Another example:

"This is a hello world program"

The list would be: ['This', 'is', 'a', 'hello world', 'program']

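For every given string S, we have entities E that need to be kept together. In the first example the entity E is "New", "York"; in the second example it is "hello", "world".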

I tried to do this with a regular expression, but I could not manage to split on spaces and also merge the two entities.

Example:

import re

regex = "(navy blue)|[a-zA-Z0-9]*"
match = re.findall(regex, "the sky looks navy blue.", re.IGNORECASE)
print(match)

Output: ['', '', '', '', '', '', 'navy blue', '', '']

Answer 1

Use re.findall instead of split, and put the entities in the alternation before the character class that represents the strings to extract:

>>> import re
>>> s = "Today is a bright sunny day in New York"
>>> re.findall(r'New York|\w+', s)
['Today', 'is', 'a', 'bright', 'sunny', 'day', 'in', 'New York']
>>> s = "This is a hello world program"
>>> re.findall(r'hello world|\w+', s)
['This', 'is', 'a', 'hello world', 'program']

Change \w to an appropriate character class, such as [a-zA-Z].

For the additional sample added to the question:

>>> regex = r"navy blue|[a-z\d]+"
>>> re.findall(regex, "the sky looks navy blue.", re.IGNORECASE)
['the', 'sky', 'looks', 'navy blue']

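Use r strings (raw strings) to build the regex pattern, as a good practice.
Grouping is not needed here.
Use + instead of * so that at least one character must be matched.
Since re.IGNORECASE is specified, either a-z or A-Z is sufficient in the character class. re.I can also be used as a shortcut.
\d is a shortcut for [0-9].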
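
The same approach can be extended to an arbitrary list of entities. What follows is only a minimal sketch, not part of the answer above: the helper name tokenize_keeping_entities and its parameters are hypothetical. It assumes the entities are plain strings, escapes them with re.escape, and tries longer entities first so that a longer entity is not cut short by a shorter one that shares its prefix.

```python
import re

def tokenize_keeping_entities(text, entities, flags=0):
    """Hypothetical helper: split text into words, keeping known entities whole."""
    # Try longer entities first so "New York City" wins over "New York".
    ordered = sorted(entities, key=len, reverse=True)
    # Escape each entity so it is matched literally, then fall back to \w+.
    alternatives = [re.escape(entity) for entity in ordered] + [r"\w+"]
    pattern = "|".join(alternatives)
    # The pattern has no capturing groups, so findall returns whole matches.
    return re.findall(pattern, text, flags)

print(tokenize_keeping_entities("Today is a bright sunny day in New York", ["New York"]))
print(tokenize_keeping_entities("the sky looks navy blue.", ["navy blue"], re.IGNORECASE))
```

This sketch does not check word boundaries, so an entity such as "New York" would also match inside "New Yorker"; wrapping each escaped entity in \b is one possible refinement.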

Answer 2

Try this:

text = "Today is a bright sunny day in New York"
new_list = list(map(str, text.split(" ")))

This should give you the following output: ['Today', 'is', 'a', 'bright', 'sunny', 'day', 'in', 'New', 'York']

The same goes for the next string:

hello = "This is a hello world program."
yet_another_list = list(map(str, hello.split(" ")))

which gives you ['This', 'is', 'a', 'hello', 'world', 'program.']
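
Note that a plain split leaves the entity words as separate items, so they still have to be merged afterwards. One possible way to do that is sketched below; the helper merge_entities is hypothetical and not part of this answer, and it assumes each entity is a space-separated phrase that appears as consecutive tokens.

```python
def merge_entities(tokens, entities):
    """Hypothetical helper: join consecutive tokens that form a known entity."""
    # Split each entity into its words, skip empty ones, longest entity first.
    entity_parts = [parts for parts in (e.split() for e in entities) if parts]
    entity_parts.sort(key=len, reverse=True)
    merged = []
    i = 0
    while i < len(tokens):
        for parts in entity_parts:
            # Compare the next len(parts) tokens with the entity's words.
            if tokens[i:i + len(parts)] == parts:
                merged.append(" ".join(parts))
                i += len(parts)
                break
        else:
            # No entity starts here, so keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged

text = "Today is a bright sunny day in New York"
print(merge_entities(text.split(" "), ["New York"]))
```

Because the comparison is exact, a token with attached punctuation such as 'world.' would not be recognized as part of "hello world".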

Answer 3

"this is hello word program".split(' ')

Splitting automatically produces a list. You can split on any string, word, or character.
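
A short illustration of splitting on different separators with str.split; the sample strings and variable names are made up for this sketch.

```python
# Split on a single character.
by_comma = "a,b,c".split(",")
# Split on a whole word (with its surrounding spaces).
by_word = "one and two and three".split(" and ")
# Splitting on one space keeps an empty string between two adjacent spaces.
single_space = "a  b".split(" ")
# With no argument, any run of whitespace counts as one separator.
any_whitespace = "a  b".split()

print(by_comma, by_word, single_space, any_whitespace)
```

None of these calls keeps "hello world" together, so a known entity still needs separate handling.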

How to split a string into a list and combine two known tokens into one in Python?

3 answers: