Question

I have a file of names stored as strings, and some of them are missing spaces in various places.

For example:

x = "Red Wings Toast Box Pillow"  
y = "BottlePillLastFly"  
z = "DoorCorn JellowHall Minced Meat"  

#Desired Output:

x = ["Red Wings Toast Box Pillow"]  
y = ["Bottle", "Pill", "Last", "Fly"]  
z = ["Door", "Corn", "Jellow", "Hall", "Minced Meat"]

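I need to identify any string that is missing spaces, e.g. "DoorCorn" = "Door Corn". My problem is finding a solution that does not also flag names that are already formatted correctly.

Any ideas on how to produce the desired output? Basically, if a string already contains its spaces, it should stay as a single string. If a string is missing spaces, it should become a list of strings.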

Answer 1

How about this approach:

import re

x = "Red Wings Toast Box Pillow"  
y = "BottlePillLastFly"  
z = "DoorCorn JellowHall Minced Meat"  

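def convert_to_words(s):
    # Already well formed: no second capital fused onto the first word.
    if not re.match(r'[A-Z][^A-Z\s]*[A-Z]', s):
        return [s]
    # Otherwise return each run that starts at an uppercase letter.
    return re.findall(r'[A-Z][^A-Z]+(?=\s|$|[A-Z])', s)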

print(convert_to_words(x))
print(convert_to_words(y))
print(convert_to_words(z))

Explanation:

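re.match returns a match object if the regular expression matches at the start of the string, and None otherwise. So we first check whether the string is already correctly formatted, and if it is, we return the original string wrapped in a list.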
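Because re.match only looks at the start of the string, the check above reacts to a fused first word only. The following is a small sketch of that behaviour; the third sample string is a hypothetical input that does not appear in the question, and re.search is shown purely for comparison, not as part of the answer.

```python
import re

# Check pattern from the answer: an uppercase letter, then non-uppercase,
# non-whitespace characters, then another uppercase letter.
FUSED = re.compile(r'[A-Z][^A-Z\s]*[A-Z]')

samples = [
    "Red Wings Toast Box Pillow",  # well formed
    "BottlePillLastFly",           # fused, starting with the first word
    "Red WingsToast",              # hypothetical: fused later in the string
]

for s in samples:
    at_start = FUSED.match(s) is not None   # anchored at position 0
    anywhere = FUSED.search(s) is not None  # scans the whole string
    print(f"{s!r}: match={at_start}, search={anywhere}")
```

For the third sample, match reports no fused word while search finds one, so which function fits depends on whether fused words can appear after a correctly spaced first word.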

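If it is not, we return every occurrence of a substring that starts with an uppercase letter ([A-Z]), is followed by one or more non-uppercase characters ([^A-Z]+), and is followed by a whitespace character, an uppercase letter, or the end of the string ((?=\s|$|[A-Z])).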
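To see what the trailing (?=...) group contributes, here is a short comparison on the string y from the question. The second pattern is a variant written only for this illustration; it drops the lookahead and matches the following uppercase letter directly.

```python
import re

y = "BottlePillLastFly"

# With the lookahead, the next uppercase letter is checked but not consumed,
# so it is still available as the start of the next match.
with_lookahead = re.findall(r'[A-Z][^A-Z]+(?=\s|$|[A-Z])', y)

# Without it, the trailing uppercase letter becomes part of the match,
# so the word it begins can no longer be matched.
without_lookahead = re.findall(r'[A-Z][^A-Z]+[A-Z]', y)

print(with_lookahead)     # ['Bottle', 'Pill', 'Last', 'Fly']
print(without_lookahead)  # ['BottleP', 'LastF']
```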

If you are not familiar with (?=...), it is a positive lookahead, a very powerful regular-expression feature.
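One thing to be aware of: because [^A-Z] also matches spaces, the function above returns ['Door', 'Corn ', 'Jellow', 'Hall ', 'Minced ', 'Meat'] for z, which differs from the desired output in the question. Below is a minimal sketch of an alternative that works token by token instead. The name split_fused is hypothetical, and the sketch assumes every word starts with an ASCII uppercase letter.

```python
import re

def split_fused(s):
    """Split fused CamelCase tokens; keep runs of well-formed words together."""
    result, run = [], []
    for token in s.split():
        # Pieces of the token, each starting at an uppercase letter.
        parts = re.findall(r'[A-Z][^A-Z]*', token)
        if len(parts) > 1:
            # Fused token: flush any pending run of normal words first.
            if run:
                result.append(" ".join(run))
                run = []
            result.extend(parts)
        else:
            # Normal word: collect it into the current run.
            run.append(token)
    if run:
        result.append(" ".join(run))
    return result

print(split_fused("Red Wings Toast Box Pillow"))
print(split_fused("BottlePillLastFly"))
print(split_fused("DoorCorn JellowHall Minced Meat"))
```

Under those assumptions the three calls reproduce the lists shown in the question, with "Minced Meat" kept as one entry because neither of its tokens is fused.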

Regex: identifying strings with missing spaces
