String manipulation in Python

Time: 2015-10-20 02:10:50

Tags: python regex performance time-complexity

I recently took a test on HackerRank, and the question was:

Given a string, return the most concise string that can be formed from it. For example:

string = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'
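  • The string above contains many duplicates, such as Watson represents. We must ignore extra spacing between characters and upper/lower case: watson Represents and watson represents are the same thing.
  • Semicolons and commas mean the same thing. For example, Cognitive Systems; a new era is already present in Watson represents a first step into cognitive systems, a new era of computing.
  • Your final string should not contain any duplicates. Ignore lower/upper case and extra spaces when deciding whether two pieces are duplicates.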
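The rules above amount to normalising each segment before comparing it with the others. A minimal sketch, assuming a helper named normalize (a hypothetical name, not part of the original task) that lower-cases the text, turns semicolons into commas, and collapses runs of whitespace:

```python
def normalize(segment):
    """Return a comparison key for one segment (hypothetical helper)."""
    # Treat ";" and "," as the same character.
    segment = segment.replace(";", ",")
    # split() with no arguments drops leading/trailing whitespace and
    # splits on runs of whitespace, so joining with " " leaves single spaces.
    return " ".join(segment.lower().split())


print(normalize("watson  Represents") == normalize("watson represents"))
print(normalize("Cognitive Systems; a new era"))
```

With this key, the first two segments compare equal and the semicolon segment becomes a substring of the normalised long sentence.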

My answer:

watson = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'

import re

watson = re.sub(r';', r',', watson)  # treat semicolons as commas
watson = watson.strip().split('|')
removeWatson = list(watson)

for i in range(len(watson)):

    for j in range(len(watson)):

        if j == i:
            continue

        if " ".join(watson[i].strip().lower().split()) in " ".join(watson[j].strip().lower().split()):
            removeWatson[i] = ''

print("|".join(filter(None, removeWatson)))

My answer is surely inefficient, and I would like to know whether you can suggest another approach to this problem.

The most concise string is: Watson represents a first step into cognitive systems, a new era of computing.|what does watson represent
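One likely source of the inefficiency is that the nested loop re-normalises both segments on every comparison. A minimal sketch that normalises each segment once and then runs the same containment check; the function name most_concise is hypothetical, and the comparison itself is still quadratic in the number of segments:

```python
def most_concise(text):
    """Drop every segment whose normalised form occurs inside another one."""
    segments = text.strip().split("|")
    # Normalise once per segment instead of once per comparison.
    keys = [" ".join(s.replace(";", ",").lower().split()) for s in segments]
    kept = []
    for i, key in enumerate(keys):
        redundant = False
        for j, other in enumerate(keys):
            if i == j:
                continue
            if key == other:
                # Exact duplicates: keep only the first of them.
                if j < i:
                    redundant = True
                    break
            elif key in other:
                redundant = True
                break
        if not redundant:
            kept.append(segments[i])
    return "|".join(kept)


watson = ('watson  Represents|watson represents|Watson represents a first '
          'step into cognitive systems, a new era of computing.|first step '
          'into  Cognitive|Cognitive Systems; a new era|what does watson represent')
print(most_concise(watson))
```

Unlike the loop in the question, this sketch keeps one copy when two segments are identical after normalisation and no longer segment contains them.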

2 Answers:

Answer 0: (score: 1)

string = 'watson  Represents|watson represents|Watson represents a first   step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'
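import re

# Work from the shortest segment to the longest, so each segment only
# needs to be checked against the longer ones that follow it.
ll = string.split("|")
ll.sort(key=len)
# Normalised copies: lower case, no commas/semicolons, single spaces.
ll2 = [re.sub(r"\s+", " ", re.sub(r"[;,]+", "", i.lower())) for i in ll]
j = 1
k = 0
for i in ll2:
    # re.escape keeps characters such as "." from acting as regex operators.
    if re.findall(r"\b" + re.escape(i) + r"\b", "|".join(ll2[j:])):
        string = string.replace(ll[k], "", 1)
    k = k + 1
    j = j + 1
# Tidy up the pipes left behind by the removed segments.
print(re.sub(r"^\|+|\|(?=\|)|\|+$", "", string))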

You can do this in a single loop using re.
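The \b anchors in the pattern matter: a plain substring test would treat "watson represent" as contained in "watson represents", while a whole-word match does not. A small sketch of the difference, using a hypothetical helper contains_phrase; it assumes the phrase starts and ends with a word character, since \b only matches next to one:

```python
import re


def contains_phrase(phrase, text):
    """True if phrase occurs in text as whole words (hypothetical helper)."""
    # re.escape makes any punctuation in the phrase match literally.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text) is not None


haystack = "watson represents a first step"
print("watson represent" in haystack)                  # plain substring test
print(contains_phrase("watson represent", haystack))   # whole-word test
print(contains_phrase("watson represents", haystack))
```

This prints True, False, True, which is why the regex version keeps "what does watson represent" style fragments apart from "watson represents".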

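Answer 1: (score: 0)

My reading was that I was being asked to represent the original string completely, i.e. that I should be able to reproduce the original text from my most concise version.

In other words: compress it.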

from __future__ import print_function
from zlib import compress, decompress

string = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'

print("Original length:", len(string))
compressed = compress(string.encode("utf-8"))  # zlib works on bytes
print("Compressed length:", len(compressed))
decompressed = decompress(compressed).decode("utf-8")
print("Decompressed is equal:", decompressed == string)

The result is:

Original length: 198
Compressed length: 116
Decompressed is equal: True