I have some strings, each of which consists of one or more copies of some shorter string. For example:
L = "hellohellohello"
M = "good"
N = "wherewhere"
O = "antant"
I want to split these strings into a list so that each element contains just the repeated part. For example:
splitstring(L) ---> ["hello", "hello", "hello"]
splitstring(M) ---> ["good"]
splitstring(N) ---> ["where", "where"]
splitstring(O) ---> ["ant", "ant"]
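Since the strings are about 1000 characters long, it would also be great if this were reasonably fast.
Note that in my case the repetitions all start at the beginning of the string and there are no gaps between them, so this is much simpler than the general problem of finding the largest repeat in a string.
How can I do this?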
Answer 0 (score: 4)
Use a regular expression to find the repeated word, then simply build a list of the appropriate length:
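import re

def splitstring(string):
    # (.*?) lazily captures the shortest unit; (?:\1)*$ requires the rest to be copies of it
    match = re.match(r'(.*?)(?:\1)*$', string)
    word = match.group(1)
    return [word] * (len(string) // len(word))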
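As a quick check of the regex approach, the sketch below runs a variant of it against the four strings from the question. Using `re.fullmatch`, `(.+?)` in place of `(.*?)`, the `re.DOTALL` flag, and returning an empty list for an empty string are my assumptions, added so that empty input and embedded newlines are handled explicitly.

```python
import re

def splitstring(string):
    # Assumption: an empty input yields an empty list
    if not string:
        return []
    # (.+?) lazily captures the shortest unit; \1* requires the rest to be copies of it.
    # re.DOTALL lets "." match newlines as well.
    match = re.fullmatch(r'(.+?)\1*', string, flags=re.DOTALL)
    word = match.group(1)
    return [word] * (len(string) // len(word))

for s in ("hellohellohello", "good", "wherewhere", "antant"):
    print(splitstring(s))
```

Because the group is lazy, the regex engine tries the shortest candidate first and only lengthens it when the rest of the string cannot be covered by copies of it.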
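Answer 1 (score: 1)
Try this. Rather than cutting up the input, it focuses on finding the shortest pattern and then builds a new list by repeating that pattern the appropriate number of times.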
def splitstring(s):
    # search for the number of characters to split on
    proposed_pattern = s[0]
    for i, c in enumerate(s[1:], 1):
        n = len(proposed_pattern)
        # accept the candidate only if repeating it rebuilds the whole string
        if len(s) % n == 0 and proposed_pattern * (len(s) // n) == s:
            # found it
            break
        else:
            proposed_pattern += c
    # if the loop never breaks, proposed_pattern has grown to the whole string
    # generating the list
    n = len(proposed_pattern)
    return [proposed_pattern] * (len(s) // n)

if __name__ == '__main__':
    L = 'hellohellohellohello'
    print(splitstring(L))  # prints ['hello', 'hello', 'hello', 'hello']
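Since speed was mentioned in the question, here is a shorter sketch of the same "find the shortest pattern, then repeat it" idea. It swaps in a different technique: the first position past index 0 at which `s` reappears inside `s + s` is the length of its shortest repeating unit. The name `split_by_period` is hypothetical, and returning an empty list for an empty string is an assumption.

```python
def split_by_period(s):
    # Assumption: an empty input yields an empty list
    if not s:
        return []
    # s always reappears in s + s at index len(s); an earlier hit (searching
    # from index 1) means s is made of copies of s[:period]
    period = (s + s).find(s, 1)
    return [s[:period]] * (len(s) // period)

print(split_by_period("wherewhere"))
print(split_by_period("good"))
```

This leaves the searching to `str.find`, so there is no explicit Python-level loop over candidate lengths.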
Answer 2 (score: 0)
The approach I would use:
Run it with the corresponding variable to get the output:
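import re
L = "hellohellohello"
M = "good"
N = "wherewhere"
cnt = 0
result = ''
for i in range(1, len(L) + 1):
    # escape the prefix so regex metacharacters are matched literally
    matches = re.findall(re.escape(L[0:i]), L)
    # keep the longest prefix that occurs at least as often as any shorter one
    if cnt <= len(matches):
        cnt = len(matches)
        result = matches[0]
print(result)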
Answer 3 (score: 0)
Assuming the repeated word is longer than one character, this will work:
a = "hellohellohello"

def splitstring(string):
    for number in range(1, len(string)):
        # compare the first `number` characters with the next `number` characters
        if string[:number] == string[number:number+number]:
            return string[:number]
    # in case there is no repetition
    return string

print(splitstring(a))  # prints hello
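The function above returns the repeated word rather than a list, and it stops at the first prefix that is immediately followed by a copy of itself. A minimal sketch of a stricter variant follows; it only tries lengths that divide the string length and accepts a candidate only when repeating it rebuilds the whole string. The name `splitstring_checked` and the empty-string behaviour are assumptions, not part of the original answer.

```python
def splitstring_checked(string):
    for number in range(1, len(string)):
        count, remainder = divmod(len(string), number)
        # skip lengths that cannot tile the string evenly
        if remainder == 0 and string[:number] * count == string:
            return [string[:number]] * count
    # no repetition found: the whole string is the only element
    # (assumption: an empty input yields an empty list)
    return [string] if string else []

print(splitstring_checked("hellohellohello"))
print(splitstring_checked("llamallama"))
```

With this check, an input such as "llamallama" yields "llama" twice instead of stopping at the doubled "l".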
Answer 4 (score: 0)
str