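Given the following example:
string1 = "calvin klein design dress calvin klein"
How can I remove the second occurrences of the duplicated words "calvin" and "klein"?
The result should be:
string2 = "calvin klein design dress"
Only the repeated occurrences should be removed, and the order of the words must not change!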
Answer 0 (score: 29)
string1 = "calvin klein design dress calvin klein"
words = string1.split()
print(" ".join(sorted(set(words), key=words.index)))
This sorts the set of all (unique) words in the string by each word's index in the original list of words.
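One caveat with this approach: words.index scans the list again for every unique word, which can get slow for long inputs. A minimal sketch of the same idea that records each word's first position once, in a dictionary, might look like this (dedupe_by_first_index and first_index are names made up for this illustration, not part of the answer above):

```python
def dedupe_by_first_index(text):
    """Return text with repeated words removed, keeping first occurrences in order."""
    words = text.split()
    first_index = {}
    for i, word in enumerate(words):
        # setdefault only stores the index the first time a word is seen
        first_index.setdefault(word, i)
    # Sort the unique words by the position where each one first appeared
    return " ".join(sorted(first_index, key=first_index.get))


string1 = "calvin klein design dress calvin klein"
print(dedupe_by_first_index(string1))  # calvin klein design dress
```

The sort is kept here only to mirror the original one-liner; the dictionary lookups replace the repeated list scans.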
Answer 1 (score: 16)
def unique_list(l):
    # Keep only the first occurrence of each item, preserving order
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist

a = "calvin klein design dress calvin klein"
a = ' '.join(unique_list(a.split()))
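Answer 2 (score: 8)
In Python 2.7+, you can use collections.OrderedDict:
from collections import OrderedDict
s = "calvin klein design dress calvin klein"
# OrderedDict remembers insertion order, so only the first occurrence of each word is kept
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))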
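On Python 3.7 and later, plain dict objects also preserve insertion order, so the same trick should work without the import. A small sketch, assuming Python 3.7+ (dedupe_words is a hypothetical helper name):

```python
def dedupe_words(text):
    """Remove repeated words while keeping the first occurrence of each, in order."""
    # dict.fromkeys builds a dict whose keys are the words; duplicate keys are
    # dropped and the remaining keys stay in first-seen order (Python 3.7+).
    return " ".join(dict.fromkeys(text.split()))


s = "calvin klein design dress calvin klein"
print(dedupe_words(s))  # calvin klein design dress
```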
Answer 3 (score: 7)
from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
I really wish they would go ahead and make a module out of these recipes soon. I'd very much like to be able to write from itertools_recipes import unique_everseen instead of cutting and pasting every time I need it.
Use it like this:
def unique_words(string, ignore_case=False):
    key = None
    if ignore_case:
        key = str.lower
    return " ".join(unique_everseen(string.split(), key=key))

string2 = unique_words(string1)
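To illustrate what ignore_case changes, here is a self-contained sketch. It uses a simplified stand-in for unique_everseen (not the itertools recipe verbatim) and a made-up input with mixed capitalization:

```python
def unique_everseen(iterable, key=None):
    """Simplified stand-in: yield elements whose key has not been seen before."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


def unique_words(string, ignore_case=False):
    key = str.lower if ignore_case else None
    return " ".join(unique_everseen(string.split(), key=key))


mixed = "Calvin Klein design dress calvin klein"
# Case-sensitive: "Calvin" and "calvin" count as different words
print(unique_words(mixed))                    # Calvin Klein design dress calvin klein
# Case-insensitive: the first spelling encountered is the one that is kept
print(unique_words(mixed, ignore_case=True))  # Calvin Klein design dress
```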
Answer 4 (score: 5)
string = 'calvin klein design dress calvin klein'

def uniquify(string):
    output = []
    seen = set()
    for word in string.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

print(uniquify(string))
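Answer 5 (score: 2)
You can use a set to keep track of the words that have already been processed.
words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(result.rstrip())  # rstrip() drops the trailing space left by the loop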
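Answer 6 (score: 0)
Several answers come pretty close to this, but none ended up quite where I did:
def uniques(your_string):
    seen = set()
    # seen.add(i) returns None, so "seen.add(i) or i" records the word and evaluates to i
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)
Of course, if you want it a bit cleaner or faster, we can refactor a little:
def uniques(your_string):
    words = your_string.split()
    seen = set()
    seen_add = seen.add

    def add(x):
        seen_add(x)
        return x

    return ' '.join(add(i) for i in words if i not in seen)
I think the second version is about as performant as you can get in a small amount of code. (More code could be used to do all of the work in a single scan of the input string, but for most workloads this should be sufficient.)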
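For what it's worth, one possible reading of "a single scan of the input string" is to walk the string with a regular expression instead of building the full word list with split() first. This is only a sketch under that assumption; uniques_single_scan is a name invented here, and whether it is actually faster would need measuring:

```python
import re


def uniques_single_scan(your_string):
    """Drop repeated words in one left-to-right pass, without calling split()."""
    seen = set()
    kept = []
    # re.finditer lazily yields each run of non-whitespace characters in order
    for match in re.finditer(r"\S+", your_string):
        word = match.group()
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


print(uniques_single_scan("calvin klein design dress calvin klein"))
# calvin klein design dress
```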
Answer 7 (score: 0)
This works perfectly for me; it is the approach from answer 0 with the input lowercased first:
s = "the sky is blue very blue"
s = s.lower()
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))
Answer 8 (score: 0)
Problem: remove duplicates from a string
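Answer 9 (score: 0)
You can use the following code to remove duplicate or repeated words from a text file or a string:
from collections import Counter
import nltk
from nltk.tokenize import word_tokenize

# all_words (an iterable of lines) and lemmatize_sentence() are assumed to be defined elsewhere
new_data = []  # assumed to start out as an empty list
for lines in all_words:
    line = ''.join(lines.lower())
    new_data1 = ' '.join(lemmatize_sentence(line))
    new_data2 = word_tokenize(new_data1)
    new_data3 = nltk.pos_tag(new_data2)
    # below code is for removal of repeated words
    for i in range(0, len(new_data3)):
        new_data3[i] = "".join(new_data3[i])  # fuse each (word, tag) pair into one string
    UniqW = Counter(new_data3)  # keys keep first-seen order on Python 3.7+
    new_data5 = " ".join(UniqW.keys())
    print(new_data5)
    new_data.append(new_data5)
print(new_data)
P.S. Adjust the indentation as required. Hope this helps!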
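Answer 10 (score: 0)
You can do this simply by taking the set associated with the string, a mathematical object that by definition contains no repeated elements. Then just join the words in the set back into a string. Note that a set is unordered, so the original word order is not guaranteed to survive:
def remove_duplicate_words(string):
    return ' '.join(set(string.split()))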
Answer 11 (score: 0)
Without using the split function (this may help in interviews):
def unique_words2(a):
    words = []
    spaces = ' '
    length = len(a)
    i = 0
    while i < length:
        if a[i] not in spaces:
            # Found the start of a word; advance i to the end of it
            word_start = i
            while i < length and a[i] not in spaces:
                i += 1
            words.append(a[word_start:i])
        i += 1
    words_stack = []
    for val in words:                  #
        if val not in words_stack:     # We can replace these three lines with this one -> [words_stack.append(val) for val in words if val not in words_stack]
            words_stack.append(val)    #
    print(' '.join(words_stack))  # or return, your choice

unique_words2('calvin klein design dress calvin klein')
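Answer 12 (score: 0)
Use the numpy function; it is best to give the import an alias (such as np):
import numpy as np
Then, to remove duplicates from an array, you can use it like this:
no_duplicates_array = np.unique(your_array)
For your case, if you want the result as a string, you can join the resulting array back together with ' '.join(...). Note that np.unique returns the unique values sorted, so the original word order is not preserved.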
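Since np.unique returns its result sorted, it does not keep the word order the question asks for on its own. One way around that, sketched here under the assumption that numpy is installed, is to request the index of each first occurrence via return_index=True and sort by those indices (unique_words_np is a hypothetical helper name):

```python
import numpy as np


def unique_words_np(text):
    """Remove repeated words with np.unique while restoring first-seen order."""
    words = np.array(text.split())
    # return_index=True also returns the index of the first occurrence of each unique value
    _, first_idx = np.unique(words, return_index=True)
    # Sorting those indices puts the unique words back in their original order
    return " ".join(words[np.sort(first_idx)])


print(unique_words_np("calvin klein design dress calvin klein"))
# calvin klein design dress
```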
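Answer 13 (score: -1)
string2 = ' '.join(set(string1.split()))
Explanation:
.split()
- a method that splits a string into a list (called without arguments, it splits on whitespace)
set()
- an unordered collection type that excludes duplicates
'separator'.join(list)
- joins the strings in the list passed as the argument into one string, with the separator between them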
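The three steps above can be traced one at a time, as in the sketch below. Because a set is unordered, the joined result contains the right words but its order is not guaranteed to match the original, so this alone may not satisfy the ordering requirement in the question (the intermediate variable names are made up for illustration):

```python
string1 = "calvin klein design dress calvin klein"

# Step 1: split on whitespace into a list of words (duplicates still present)
words = string1.split()
print(words)  # ['calvin', 'klein', 'design', 'dress', 'calvin', 'klein']

# Step 2: a set keeps one copy of each word but does not remember their order
unique = set(words)
print(len(unique))  # 4

# Step 3: join the unique words with a space; the order may vary between runs
string2 = " ".join(unique)
print(sorted(string2.split()))  # ['calvin', 'design', 'dress', 'klein']
```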