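I have a compression task in Python. I need to write code so that if the input is
'hello its me, hello can you hear me, hello are you listening'
then the output should be
1,2,3,1,4,5,6,3,1,7,5,8
Basically, each word is assigned a number, and if a word is repeated it gets the same number again. The code has to be in Python. Please help, thank you.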
Answer 0 (score: 3)
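A simple approach is to use a dict: when you find a new word, add a key/value pair using an incrementing counter; when you have seen the word before, just yield the value already stored in the dict: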
s = 'hello its me, hello can you hear me, hello are you listening'

def cyc(s):
    # start numbering at 1
    i = 1
    # split into words on whitespace
    it = s.split()
    # create the first key/value pair
    seen = {it[0]: i}
    # yield 1 for the first word
    yield i
    # for every word after the first
    for word in it[1:]:
        # if we have seen this word already, use its value from our dict
        if word in seen:
            yield seen[word]
        # else this is the first time we see it, so increment the count
        # and create a new k/v pairing
        else:
            i += 1
            yield i
            seen[word] = i

print(list(cyc(s)))
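Output:
[1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
You can also avoid the slice by calling iter on the split list and using next to pop the first word. If you want foo and foo! to count as the same word, you also need to strip punctuation from each word, which can be done with str.rstrip.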
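The code for the iter/next variant did not survive in this copy of the answer, so the following is only a minimal sketch of what it could look like. The name cyc_stripped and the empty-input guard are assumptions added here, not part of the original answer; string.punctuation is used as the set of characters to strip.

```python
from string import punctuation

def cyc_stripped(s):
    """Yield a 1-based number per word, ignoring trailing punctuation."""
    words = iter(s.split())
    try:
        # pop the first word instead of slicing the list
        first = next(words).rstrip(punctuation)
    except StopIteration:
        return  # empty input: nothing to yield (assumed behaviour)
    i = 1
    seen = {first: i}
    yield i
    for word in words:
        # strip trailing punctuation so 'foo!' and 'foo' share a number
        word = word.rstrip(punctuation)
        if word in seen:
            yield seen[word]
        else:
            i += 1
            seen[word] = i
            yield i

print(list(cyc_stripped('hello its me, hello can you hear me, hello are you listening')))
```

Note that rstrip only removes punctuation at the end of a word; leading punctuation such as an opening quote would need strip instead.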
Answer 1 (score: 2)
How about building a dict with an item:index mapping:
>>> s
'hello its me, hello can you hear me, hello are you listening'
>>>
>>> l = s.split()
>>> d = {}
>>> i = 1
>>> for x in l:
...     if x not in d:
...         d[x] = i
...         i += 1
...
>>> d
{'its': 2, 'listening': 8, 'hear': 6, 'hello': 1, 'are': 7, 'you': 5, 'me,': 3, 'can': 4}
>>> for x in l:
...     print(x, d[x])
...
hello 1
its 2
me, 3
hello 1
can 4
you 5
hear 6
me, 3
hello 1
are 7
you 5
listening 8
>>>
If you don't want any punctuation to end up in the split list, you can do the following:
>>> import re
>>> l = re.split(r'(?:,|\s)\s*', s)
>>> l
['hello', 'its', 'me', 'hello', 'can', 'you', 'hear', 'me', 'hello', 'are', 'you', 'listening']
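Putting the two steps of this answer together, a minimal sketch might look like the following. The function name encode is hypothetical, and dropping empty strings (which re.split can produce for input that starts or ends with a separator) is an assumption added here.

```python
import re

def encode(text):
    """Return the list of 1-based numbers, one per word of text."""
    # split on commas or whitespace, dropping empty pieces
    words = [w for w in re.split(r'(?:,|\s)\s*', text) if w]
    ids = {}
    out = []
    for w in words:
        if w not in ids:
            # the next free number is one more than the words seen so far
            ids[w] = len(ids) + 1
        out.append(ids[w])
    return out

s = 'hello its me, hello can you hear me, hello are you listening'
print(encode(s))
```

Using len(ids) + 1 as the next number avoids keeping a separate counter variable; it works only because numbers are never removed from the dict.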
Answer 2 (score: 1)
import re
from collections import OrderedDict
text = 'hello its me, hello can you hear me, hello are you listening'
words = re.sub(r"[^\w]", " ", text).split()
uniq_words = list(OrderedDict.fromkeys(words))
res = [uniq_words.index(w) + 1 for w in words]
print(res) # [1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
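One thing to be aware of in the snippet above is that uniq_words.index(w) rescans the list for every word, which may get slow on long texts. As a possible variation (not part of the original answer), the numbers can be precomputed in a dict; the name numbers is hypothetical, and this sketch relies on plain dicts preserving insertion order, which is guaranteed from Python 3.7 on.

```python
import re

text = 'hello its me, hello can you hear me, hello are you listening'
# replace every non-word character with a space, then split on whitespace
words = re.sub(r"\W", " ", text).split()
# dict.fromkeys keeps each word once, in first-seen order;
# enumerate(..., start=1) assigns the 1-based numbers
numbers = {w: n for n, w in enumerate(dict.fromkeys(words), start=1)}
res = [numbers[w] for w in words]
print(res)
```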