Question

For this project, we were given a text file that looks like this:

r:are
y:why
u:you
ttyl:talk to you later
l8:late
brb:be right back
lol:laughing out loud
bbl:be back later
...etc...

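My idea was to write a program that translates a sentence from text-speak into normal English. I am using the .replace method, but it gives me results I don't understand.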

I am on Windows 8 with Python 3.4.0.

Here is my code so far:

def main():
    sentence={}
    sentence=input("enter a sentence to translate\n")
    slang_file = open('slang.txt', 'r')
    for line in slang_file:
        slangword,unslang=line.split(":")
        if slangword in sentence:
            sentence = sentence.replace(slangword, unslang)
    print(sentence)
main()

Here is my output:

>>> 
enter a sentence to translate
here r some problems. wuts wrong
heare
e are
some pare
oblems. wyou
ts ware
ong
>>>

Any help or pointers would be great.

Answer 1

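The idea is to detect whole words.
The problem with your current code is that you are replacing letters inside words, which is not what you want.
I am not a Python expert, so you may be able to improve this code.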

def main():
    sentence = input("enter a sentence to translate\n")
    with open('slang.txt', 'r') as slang_file:
        for line in slang_file:
            # strip() removes the trailing newline before splitting
            slangword, unslang = line.strip().split(":")
            words = sentence.split(" ")
            if slangword in words:
                # swap only words that match exactly, never letters inside other words
                sentence = " ".join(unslang if w == slangword else w for w in words)
    print(sentence)
main()
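
The difference between the two membership tests (on the whole string in the question, on the list of words here) can be seen in a short sketch. The sample sentence is taken from the question; nothing else is assumed:

```python
sentence = "here r some problems. wuts wrong"

# On a string, `in` is a substring test: "u" is found inside "wuts".
u_in_string = "u" in sentence

# On a list of words, `in` only matches a complete word.
u_in_words = "u" in sentence.split(" ")
r_in_words = "r" in sentence.split(" ")

print(u_in_string)  # True
print(u_in_words)   # False
print(r_in_words)   # True
```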

Answer 2

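import re

deslang = {}
with open('slang.txt', 'r') as f:
    for line in f:
        slang, unslang = line.strip().split(':')
        deslang[slang] = unslang

sentence = input('Enter sentence to translate: ')
for word in deslang:
    # strings are immutable, so the result has to be assigned back;
    # \b restricts the match to whole words
    sentence = re.sub(r'\b' + re.escape(word) + r'\b', deslang[word], sentence)
print(sentence)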

Enter sentence to translate: y r u l8?
why are you late?

Answer 3

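The basic problems are:

1. You should split the sentence into words before replacing; otherwise the
replacement can hit part of a word that you did not mean to change.
2. str.replace replaces every occurrence of the substring in the string, not
just whole words.

For example, when your code performs the 'r' replacement, the original sentence:

here r some problem.

has every "r" in it replaced and becomes:

heare are some pareoblem.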

The solution is simple, as follows:

def main():
    sentence = input("enter a sentence to translate\n")
    slang_dict = {}

    # read the mapping once; strip() removes the trailing newline
    with open('slang.txt', 'r') as slang_file:
        for line in slang_file:
            slangword, unslang = line.strip().split(":")
            slang_dict[slangword] = unslang

    # translate word by word so that only whole words are replaced
    result = ""
    for item in sentence.split():
        if item in slang_dict:
            result += slang_dict[item]
        else:
            result += item
        result += " "
    print(result)

main()

There are also some minor issues:

1. Don't initialize sentence with {}, since that makes it a dict,
while it is actually a string.
2. Store the mapping from slang.txt in a local dict, since it may be used
repeatedly and re-reading the file each time is a waste of time.
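
Both points can be sketched as follows. The helper names load_slang and translate are hypothetical, and the mapping is built from an in-memory list of lines instead of slang.txt so that the example runs on its own:

```python
def load_slang(lines):
    """Build the slang -> plain-text mapping once from 'key:value' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the first colon only, in case the expansion contains one
        slang_word, plain = line.split(":", 1)
        mapping[slang_word] = plain
    return mapping


def translate(sentence, mapping):
    """Replace whole whitespace-separated words found in the mapping."""
    return " ".join(mapping.get(word, word) for word in sentence.split())


# In the real program these lines would come from open('slang.txt').
slang = load_slang(["r:are\n", "y:why\n", "u:you\n", "l8:late\n"])

# The same dictionary is reused for every sentence.
print(translate("here r some problems", slang))  # here are some problems
print(translate("y r u l8", slang))              # why are you late
```

Because the sentence starts out as a plain string and the dictionary is passed in, the file only needs to be read once no matter how many sentences are translated.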

Answer 4

If you are doing any kind of natural language processing, it pays to learn the re module early on:

import re

def main():
    slang_file = [line.strip().split(":") for line in open("slang.txt")]
    slang = {k:v for k, v in slang_file}
    sentence = input("enter a sentence to translate\n")
    print(
        re.sub(r"\w+", lambda m: slang.get(m.group(0), m.group(0)), sentence)
    )

main()

Here it is explained in detail:

def main():
    # open the input file
    slang_file = open("slang.txt")

    # using a normal list instead of list comprehension
    tmp_list = []

    # iterating over the file object yields one line at a time
    for line in slang_file:

        # strip the line of linefeeds, carriage returns and spaces
        line = line.strip()

        # split the line in two parts and save to our list
        tmp_list.append(line.split(":"))

    # add each item to a dictionary
    slang = {}

    # key is what you want to find
    # value is what you want to replace it with
    for key, value in tmp_list:
        slang[key] = value

    # get the sentence to translate
    sentence = input("enter a sentence to translate\n")

    # in a regular expression \w matches any letter, digit, or underscore
    # \w+ matches any consecutive run of those characters

    # the second argument is normally a replace statement
    # however this is where the lambda function is helpful
    # m takes the match object for \w+
    # the matched text is retrieved by m.group()
    # which we then use as a key for the slang dictionary to get the replacement
    # the second m.group() is there to be returned when the key is not in slang
    print(
        re.sub(r"\w+", lambda m: slang.get(m.group(), m.group()), sentence)
    )
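
To see why the regular expression copes with punctuation where a plain split() does not, here is a small sketch. The hard-coded dictionary stands in for slang.txt, and the helper names translate_regex and translate_split are assumptions made for illustration:

```python
import re

# Stand-in for the mapping read from slang.txt.
slang = {"r": "are", "y": "why", "u": "you", "l8": "late"}


def translate_regex(sentence):
    # \w+ matches each run of letters, digits, and underscores, so
    # punctuation such as "?" stays outside the match and is preserved.
    return re.sub(r"\w+", lambda m: slang.get(m.group(), m.group()), sentence)


def translate_split(sentence):
    # Splitting on whitespace keeps "l8?" as one token, which is not a key.
    return " ".join(slang.get(word, word) for word in sentence.split())


print(translate_regex("y r u l8?"))  # why are you late?
print(translate_split("y r u l8?"))  # why are you l8?
```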

Replacing multiple words in a string from a file
