Question

I have this code so that I can read a unicode string from user input:

print "Enter a nepali string" 
split_string=raw_input().decode(sys.stdin.encoding or locale.getpreferredencoding(True))

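I have some unicode strings in a file, and if one of them matches as a substring of the user's input, I have to split the input string. Say the file contains "सुर": if it matches the user input "सुरक्षा", I want only "क्षा" in the output.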

with codecs.open("prefixnepali.txt","rw","utf-8") as prefix:
    for line in prefix:
        line=ud.normalize('NFC',line)
        if line in split_string:
            prefixy=split_string[len(line):len(split_string)]
            print prefixy
        else:
            print line

But when I run the program, I get

दि

सुर

रु

when I enter "सुरक्षा" in the terminal. These are the unicode strings from the file. What is going wrong here?

Answer 1

The problem is probably simple: the lines read from the file have a newline character at the end. Use splitlines, as suggested in "Reading a file without newlines" and "Getting rid of \n when using .readlines()":

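import codecs
import unicodedata as ud  # assumed to be what "ud" refers to in the question

# split_string is the decoded user input from the question.
with codecs.open("prefixnepali.txt", "r", "utf-8") as prefix:
    # splitlines() drops the trailing newline from every line
    for line in prefix.read().splitlines():
        line = ud.normalize('NFC', line)
        if line in split_string:
            prefixy = split_string[len(line):]
            print prefixy
        else:
            print line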
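To see why the trailing newline matters, here is a minimal sketch. It is written for Python 3 (the thread's code is Python 2) and uses io.StringIO as a stand-in for prefixnepali.txt. The three entries are assumed from the output shown in the question, and the names fake_file, raw_hits and clean_hits are only illustrative.

```python
import io
import unicodedata as ud

# Stand-in for prefixnepali.txt; the entries are assumed from the question's output.
fake_file = io.StringIO("दि\nसुर\nरु\n")
user_input = ud.normalize("NFC", "सुरक्षा")

# Iterating over a file yields each line with its trailing "\n" still attached.
raw_lines = list(fake_file)
# splitlines() yields the same lines without the line terminator.
clean_lines = "".join(raw_lines).splitlines()

# "सुर\n" is not a substring of the input, so nothing matches here.
raw_hits = [line for line in raw_lines if line in user_input]
# Without the newline, the substring test finds the entry.
clean_hits = [line for line in clean_lines if line in user_input]

print(repr(raw_lines[1]), raw_hits, clean_hits)
```

The same trailing newline would also account for the blank lines between the entries printed in the question, since print adds a newline of its own.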

By the way, line in split_string looks for line anywhere inside split_string. If you are looking for a prefix match, you should use split_string.find(line) == 0 or split_string[0:len(line)] == line instead.
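A prefix-only variant could look like the sketch below, again in Python 3 rather than the Python 2 of the thread. It uses str.startswith, which is equivalent to the slice comparison above. The helper name strip_known_prefix and the hard-coded prefix list are hypothetical, chosen only for illustration.

```python
import unicodedata as ud


def strip_known_prefix(word, prefixes):
    """Return word without the first matching prefix, or None if none matches."""
    word = ud.normalize("NFC", word)
    for raw in prefixes:
        # strip() removes the trailing newline left over from reading a file
        prefix = ud.normalize("NFC", raw.strip())
        # Skip empty lines: every string "starts with" the empty string.
        if prefix and word.startswith(prefix):
            return word[len(prefix):]
    return None


# Hypothetical prefix list, as it would look when read line by line from a file.
print(strip_known_prefix("सुरक्षा", ["दि\n", "सुर\n", "रु\n"]))
```

Returning None when nothing matches lets the caller tell "no prefix found" apart from an empty remainder.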

Comparing unicode characters from user input with unicode characters from a file
