Question

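I am currently developing an application that works with Unicode text.

The Unicode text has to be read in Python to determine its language before it is handed to Java for processing. At the moment, Python reads the file, determines the language, and then calls the corresponding Java engine with the file path.

This approach takes too long because the file I/O cost is too high, but passing the Unicode characters directly as a command-line argument does not work. It throws this error: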

'charmap' codec can't encode characters in position xx-xx: character maps to <undefined>.
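For reference, the same kind of error can be reproduced without Java at all. The sketch below assumes Python 3 and assumes the failing encoder is a Windows code page such as cp1252; the sample text is made up purely for illustration and is not from my real data.

```python
# Sample non-Latin text, chosen only for illustration (an assumption).
text = "日本語"

try:
    # cp1252 is one of the table-driven "charmap" codecs; it has no
    # mapping for these characters, so encoding fails.
    text.encode("cp1252")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# The same text encodes without error as UTF-8.
utf8_bytes = text.encode("utf-8")

print(message)
```

If this matches what happens in my setup, the failure occurs while the text is being encoded for the command line or console, before Java ever sees it.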

What I want to do (excerpt from my code):

# reads in the unicode text
import subprocess
from subprocess import PIPE

text = "some unicode words"  # renamed from `str` to avoid shadowing the built-in
command = "java -jar unicodeProcessor.jar " + text
subprocess.Popen(command, stdout=PIPE, stderr=PIPE)

Java processes it and writes the result to a file.

Currently,

# determines what the language is
filepath = "filepath of text file"
command = "java -jar unicodeProcessor.jar " + filepath
subprocess.Popen(command, stdout=PIPE, stderr=PIPE)
# in this version the parameter is a file path instead of a string

This approach is too slow.

Current code:

# Python 2: read the file and decode the bytes to unicode
unic = open("unicode_words.txt")
words = unic.read()
if isinstance(words, str):
    convert = unicode(words, 'utf-8')
else:
    convert = words

command = "java -jar unicodeProcessor.jar " + convert
subprocess.Popen(command, stdout=PIPE, stderr=PIPE)
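One alternative I am considering is to skip the command line entirely and send the text to the child process on standard input as UTF-8 bytes. The following is only a minimal sketch, assuming Python 3. The helper name `send_text_via_stdin` is hypothetical, and a small Python echo process stands in for the Java program, since unicodeProcessor.jar would first have to be changed to read `System.in` as UTF-8.

```python
import subprocess
import sys


def send_text_via_stdin(argv, text):
    """Run argv, feed it `text` as UTF-8 on stdin, return its decoded stdout."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Encode explicitly so no platform code page is involved.
    out, err = proc.communicate(text.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


# Stand-in child: copies the raw stdin bytes to stdout unchanged.
# In the real setup argv might be ["java", "-jar", "unicodeProcessor.jar"],
# assuming the jar reads its input from System.in (an assumption).
ECHO_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

sample = "日本語 текст"  # illustrative text only
echoed = send_text_via_stdin(ECHO_CHILD, sample)
```

Passing `argv` as a list also avoids building one long command string, so spaces or quotes in the text cannot be misread as extra arguments.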

Passing Unicode characters from Python to Java via the console
