Question

我刚开始在'A byte of Python'的帮助下用Python编程，而这个网站对我帮助很大。我的任务是找到一串日语单词中的所有Hiragana符号。输出应该是一个文本文件，所有平假名序列都堆叠在一个列表中。给我的是以下代码：

# -*- coding: utf-8 -*-
import re
import codecs  

prompt = unicode('alotofjapanesesymbols')
out = codecs.open("output.txt", "w", "utf-8")

#Japanese uses 4 major writing systems that are intermixed in Japanese texts.
#Your task is to find all connected sequences of Hiragana symbols in the
# "prompt" Unicode string above and write them out into separate lines
# using the file handle "out".
# The result should be 40 lines looking like this (excluding the # symbol):
#な
#や
#に
#されている
#はその
Etc.

到目前为止，我来到这里：

# -*- coding: utf-8 -*-

import re
import codecs

s = '受信可能な放送局や、リストに登録されている放送局はその放送局の名前をお使いいただけます 例えばFMヨコハマが受信可能な場合放送局FMヨコハマと言うことでFMヨコハマをお聞きいただけます アドレス帳の検索は音声コマンド登録先を検索のあとに登録されているお名前をお話しください アドレス帳に登録されている方に電話をするときは例えばアドレス帳の鈴木太郎に電話するとお話しください 音声コマンドが認識されにくい場合は同じコマンドを繰り返すかヘルプシステムをご利用ください'

m = re.findall(u'[\u3040-\u309F]', s.decode('utf-8')) # Using regular expressions findall function and unicode of all Hiragana symbols to find them. 

# for char in m:                     
#     print char, hex(ord(char))  # For every character in m, print the character and its unicode using the ord function (unicode is hexadecimal). This is not relevant, but it showed me that i am on the right way.

out = codecs.open( 'output.txt', 'w', 'utf-8' ) 
for char in m:
    out.write(char)
else: 
    out.close()         # Write the Hiragana symbols to a text file as long as there are characters in m, else close file.

最困难的是整个unicode-utf-8编码解码部分，但我发现如何使用re.findall函数和平假名的unicode找到平假名符号之后的感觉。我现在面临的问题是，我有一个包含所有符号的文本文件，但它们以字符串形式显示，而不是列在列表中。我做错了什么，你可以帮帮我吗？我已经完成了几个主题，但这对我没有帮助。感谢您在本网站上给我的帮助。

zawaponga

Answer 1

写作时编码为JSON。

json.dump(m, out)

另外，use unicode literals以避免必须解码。

chars = u'あいうえお'

Python：将列表写入文本文件并显示为列表

1 个答案: