Question

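Basically I want to put together an ad hoc super word count, but I'm not sure how to create a dict object (rather than a list) from a directory path passed in as an argument, so that I can do what I need.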

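Although I want to create a dictionary object, I also want to use the email module to turn the ASCII contents behind each key (the keys being file names) into email or message objects. Then I want to extract the body via the payload and parse it that way. I have a rough example below: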

mylist=os.listdir(sys.stdin)
for emails in mylist:
    email_str = emails.open()
    #uncertain if this will get all emails and their content or not
    #all emails are supposed to have a unique identifier, they are essentially still just ascii
    file_dict = {emails : email_str}
#file_dict = dict(zip(mylist, mylist))
for emails in file_dict[emails]:
    msg = email.message_from_string(email_str)
    body = msg.get_payload(decode=True)
    #I'm not entirely sure how message objects and sub objects work, but I want the header to 
    #signature and I'm not sure about the type of emails as far as header style
    #pretend I have a parsing method here that implements the word count and prints it as a dict:
    body.parse(regex)

I don't need to parse anything about the keys other than their values, so I could consider using message_from_file instead.

Answer 1

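You can use any string as a file path, and relative file paths work too. If you are just trying to get the data into a format for yourself, you can iterate through the list of email paths and store the output.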

file_dict = {}
for emailpath in list_of_email_paths:
    # emailpath is a plain string such as 'someemailpath'
    # open the path; this only works if the path exists
    with open(emailpath) as f:
        # the with block closes the file for us
        file_dict[emailpath] = f.read()

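Using open file objects as keys is not a good idea (if it's even possible); just read the files and store a string as the identifier. Read the documentation on os.path for more information (by the way, a plain import os is enough to use os.path; you don't need a separate import os.path).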

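Beyond that, any immutable (strictly speaking, hashable) object can be a dictionary key, so storing paths as keys is no problem. Python doesn't care where a path came from, and the dict doesn't care that its keys happen to be paths ;)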
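A small sketch of the point about keys. The path strings below are made up for illustration; the list key is there only to show what happens with an unhashable object.

```python
# Hypothetical paths, used only for illustration.
file_dict = {}
file_dict["mail/0001.mail"] = "first message text"
file_dict["../archive/0002.mail"] = "second message text"  # relative paths are fine

# A tuple of strings is hashable, so it works as a key too.
file_dict[("mail", "0003.mail")] = "third message text"

# A list is mutable and therefore unhashable: using it as a key raises TypeError.
try:
    file_dict[["mail", "0004.mail"]] = "fourth message text"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
```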

Answer 2

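Unfortunately, because you're asking about so many things at once, my answer has to be a more general overview of them. Even though you say your example is pure pseudocode, it is so thoroughly wrong that it's hard to tell which parts you understand and which you don't, so I'll cover all the bases you mention in your comments.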

How to read the files

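You are misusing os.listdir: it expects a string path, not a file-like object such as sys.stdin. Personally, though, I like to use glob. It saves a few steps, gives you full paths, and filters by pattern. Let's assume all of your email files end in .mail: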

import sys
import glob

first_path = sys.argv[1]
pattern = "%s/*.mail" % first_path
for mail in glob.iglob(pattern):
    # with context will close the file automatically
    with open(mail) as f:
        data = f.read()
        # do something with data here
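For comparison, here is a sketch of the two listing approaches side by side. The function names are hypothetical and the .mail suffix is the same assumption as above; this is written for Python 3.

```python
import glob
import os

def list_mail_with_listdir(directory):
    """os.listdir returns bare names, so join and filter by hand."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".mail")
    )

def list_mail_with_glob(directory):
    """glob returns paths that already include the directory prefix."""
    return sorted(glob.glob(os.path.join(directory, "*.mail")))
```

One small difference to keep in mind: glob skips names that start with a dot, while os.listdir returns them.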

Parsing the email format

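The examples for the email module are quite extensive, so there is little point in my repeating them here beyond giving you a link to review: http://docs.python.org/library/email-examples.html
If the files really are emails, you should be able to use this module to parse them and read the message body of each file.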
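As a rough sketch of what that parsing can look like in Python 3: the message text below is invented, and get_body_text is a hypothetical helper, not part of the email module. It assumes you only want the first text/plain part.

```python
import email

# A made-up message standing in for the contents of one email file.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Are we still on for lunch today?\n"
)

def get_body_text(msg):
    """Return the decoded text of the first text/plain part, or '' if none."""
    for part in msg.walk():  # walk() also yields the message itself
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, or None
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""

msg = email.message_from_string(raw)
subject = msg["Subject"]
body = get_body_text(msg)
```

When reading from disk, email.message_from_file(f) takes an open text file in place of the string.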

Using a dictionary

Using a dictionary in this situation is no different from any general use of a Python dict. You can start by creating an empty dict:

file_dict = {}

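On each pass through the directory listing you always have the string path name, which is what you want as the key. Whether you read the raw file data as in the first example or get the message body with the email module, either way you end up with some text data.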

for mail in glob.iglob(pattern):
    ...
    # do stuff to get the text data from the file
    data = some_approach_to_reading_file()
    ...
    file_dict[mail] = data

Now you have a file_dict whose keys are the paths of the original files and whose values are the data read from them.
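Putting the pieces together for the word count from the question, one possible sketch for Python 3 looks like this. The function names, the .mail pattern, and the regular expression that defines a "word" are all assumptions; it counts words in the raw file text, so substitute the parsed message body if headers should be excluded.

```python
import glob
import os
import re
from collections import Counter

def count_words(text):
    """Count lowercase word occurrences in a string."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def build_word_counts(directory, pattern="*.mail"):
    """Map each matching file path to a Counter of the words it contains."""
    counts = {}
    for path in glob.iglob(os.path.join(directory, pattern)):
        with open(path, encoding="utf-8", errors="replace") as f:
            counts[path] = count_words(f.read())
    return counts
```

Called as build_word_counts(sys.argv[1]), this returns a dict keyed by file path, matching the file_dict layout above.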

Summary

With these three parts you should have plenty of general information to work from.

Passing a directory path to create a dictionary instead of a list.
