Question

Here is a file:

APPLE: toronto, 2018, garden, tasty, 5
apple is a tasty fruit
>>>end 
apple is a sour fruit
>>>end
grapes: america, 24, organic, sweet, 4
grapes is a sweet fruit
>>>end

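This is the file, and it contains newline characters as well. I want to build a dictionary from it, like this:

The function is def f(file_to: TextIO) -> Dict[str, List[tuple]]

file_to is the input file, and the function should return a dictionary such as: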

{'apple': [('apple is a tasty fruit', 2018, 'garden', 'tasty', 5), ('apple is a sour fruit',)], 'grapes': [('grapes is a sweet fruit', 24, 'organic', 'sweet', 4)]}

Each fruit is a key, and its descriptions are the values, formatted as shown there. Each fruit entry ends with >>>end.

I tried:

with open (file_to, "r") as myfile:
    data= myfile.readlines()
return data

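It returns the file's lines as a list of strings, each ending in \n. I think I could use strip() to remove that and take the element before ':' as the key.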
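A minimal sketch of that idea, assuming every header line has the form NAME: field, field, ... The helper name split_header is hypothetical and used only for illustration.

```python
def split_header(line):
    """Split a header line such as 'APPLE: toronto, 2018' into (key, fields)."""
    # rstrip() drops the trailing newline that readlines() leaves on each line.
    key, _, rest = line.rstrip().partition(':')
    # Everything after ':' is a comma-separated list of fields.
    fields = [field.strip() for field in rest.split(',')]
    return key.strip().lower(), fields


key, fields = split_header('APPLE: toronto, 2018, garden, tasty, 5\n')
print(key)     # apple
print(fields)  # ['toronto', '2018', 'garden', 'tasty', '5']
```

partition(':') splits only at the first colon, so a colon later in the line would stay inside the fields.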

The code I tried is:

from pprint import pprint
import re
def main():
    fin = open('f1.txt', 'r')

    data = {}
    key = ''
    parsed = []
    for line in fin:
        line = line.rstrip()
        if line.startswith('>'):
            data[key] = parsed
            parsed = []
        elif ':' in line:
            parts = re.split('\W+', line)
            key = parts[0].lower()
            parsed += parts[2:]
        else:
            parsed.insert(0, line)

    fin.close()
    pprint(data)


main()

It does not give the expected result :(

Answer 1

I don't think you really need re and pprint. I tried a simple list comprehension and a few if statements.

def main():
    data = {}
    # Same file name as in the question's code.
    with open('f1.txt', 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):
                continue  # Lines starting with '>' are end markers, so skip them.
            elif ':' in line:
                parts = line.strip().split(":")
                key = parts[0].lower()

                firstInfo = parts[1].split(",")  # Fields to add to the value after reading the next line.
                firstInfo.pop(0)  # Remove the first field, the place name (not required).

                secondInfo = fin.readline().strip()  # Read the next line. It becomes the first item of the value.

                value = [secondInfo]

                # Extend the value with the remaining fields, stripped of surrounding spaces.
                value.extend([x.strip() for x in firstInfo])

                data[key] = value

    print(data["apple"])
    return data

If you run into any problems with this implementation, we will be happy to help. (Though it is self-explanatory :P)
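The step that reads the line following a header can be tried in isolation. The sketch below is only an illustration under stated assumptions: io.StringIO stands in for the real file, and next(fin, '') is used to pull the following line. It shows that a line consumed inside the loop body is not seen again by the for loop.

```python
import io

# io.StringIO stands in for the opened file (an assumption for this sketch).
fin = io.StringIO(
    "APPLE: toronto, 2018, garden, tasty, 5\n"
    "apple is a tasty fruit\n"
    ">>>end\n"
)

seen = []
for line in fin:
    line = line.rstrip()
    if ':' in line:
        # Pull the description line now; the loop will not yield it again.
        description = next(fin, '').strip()
        seen.append(('header', line.split(':')[0].lower(), description))
    else:
        seen.append(('other', line))

print(seen)
# [('header', 'apple', 'apple is a tasty fruit'), ('other', '>>>end')]
```

The default argument '' in next(fin, '') avoids a StopIteration error if a header happens to be the last line of the file.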

Answer 2

I made a few adjustments to your code (the code I gave you in the previous post). I think this gives you the updated data you want.

Data:

APPLE: toronto, 2018, garden, tasty, 5
apple is a tasty fruit
>>>end
apple is a sour fruit
apple is ripe
>>>end
apple is red
>>>end
grapes: america, 24, organic, sweet, 4
grapes is a sweet fruit
>>>end

Here is the updated code:

import re

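def main():
    data = {}

    with open('f1.txt', 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):
                # End marker: store the first block (description + fields) once per key.
                if key not in data:
                    data[key] = [tuple(parts)]

            elif re.match(r'^\w+:\s', line):
                # Header line: keep the key, drop the place, keep the other fields.
                key, _, *parts = re.split(r'[:,]\s+', line)
            else:
                if key in data:
                    # Later description lines are appended as plain strings.
                    data[key].append(line)
                else:
                    # The first description goes in front of the header fields.
                    parts.insert(0, line)

    # Group all later description lines of each key into a single tuple.
    for key in data:
        if len(data[key]) > 1:
            data[key][1] = tuple(data[key][1:])
            del data[key][2:]

    print(data)


main()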

The output for this revised data and code is:

{'APPLE': [('apple is a tasty fruit', '2018', 'garden', 'tasty', '5'), ('apple is a sour fruit', 'apple is ripe', 'apple is red')], 'grapes': [('grapes is a sweet fruit', '24', 'organic', 'sweet', '4')]}

Return a file as a dictionary
