Converting a file into a dictionary

Date: 2019-02-26 23:55:56

Tags: python-3.x dictionary

my_file = """The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again."""

Expected output:

{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'], 'itsy': ['bitsy', 'bitsy'], 'bitsy': ['spider', 'spider'], 'spider': ['went', 'out', 'went'], 'went': ['up', 'up'], 'up': ['the', 'all', 'the'], 'water': ['spout'], 'spout': ['down', 'again'], 'down': ['came'], 'came': ['the', 'the'], 'rain': ['washed', 'and'], 'washed': ['the'], 'out': ['out', 'came'], 'sun': ['dried'], 'dried': ['up'], 'all': ['the'], 'and': ['the'], 'again': []}

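My code:

import string

words_set = {}
# Iterate line by line; looping over the string itself would yield single characters.
for line in my_file.splitlines():
    lower_text = line.lower()
    for word in lower_text.split():
        word = word.strip(string.punctuation + string.digits)
        if word:
            # Note: this counts occurrences instead of collecting neighboring words.
            if word in words_set:
                words_set[word] = words_set[word] + 1
            else:
                words_set[word] = 1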

1 answer:

Answer 0 (score: 1)

You can reproduce the expected result with a few concepts:

Given

import string
import itertools as it
import collections as ct


data = """\
The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again.
"""

Code

def clean_string(s:str) -> str:
    """Return a lowercased string without punctuation."""
    table = str.maketrans("","", string.punctuation)
    return s.lower().translate(table).replace("  ", " ").replace("\n", " ")


def get_neighbors(words:list) -> dict:
    """Return a dict of right-hand, neighboring words."""
    dd = ct.defaultdict(list)
    for word, nxt in it.zip_longest(words, words[1:], fillvalue=""):
        dd[word].append(nxt)
    return dict(dd)

Demo

words = clean_string(data).split()
get_neighbors(words)

Result

{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'], 'itsy': ['bitsy', 'bitsy'], 'bitsy': ['spider', 'spider'], 'spider': ['went', 'out', 'went'], 'went': ['up', 'up'], 'up': ['the', 'all', 'the'], 'water': ['spout'], 'spout': ['down', 'again'], 'down': ['came'], 'came': ['the', 'the'], 'rain': ['washed', 'and'], 'washed': ['the'], 'out': ['out', 'came'], 'sun': ['dried'], 'dried': ['up'], 'all': ['the'], 'and': ['the'], 'again': ['']}

Details

clean_string

  • There are many ways to remove punctuation. Here we use a translation table to strip most of it. The remaining double spaces and newlines are removed directly via str.replace().

get_neighbors

  • defaultdict builds a dict of lists. If a key is missing, a new list value is created for it.
  • We build the dict by iterating over two juxtaposed word lists, one offset a word ahead of the other.
  • These lists are zipped by the longest list, padding the shorter one with an empty string.
  • dict(dd) ensures a plain dictionary is returned.
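
To make the pairing step concrete, here is a minimal sketch on a short sentence. The `sample` text and the `pairs` and `neighbors` names are hypothetical, chosen only for illustration. It shows what the translation table removes and which (word, next word) tuples zip_longest yields before they are grouped.

```python
import string
import itertools as it
import collections as ct

# Hypothetical sample text, used only for illustration.
sample = "Up the spout, up the wall!"

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", string.punctuation)
words = sample.lower().translate(table).split()

# Pair each word with its right-hand neighbor. The last word has none,
# so zip_longest pads the shorter sequence with the fill value "".
pairs = list(it.zip_longest(words, words[1:], fillvalue=""))

# Group the neighbors by word; a missing key starts as an empty list.
neighbors = ct.defaultdict(list)
for word, nxt in pairs:
    neighbors[word].append(nxt)
neighbors = dict(neighbors)

print(pairs)
print(neighbors)
```

The final word maps to [''] because of the fill value. If an empty list is preferred for the last word, one possible variation is to pair with plain zip(words, words[1:]) and then touch the last key on the defaultdict so that an empty list is created for it.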

If you only want the word counts, collections.Counter (imported above as ct) can tally the words list directly.
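
As a hedged sketch of the counting variant, assuming collections.Counter is the intended tool, word frequencies can be tallied as shown here. The `text` string and the variable names are hypothetical.

```python
import string
import collections as ct

# Hypothetical one-line input, used only for illustration.
text = "Out came the sun & dried up all the rain"

# Lowercase, delete punctuation, and split on whitespace.
table = str.maketrans("", "", string.punctuation)
words = text.lower().translate(table).split()

# Counter maps each distinct word to its number of occurrences.
counts = ct.Counter(words)

# most_common(1) returns a list holding the single most frequent (word, count) pair.
top_word, top_count = counts.most_common(1)[0]
print(counts)
print(top_word, top_count)
```

Unlike a plain dict, a Counter returns 0 for words it has never seen, so no membership check is needed before reading a count.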