Question

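I apologize in advance for such a vague title, but I am having trouble conceptualizing the problem precisely.

I have a script that checks whether a name occurs in a text. If the name is in the text, the script appends 1 to a list dedicated to that name; if the name is not in the text, it appends 0.

It looks like this: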

import re
import csv
from itertools import izip

names = ['peter', 'john', 'thomas', 'george']
texts = ['peter is awesome', 'john is lazy', 'thomas is thomas','george is curious']

peter_matched = []
john_matched = []
thomas_matched = []
george_matched = []

for text in texts:
    for name in names:
        if name == 'peter':
            if re.match(name, text):
                peter_matched.append(1)
            else:
                peter_matched.append(0)
        if name == 'john':
            if re.match(name, text):
                john_matched.append(1)
            else:
                john_matched.append(0)
        if name == 'thomas':
            if re.match(name, text):
                thomas_matched.append(1)
            else:
                thomas_matched.append(0)
        if name == 'george':
            if re.match(name, text):
                george_matched.append(1)
            else:
                george_matched.append(0)

with open('output_names.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerows(izip(texts, peter_matched, john_matched, thomas_matched, george_matched))

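As you can see, it is an ugly mess of if/else statements. More problematic, I have to create a separate dedicated list for each name to hold the match information, which is then written to a .csv. In my real script I need to cross-reference thousands of texts with hundreds of names, so writing a dedicated name_matched list for each item is not a fun task.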

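So my question is: is it possible to tell Python to generate these lists automatically, by taking the name of an item from the names list and appending it to some pre-existing string such as _matched?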

In other words, I want lists such as peter_matched, john_matched, and so on to be created automatically.

Thanks in advance!

Answer 1

A one-liner using a dict comprehension (available since Python 2.7):

{name: [1 if name in text else 0 for text in texts ] for name in names}
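As a quick sketch of how this result can stand in for the separate per-name lists from the question, assuming the same names and texts (the variable names matched and rows are introduced here only for illustration):

```python
names = ['peter', 'john', 'thomas', 'george']
texts = ['peter is awesome', 'john is lazy', 'thomas is thomas', 'george is curious']

# One dict replaces peter_matched, john_matched, and so on.
matched = {name: [1 if name in text else 0 for text in texts] for name in names}

# What the question called peter_matched is now matched['peter'].
print(matched['peter'])  # [1, 0, 0, 0]

# Rebuild one row per text, with the columns in the order of `names`.
rows = list(zip(texts, *(matched[name] for name in names)))
print(rows[0])  # ('peter is awesome', 1, 0, 0, 0)
```

Adding a name to the names list is then enough to get an extra column; no new variable has to be written by hand.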

Building the dict name by name

Building the dictionary key by key (the classic way):

def check_names(names, texts):
    res = {}
    for name in names:
        res[name] = [1 if name in text else 0 for text in texts]
    return res

Bonus: pytest tests

If you want to test this with pytest, put the following code into test_names.py:

import pytest


@pytest.fixture
def names():
    return ['peter', 'john', 'thomas', 'george']


@pytest.fixture
def texts():
    return [
        'peter is awesome',
        'john is lazy',
        'thomas is thomas',
        'george is curious']


def check_names(names, texts):
    res = {}
    for name in names:
        res[name] = [1 if name in text else 0 for text in texts]
    return res


def check_names2(names, texts):
    res = {name: [1 if name in text else 0
                  for text in texts
                  ]
           for name in names
           }
    return res


def test_it(names, texts):
    expected_result = {"peter":  [1, 0, 0, 0],
                       "john":   [0, 1, 0, 0],
                       "thomas": [0, 0, 1, 0],
                       "george": [0, 0, 0, 1],
                       }
    result = check_names2(names, texts)
    assert result == expected_result

and run

$ py.test -sv test_names.py

Answer 2

You should create a dict of lists and retrieve each list by its name string.

import re

names = ['peter', 'john', 'thomas', 'george']
texts = ['peter is awesome', 'john is lazy', 'thomas is thomas','george is curious']

matched = {n: [] for n in names}

for text in texts:
    for name in names:
        if re.match(name, text):
            matched[name].append(1)
        else:
            matched[name].append(0)

print(matched)
# {'john': [0, 1, 0, 0], 'thomas': [0, 0, 1, 0], 'peter': [1, 0, 0, 0], 'george': [0, 0, 0, 1]}
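One caveat worth sketching: re.match only matches at the start of the string, which happens to work for the sample texts because each begins with a name. If names can appear anywhere in a text, something like the following may be closer to what is wanted. This is an illustrative variant, not part of the answer above; the sample texts are made up, and the word-boundary pattern is an assumption about the desired matching rule.

```python
import re

names = ['peter', 'john']
# Hypothetical texts: a name in mid-sentence, and a longer word containing a name.
texts = ['peter is awesome', 'I think john is lazy', 'johnny is here']

matched = {n: [] for n in names}

for text in texts:
    for name in names:
        # re.escape guards against names containing regex metacharacters;
        # \b requires the name to appear as a whole word.
        pattern = r'\b' + re.escape(name) + r'\b'
        # re.search scans the whole text, unlike re.match.
        matched[name].append(1 if re.search(pattern, text) else 0)

print(matched['john'])  # [0, 1, 0]
```

Here 'johnny' does not count as a match for 'john', which a plain substring test would report.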

Answer 3

You can use a dictionary. You could do it like this:

from collections import defaultdict
counts = defaultdict(int)
for text in texts:
    for name in names:
        if name in text:
            counts[name] += 1

Or, if what you want is the exact sequence of 0s and 1s, you can initialize the dictionary with a string type:

counts = defaultdict(str)
for text in texts:
    for name in names:
        counts[name] += '1' if name in text else '0'
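To make the shape of that result concrete, here is a small sketch with the question's data. The conversion to integer lists at the end is an addition for illustration (as_lists is a name introduced here), in case the per-text flags are needed as numbers rather than as one string per name.

```python
from collections import defaultdict

names = ['peter', 'john', 'thomas', 'george']
texts = ['peter is awesome', 'john is lazy', 'thomas is thomas', 'george is curious']

counts = defaultdict(str)
for text in texts:
    for name in names:
        counts[name] += '1' if name in text else '0'

# Each value is a string with one character per text.
print(counts['peter'])  # 1000

# Optional: turn each string of flags into a list of ints.
as_lists = {name: [int(c) for c in flags] for name, flags in counts.items()}
print(as_lists['john'])  # [0, 1, 0, 0]
```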

Answer 4

Instead of creating a separate list for each name, use a dict type, specifically a defaultdict:

from collections import defaultdict
dict_of_list_names = defaultdict(list)

for text in texts:
    for name in names:
        to_append = 1 if name in text else 0
        dict_of_list_names[name].append(to_append)

Also, judging from the example, you do not need regular expressions. Use the in operator instead, since it is faster.
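If you want to check the speed difference on your own data, a rough timing sketch with the standard timeit module could look like this. The sample text and the repetition count are arbitrary choices, and the actual numbers will vary by machine and Python version.

```python
import re
import timeit

text = 'thomas is thomas'
name = 'thomas'

# Time a plain substring test against an equivalent regex search.
t_in = timeit.timeit(lambda: name in text, number=100000)
t_re = timeit.timeit(lambda: re.search(name, text) is not None, number=100000)

print('in: %.4fs, re.search: %.4fs' % (t_in, t_re))
```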

Answer 5

The first part is easy: convert the list of names into a dictionary of empty lists.

names = {name: [] for name in names}

Filling the lists is easy too:

for t in texts:
    for n in names:
        names[n].append(1 if n in t else 0)

(Note that regular expressions are overkill for the example you gave.)

The hard part, IMHO, is writing the results to a file in a way that most closely resembles what you showed. I inserted a header row because names does not return the lists in any given order, but you can rest assured that the order of names.values() is the same as the order of names.keys(), so building the header row from .keys() seems the easier way to get a useful CSV.

import csv

# newline='' is what the csv module recommends for files opened in Python 3
with open('output_names.csv', 'w', newline='') as f:
    w = csv.writer(f)
    # header row: 'text' followed by one column per name
    w.writerow(['text']+list(names.keys()))
    # one row per text, followed by its 0/1 flag for each name
    w.writerows(zip(texts, *names.values()))
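If the column order matters, one way to sidestep dictionary ordering altogether is to drive both the header and the columns from the original list of names. The sketch below assumes Python 3 and writes to an in-memory io.StringIO buffer in place of a real file so that it is self-contained; name_order, matched and buf are names introduced here for illustration.

```python
import csv
import io

name_order = ['peter', 'john', 'thomas', 'george']
texts = ['peter is awesome', 'john is lazy', 'thomas is thomas', 'george is curious']

matched = {n: [1 if n in t else 0 for t in texts] for n in name_order}

buf = io.StringIO()  # stands in for an open file
w = csv.writer(buf)
# Header and data columns both follow name_order, not the dict's own order.
w.writerow(['text'] + name_order)
w.writerows(zip(texts, *(matched[n] for n in name_order)))

print(buf.getvalue())
```

The first line written is text,peter,john,thomas,george, followed by one line per text such as peter is awesome,1,0,0,0.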
