Question

Let's consider the following sample code:

pre_process.py

import pandas as pd
from sklearn.preprocessing import LabelBinarizer

class PreProcess(object):

    def __init__(self):
        ... .... ....
        ... .... ....

C:  def fit_clms(self, lb_style, dataset, style_clms = ['A', 'B']):
B:        lb_results = lb_style.fit_transform(dataset[style_clms]) # expected "dataset['X', 'Y']", but it became "dataset[['X', 'Y']]"; please note the nested list
        # (**Worked - by this line**) lb_results = lb_style.fit_transform(dataset['A', 'B', 'C'])
        print(lb_results)

        if lb_style.classes_.shape[0] > 0:
            ... .... ....
            ... .... ....

    def process_chunks(self, chunks):
        lb_style = LabelBinarizer()
        print('------------------------------------------------\n')
        count = 0
        for dataset in chunks:
            count += 1
            print ('Processing the Chunk %d ...' % count)

            # Group By
            dataset['Grouping_MS'] = dataset[['_time', 'source']].apply(self.group_by_clm, axis=1)
A:            dataset = self.fit_clms(lb_style, dataset, ['X', 'Y'])
                ... .... ....
                ... .... ....            

    def init(self):
        Times.start()
        # Read the Source File
        chunks = self.read_csv_file(SOURCE_FILE, CHUNK_SIZE)    
        self.process_chunks(chunks)
            ... .... ....
            ... .... ....

Here, how do I pass the list ['X', 'Y'] (A:) and access it through "dataset[style_clms]" (B:)? (It currently becomes [['X', 'Y']], a nested list, but I want ['X', 'Y'].)

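Also, is it a good idea to use a list as a default argument in a function definition (C:)? If not, what is the alternative, and how do I do it? I ask because Pylint issues a warning such as "Dangerous default value [] as argument".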

Any ideas? Thanks,

Answer 1

The []-as-default-value thing catches a lot of people out, so I'll cover it first. When Python runs your code, it does the following:

def append_two(a=[]):
    a.append(2)
    return a

print(append_two())
print(append_two([1, 2, 3]))
print(append_two())
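Oh, look! A function definition! OK, so the default value is []; let's evaluate that... There's some more code after it, but let's not run that just yet.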
def append_two(a=<list object at 0x34FE2910>):
    ...

print(append_two())
print(append_two([1, 2, 3]))
print(append_two())
OK, now run it. Appending 2 to [] gives [2], so we print("[2]"). Appending 2 to [1, 2, 3] gives [1, 2, 3, 2], so we print("[1, 2, 3, 2]"). Appending 2 to [2] gives [2, 2], so we print("[2, 2]"). Done!
[2]
[1, 2, 3, 2]
[2, 2]

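Why does this happen? Well, it's that first phase. Python created the default list for append_two once, when it evaluated the function definition. This means that whenever you don't pass a list, it appends to that same list. The list slowly grows over time as more 2s are added.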
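
If you want to see that the default really is a single object created when the def statement runs, you can inspect the function's __defaults__ attribute, which is a standard attribute of Python function objects. A minimal sketch, reusing the append_two function from above:

```python
def append_two(a=[]):
    a.append(2)
    return a

# The default list lives on the function object itself.
print(append_two.__defaults__)   # ([],)

first = append_two()
second = append_two()

# Both calls returned the very same list object, which now holds two 2s.
print(first is second)           # True
print(append_two.__defaults__)   # ([2, 2],)
```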

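The reason is consistency. When you call the function, only what's inside the function gets run. Nowhere inside the function does it say "create a new list", so it doesn't. If you want that, you have to tell it to, like so: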

def append_two(a=None):
    if a is None:
        a = []  # Make a new list
    a.append(2)
    return a

This is clunky and annoying, but it's the price you pay for consistency. The alternatives are worse.
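
Applied to the method in your question, the same pattern might look like the sketch below. select_columns is a hypothetical stand-in for fit_clms that keeps only the default-handling part, and the ['A', 'B'] default is taken from your code. An immutable tuple default is another common way to avoid the Pylint warning; I show it as an alternative, not as the one right answer.

```python
def select_columns(style_clms=None):
    """Hypothetical stand-in for fit_clms: returns the column names to use."""
    if style_clms is None:
        style_clms = ['A', 'B']  # a fresh list on every call
    return style_clms


def select_columns_tuple(style_clms=('A', 'B')):
    """Alternative: a tuple default cannot be mutated between calls."""
    return list(style_clms)  # copy into a new list for the caller


cols = select_columns()
cols.append('C')                    # mutating the returned list...
print(select_columns())             # ...does not leak into later calls: ['A', 'B']
print(select_columns(['X', 'Y']))   # ['X', 'Y']
print(select_columns_tuple())       # ['A', 'B']
```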

Now on to your main question. I'll simplify it a bit.

class DemoClass:
    def __getitem__(self, index):
        return index
dataset = DemoClass()

style_clms = ["X", "Y"]
print(dataset[style_clms])

This prints ['X', 'Y']. Let's see what dataset["X", "Y"] prints:

>>> print(dataset["X", "Y"])
('X', 'Y')

Well... this is called a tuple. Converting a list to a tuple is easy:

>>> print(dataset[tuple(style_clms)])
('X', 'Y')

Hooray! We've successfully replicated whatever dataset["X", "Y"] does! :-) This will hopefully solve your problem.
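
If you want to check which kind of key __getitem__ actually receives, a small variation of the demo class makes it visible. KeyInspector is a made-up name for illustration. The sketch shows that obj["X", "Y"] is just another spelling of obj[("X", "Y")], while passing a list variable hands over the list itself, with no extra nesting added by the call.

```python
class KeyInspector:
    """Illustrative class: reports the type and value of the subscript key."""
    def __getitem__(self, index):
        return type(index).__name__, index


dataset = KeyInspector()
style_clms = ["X", "Y"]

print(dataset[style_clms])         # ('list', ['X', 'Y'])
print(dataset["X", "Y"])           # ('tuple', ('X', 'Y'))
print(dataset[("X", "Y")])         # ('tuple', ('X', 'Y'))
print(dataset[tuple(style_clms)])  # ('tuple', ('X', 'Y'))
```

As far as I know, pandas treats these two key types differently: a list key selects several columns, whereas a tuple key is looked up as one single column label. So which form you need depends on how the columns of your real DataFrame are labelled.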

Answer 2

Flatten the list using:

import itertools
flat_list = list(itertools.chain(*list2d))

or

flat_list = [item for sublist in list2d for item in sublist]
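
For completeness, here is a runnable sketch of both variants. list2d is assumed to be the nested list from the question, [['X', 'Y']]. I use itertools.chain.from_iterable instead of chain(*list2d); it does the same job without unpacking the outer list into arguments.

```python
import itertools

list2d = [['X', 'Y']]  # assumed input: the nested list from the question

# Variant 1: chain the inner lists together.
flat_chain = list(itertools.chain.from_iterable(list2d))

# Variant 2: nested list comprehension.
flat_comprehension = [item for sublist in list2d for item in sublist]

print(flat_chain)          # ['X', 'Y']
print(flat_comprehension)  # ['X', 'Y']
```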

How to pass and access a list as an argument in Python
