Question

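I have the following two lists: X_train and y_train. I want to return four lists. The first should contain the rows of X_train where outlook == overcast, and the second should contain the labels for those rows. The third and fourth lists should contain the same thing, but for the rows with a different outlook.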

X_train = [['sunny', 'hot', 'high', 'FALSE'],
 ['sunny', 'hot', 'high', 'TRUE'],
 ['overcast', 'hot', 'high', 'FALSE'],
 ['rainy', 'mild', 'high', 'FALSE'],
 ['rainy', 'cool', 'normal', 'FALSE'],
 ['rainy', 'cool', 'normal', 'TRUE'],
 ['overcast', 'cool', 'normal', 'TRUE'],
 ['sunny', 'mild', 'high', 'FALSE'],
 ['sunny', 'cool', 'normal', 'FALSE'],
 ['rainy', 'mild', 'normal', 'FALSE'],
 ['sunny', 'mild', 'normal', 'TRUE'],
 ['overcast', 'mild', 'high', 'TRUE'],
 ['overcast', 'hot', 'normal', 'FALSE'],
 ['rainy', 'mild', 'high', 'TRUE']]

y_train = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

My code is as follows:

overcast=[]
for row in X_train:
    if 'overcast' in row:
        overcast+=[row]

The output for the overcast list is:

[['overcast', 'hot', 'high', 'FALSE'],
 ['overcast', 'cool', 'normal', 'TRUE'],
 ['overcast', 'mild', 'high', 'TRUE'],
 ['overcast', 'hot', 'normal', 'FALSE']]

My expected output is:

([['sunny', 'hot', 'high', 'FALSE'],
  ['sunny', 'hot', 'high', 'TRUE'],
  ['rainy', 'mild', 'high', 'FALSE'],
  ['rainy', 'cool', 'normal', 'FALSE'],
  ['rainy', 'cool', 'normal', 'TRUE'],
  ['sunny', 'mild', 'high', 'FALSE'],
  ['sunny', 'cool', 'normal', 'FALSE'],
  ['rainy', 'mild', 'normal', 'FALSE'],
  ['sunny', 'mild', 'normal', 'TRUE'],
  ['rainy', 'mild', 'high', 'TRUE']],
 ['no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
 [['overcast', 'hot', 'high', 'FALSE'],
  ['overcast', 'cool', 'normal', 'TRUE'],
  ['overcast', 'mild', 'high', 'TRUE'],
  ['overcast', 'hot', 'normal', 'FALSE']],
 ['yes', 'yes', 'yes', 'yes'])

Now I'm stuck on how to attach the 'yes' and 'no' labels that correspond to the 'overcast' set, which should be four 'yes' values. Any ideas would help, thanks!

Answer 1

pandas is very handy whenever you want to filter or transform tabular data like this. For example:

import pandas as pd

df = pd.DataFrame(X_train, columns=['outlook', 'temperature', 'pressure', 'Boole'])
df['y'] = y_train

df[df.outlook == 'overcast']

    outlook     temperature     pressure    Boole   y
2   overcast    hot             high        FALSE   yes
6   overcast    cool            normal      TRUE    yes
11  overcast    mild            high        TRUE    yes
12  overcast    hot             normal      FALSE   yes

If you need to return lists, you can convert the DataFrame to nested lists like this:

df[df.outlook == 'overcast'].drop('y', axis=1).values.tolist()

[['overcast', 'hot', 'high', 'FALSE'],
 ['overcast', 'cool', 'normal', 'TRUE'],
 ['overcast', 'mild', 'high', 'TRUE'],
 ['overcast', 'hot', 'normal', 'FALSE']]

Or, for the labels:

df.y[df.outlook == 'overcast'].values.tolist()

['yes', 'yes', 'yes', 'yes']
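Extending the same idea to all four lists the question asks for, a boolean mask and its negation (`~`) split the DataFrame in one go. This is a minimal sketch that reuses the answer's column names; the variable names `is_overcast`, `features` and `result` are illustrative, and the tuple order follows the question's expected output (non-overcast rows first).

```python
import pandas as pd

X_train = [['sunny', 'hot', 'high', 'FALSE'],
           ['sunny', 'hot', 'high', 'TRUE'],
           ['overcast', 'hot', 'high', 'FALSE'],
           ['rainy', 'mild', 'high', 'FALSE'],
           ['rainy', 'cool', 'normal', 'FALSE'],
           ['rainy', 'cool', 'normal', 'TRUE'],
           ['overcast', 'cool', 'normal', 'TRUE'],
           ['sunny', 'mild', 'high', 'FALSE'],
           ['sunny', 'cool', 'normal', 'FALSE'],
           ['rainy', 'mild', 'normal', 'FALSE'],
           ['sunny', 'mild', 'normal', 'TRUE'],
           ['overcast', 'mild', 'high', 'TRUE'],
           ['overcast', 'hot', 'normal', 'FALSE'],
           ['rainy', 'mild', 'high', 'TRUE']]
y_train = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
           'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

# Same column names as in the answer above
df = pd.DataFrame(X_train, columns=['outlook', 'temperature', 'pressure', 'Boole'])
df['y'] = y_train

# Boolean Series: True for rows whose outlook is 'overcast'
is_overcast = df['outlook'] == 'overcast'
# Feature columns only (label column dropped)
features = df.drop(columns='y')

result = (
    features[~is_overcast].values.tolist(),  # rows with any other outlook
    df.loc[~is_overcast, 'y'].tolist(),      # labels for those rows
    features[is_overcast].values.tolist(),   # overcast rows
    df.loc[is_overcast, 'y'].tolist(),       # labels for overcast rows
)
```

Because the label lives in the same DataFrame row as the features, filtering with one mask keeps rows and labels aligned automatically.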

Answer 2

For a more basic approach, you can use list comprehensions, for example like this:

# extract the first column from X_train
outlook = [row[0] for row in X_train]

# create a Boolean list  
outlook_is_overcast = [x == 'overcast' for x in outlook]

# get the y values for which x is overcast
[y_train[i] for i in range(len(y_train)) if outlook_is_overcast[i]]
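To get all four lists from the question without pandas, one option is to walk the rows and labels in parallel with `zip` and sort each pair into the right bucket. This is a sketch using a plain loop rather than comprehensions, so only one pass over the data is needed; `split_by_outlook` and its `value` parameter are hypothetical names, and it assumes outlook is the first column of each row.

```python
def split_by_outlook(X, y, value='overcast'):
    """Return (other_rows, other_labels, match_rows, match_labels).

    Assumes the outlook is the first element of each row in X.
    """
    other_rows, other_labels = [], []
    match_rows, match_labels = [], []
    # zip pairs each row with its label, so the two always stay aligned
    for row, label in zip(X, y):
        if row[0] == value:
            match_rows.append(row)
            match_labels.append(label)
        else:
            other_rows.append(row)
            other_labels.append(label)
    return other_rows, other_labels, match_rows, match_labels


X_train = [['sunny', 'hot', 'high', 'FALSE'],
           ['sunny', 'hot', 'high', 'TRUE'],
           ['overcast', 'hot', 'high', 'FALSE'],
           ['rainy', 'mild', 'high', 'FALSE'],
           ['rainy', 'cool', 'normal', 'FALSE'],
           ['rainy', 'cool', 'normal', 'TRUE'],
           ['overcast', 'cool', 'normal', 'TRUE'],
           ['sunny', 'mild', 'high', 'FALSE'],
           ['sunny', 'cool', 'normal', 'FALSE'],
           ['rainy', 'mild', 'normal', 'FALSE'],
           ['sunny', 'mild', 'normal', 'TRUE'],
           ['overcast', 'mild', 'high', 'TRUE'],
           ['overcast', 'hot', 'normal', 'FALSE'],
           ['rainy', 'mild', 'high', 'TRUE']]
y_train = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
           'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

# Tuple order matches the expected output in the question
result = split_by_outlook(X_train, y_train)
```

Since `zip` keeps each row next to its label, there is no index bookkeeping, and the same function can split on another outlook by passing, say, `value='sunny'`.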
