Question

The problem
I have a user log file in a DataFrame (sample data):

+------+--------+----------+------------+
| user |  prod  | prod_cat | goal_label |
+------+--------+----------+------------+
| a    | prod_1 | cat_1    |          5 |
| a    | prod_2 | cat_2    |          5 |
| c    | prod_1 | cat_1    |          1 |
+------+--------+----------+------------+

I would (eventually) like to get this into an array with one entry per user:

[[[prod_1, cat_1], [prod_2, cat_2]],
 [[prod_1, cat_1]]]

Why I am lost

unique_prod = prod_log.groupby(['user'])['prod'].unique()
unique_prod = unique_prod.to_frame().reset_index()
res = unique_prod['prod'].values

This works, but only for the "prod" column. If I call unique on anything other than a single Series, I get:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

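So the question is:
Am I missing something obvious, or do you experts have a neat solution? Or do I need to loop over the rows?
Ultimately, I am trying to feed this into an LSTM network that expects input in (sequence_length, input_dimension) format.

Thanks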

Answer 1

One way to solve this is to use a series of lists, built by zipping the two input series:

df['prod_plus_cat'] = list(map(list, zip(df['prod'], df['prod_cat'])))

res = df.groupby('user')['prod_plus_cat'].apply(list).tolist()

print(res)

[[['prod_1', 'cat_1'], ['prod_2', 'cat_2']], [['prod_1', 'cat_1']]]

There are problems with your proposed approach that make it unlikely to work:

prod_cat is not included unless you perform an additional operation.
unique is meant for returning unique values, not for grouping into lists.
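
If unique [prod, prod_cat] pairs per user are what you are after, one option is to drop duplicate rows before grouping. This is only a minimal sketch: the repeated row in the sample data and the names deduped and pair are assumptions added for illustration, not part of the question.

```python
import pandas as pd

# Sample data from the question, plus one repeated row for user 'a'
# (an assumption, added so that de-duplication has a visible effect).
df = pd.DataFrame({'user': ['a', 'a', 'a', 'c'],
                   'prod': ['prod_1', 'prod_2', 'prod_1', 'prod_1'],
                   'prod_cat': ['cat_1', 'cat_2', 'cat_1', 'cat_1'],
                   'goal_label': [5, 5, 5, 1]})

# Keep the first occurrence of each (user, prod, prod_cat) combination.
deduped = df.drop_duplicates(subset=['user', 'prod', 'prod_cat'])

# Build one [prod, prod_cat] list per row, then collect those lists per user.
deduped = deduped.assign(
    pair=list(map(list, zip(deduped['prod'], deduped['prod_cat']))))
res = deduped.groupby('user')['pair'].apply(list).tolist()

print(res)
```

Row order within each user is preserved, so the first appearance of each pair determines its position in the sequence.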

Answer 2

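If you have many columns to aggregate across many rows, you can do so by taking the NumPy array representation and converting it to a list of lists.

Then use GroupBy + apply as usual: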

import pandas as pd

df = pd.DataFrame({'user': ['a', 'a', 'c'],
                   'prod': ['prod_1', 'prod_2', 'prod_1'],
                   'prod_cat': ['cat_1', 'cat_2', 'cat_1'],
                   'sub_cat': ['sub_1', 'sub_2', 'sub_3'],
                   'goal_label': [5, 5, 1]})

df['comb'] = df[['prod', 'prod_cat', 'sub_cat']].values.tolist()

res = df.groupby('user')['comb'].apply(list).tolist()

print(res)

[[['prod_1', 'cat_1', 'sub_1'],
  ['prod_2', 'cat_2', 'sub_2']],
 [['prod_1', 'cat_1', 'sub_3']]]
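
If you would rather not add a helper column, a similar result can probably be had by slicing each group directly. A minimal sketch, assuming the same sample frame; the name cols is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'user': ['a', 'a', 'c'],
                   'prod': ['prod_1', 'prod_2', 'prod_1'],
                   'prod_cat': ['cat_1', 'cat_2', 'cat_1'],
                   'sub_cat': ['sub_1', 'sub_2', 'sub_3'],
                   'goal_label': [5, 5, 1]})

# Columns that make up each inner entry (illustrative name).
cols = ['prod', 'prod_cat', 'sub_cat']

# Iterating over a GroupBy yields (key, sub-frame) pairs in sorted key order;
# each sub-frame is converted to a list of row lists.
res = [group[cols].values.tolist() for _, group in df.groupby('user')]

print(res)
```

Each entry of res then holds one row per log line for that user, with one element per column in cols.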

Convert DF into nested list
