I have this .csv:
col1,col2,col3,col4,col5
247,19,1.0,2016-01-01 14:11:21,MP
247,3,1.0,2016-01-01 14:23:43,MP
247,12,1.0,2016-01-01 15:32:16,MP
402,3,1.0,2016-01-01 12:11:15,?
583,12,1.0,2016-01-01 02:33:57,?
769,16,1.0,2016-01-01 03:12:24,?
769,4,1.0,2016-01-01 03:22:29,?
.....
I need to get the col2 values for each unique col1 element and create a new .csv from them, as shown below. That is, I keep writing the numbers on one line until the col1 value changes, at which point I start a new line and continue writing numbers.
expected output:
19,3,12
3
12
16,4
...
I read the .csv and removed the duplicates from the list, but now things are getting difficult for me, as I'm new to Python. My idea is to compare each element of list2 with every row in df and write the col2 elements to a new .csv. Could you help me?
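From the references to df and list2, the reading and de-duplication step was presumably something along these lines (a sketch only; the file name is hypothetical):

import pandas as pd

df = pd.read_csv('input.csv')            # hypothetical file name

# col2 values with duplicates removed, keeping first-seen order
list2 = list(dict.fromkeys(df['col2']))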
Answer 0: (score: 3)
An example in python3:
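A minimal sketch of this streaming approach, assuming rows with the same col1 value are contiguous (as they are in the sample) and using the standard-library csv module with hypothetical file names:

import csv

with open('input.csv', newline='') as src, \
        open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    next(reader)                         # skip the header row
    writer = csv.writer(dst)

    current_key, values = None, []
    for row in reader:
        col1, col2 = row[0], row[1]
        if col1 != current_key:          # a new col1 group starts
            if values:
                writer.writerow(values)  # flush the previous group
            current_key, values = col1, []
        values.append(col2)
    if values:                           # flush the last group
        writer.writerow(values)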
Maybe you can try this. Don't store the whole output in a list or any other data structure (memory problems); write to the file as you read and aggregate. (Reading should also be optimized to use an iterator instead of loading the entire contents of the input file at once.)
Answer 1: (score: 1)
You can do this by grouping the data and then applying set as the aggregation:

df.groupby('col1')['col2'].apply(set).apply(list)

The apply(set) step creates, for each unique col1 value, a set of all its distinct col2 elements, and apply(list) then converts each set into a list.
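A minimal end-to-end sketch of this answer with hypothetical file names; note that set drops duplicate col2 values within a group and does not preserve the original row order:

import pandas as pd

df = pd.read_csv('input.csv')  # hypothetical input file

# One entry per unique col1 value; each entry is the list of its
# distinct col2 values (set loses the original row order).
grouped = df.groupby('col1')['col2'].apply(set).apply(list)

# Write each group's values as one comma-separated line.
with open('output.csv', 'w') as f:
    for values in grouped:
        f.write(','.join(map(str, values)) + '\n')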
Answer 2: (score: 0)
You need to keep track of the duplicates. The simplest way (easy to understand, though less efficient) would be something like the sketch below.
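A minimal sketch of that idea, with hypothetical file names: collect the col2 values per col1 key in a dict (insertion-ordered in Python 3.7+), which keeps everything in memory and is therefore the less efficient route this answer mentions:

import csv

groups = {}  # col1 value -> list of its col2 values, in first-seen order

with open('input.csv', newline='') as src:
    reader = csv.reader(src)
    next(reader)                         # skip the header row
    for row in reader:
        groups.setdefault(row[0], []).append(row[1])

with open('output.csv', 'w') as dst:
    for values in groups.values():
        dst.write(','.join(values) + '\n')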