I have this .csv:
col1,col2,col3,col4,col5
247,19,1.0,2016-01-01 14:11:21,MP
247,3,1.0,2016-01-01 14:23:43,MP
247,12,1.0,2016-01-01 15:32:16,MP
402,3,1.0,2016-01-01 12:11:15,?
583,12,1.0,2016-01-01 02:33:57,?
769,16,1.0,2016-01-01 03:12:24,?
769,4,1.0,2016-01-01 03:22:29,?
.....
I need to get the col2 values for each unique col1 element and create a new .csv from them, as shown below. That is, I keep writing the numbers on one line until the col1 value changes, at which point I start a new line and continue writing numbers.
expected output:
19,3,12
3
12
16,4
...
I read the .csv and removed the duplicates from the list, but now things are getting difficult for me, as I'm new to Python. My idea is to compare each element of list2 with every row in df and write the col2 elements to a new .csv. Could you help me?
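From the references to df and list2, the reading and de-duplication step was presumably something along these lines (a sketch only; the file name is hypothetical):

import pandas as pd

df = pd.read_csv('input.csv')            # hypothetical file name

# col2 values with duplicates removed, keeping first-seen order
list2 = list(dict.fromkeys(df['col2']))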
Answer 0: (score: 3)
An example in python3:
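A minimal sketch of this streaming approach, assuming rows with the same col1 value are contiguous (as they are in the sample) and using the standard-library csv module with hypothetical file names:

import csv

with open('input.csv', newline='') as src, \
        open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    next(reader)                         # skip the header row
    writer = csv.writer(dst)

    current_key, values = None, []
    for row in reader:
        col1, col2 = row[0], row[1]
        if col1 != current_key:          # a new col1 group starts
            if values:
                writer.writerow(values)  # flush the previous group
            current_key, values = col1, []
        values.append(col2)
    if values:                           # flush the last group
        writer.writerow(values)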
Maybe you can try this. Don't store the whole output in a list or any other data structure (memory problems); write to the file as you read and aggregate. (Reading should also be optimized to use an iterator instead of loading the entire contents of the input file at once.)
Answer 1: (score: 1)
You can do this by grouping the data and then applying set as the aggregation:

df.groupby('col1')['col2'].apply(set).apply(list)

The apply(set) step creates, for each unique col1 value, a set of all its distinct col2 elements, and apply(list) then converts each set into a list.
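A minimal end-to-end sketch of this answer with hypothetical file names; note that set drops duplicate col2 values within a group and does not preserve the original row order:

import pandas as pd

df = pd.read_csv('input.csv')  # hypothetical input file

# One entry per unique col1 value; each entry is the list of its
# distinct col2 values (set loses the original row order).
grouped = df.groupby('col1')['col2'].apply(set).apply(list)

# Write each group's values as one comma-separated line.
with open('output.csv', 'w') as f:
    for values in grouped:
        f.write(','.join(map(str, values)) + '\n')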
Answer 2: (score: 0)
You need to keep track of the duplicates. The simplest way (easy to understand, though less efficient) would be something like the sketch below.
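A minimal sketch of that idea, with hypothetical file names: collect the col2 values per col1 key in a dict (insertion-ordered in Python 3.7+), which keeps everything in memory and is therefore the less efficient route this answer mentions:

import csv

groups = {}  # col1 value -> list of its col2 values, in first-seen order

with open('input.csv', newline='') as src:
    reader = csv.reader(src)
    next(reader)                         # skip the header row
    for row in reader:
        groups.setdefault(row[0], []).append(row[1])

with open('output.csv', 'w') as dst:
    for values in groups.values():
        dst.write(','.join(values) + '\n')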