How to combine rows with similar strings in a .csv file using Python 3

Time: 2015-10-31 15:58:01

Tags: csv python-3.x

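I'm still learning, so please bear with me. I have been trying to solve this problem for a while but haven't found what I'm looking for yet.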

My Product.csv file looks like this.

111 ; Info1 ; Description 1 ; Remarks1
123 ; Info1 ; Description 1 ; Remarks1
156 ; Info2 ; Description 2 ; Remarks2
124 ; Info3 ; Description 3 ; Remarks3

I would like to combine entries that are similar like this.

111, 123 ; Info1 ; Description 1 ; Remarks1
156 ; Info2 ; Description 2 ; Remarks2
124 ; Info3 ; Description 3 ; Remarks3

From here I can manipulate my csv file in Excel using VBA to insert it into a quotation.

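This is what I want to implement in Python, but I'm lost on where to start. I think I need to open the file and read the csv. After that, assign variables to the # (i.e. 111), info, description, and remarks. Then sort by those variables and combine the #'s of rows that are otherwise alike. Then write the result back to a file. If you need anything more from me, please let me know.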

2 answers:

Answer 0: (score: 1)

This is a task for itertools.groupby.
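As a minimal sketch of why the rows have to be sorted before grouping: itertools.groupby only merges adjacent items that share a key. The sample tuples and the helper name info_of below are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

# Rows as (number, info) pairs; the values mirror the question's sample.
rows = [("111", "Info1"), ("156", "Info2"), ("123", "Info1")]


def info_of(row):
    """Key function: everything except the leading number."""
    return row[1:]


# groupby only merges *adjacent* items with equal keys, so sort first.
rows_sorted = sorted(rows, key=info_of)

# Collect the leading numbers for each distinct key.
grouped = {key: [row[0] for row in group]
           for key, group in groupby(rows_sorted, key=info_of)}
print(grouped)
```

Without the sorted() call, the two Info1 rows would not be adjacent and groupby would report them as two separate groups.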

Edit: I refactored the first version to improve readability.

# file group_by_trailing_py2.py
import os
import csv
from itertools import groupby

DELIM=';'
IN_FILENAME = 'My Product.csv'
OUT_FILENAME = 'My Product.grouped.csv'

############  skip this if you run it against production data  ###############
DATA = '''111 ; Info1 ; Description 1 ; Remarks1
123 ; Info1 ; Description 1 ; Remarks1
156 ; Info2 ; Description 2 ; Remarks2
124 ; Info3 ; Description 3 ; Remarks3'''

if os.environ.get('WITH_DATA_GENERATION'):
    # Write the sample data; the with block closes the file reliably.
    with open(IN_FILENAME, 'w') as sample_file:
        sample_file.write(DATA)
##############################################################################

keyfunc = lambda row: row[1:]

with open(IN_FILENAME, newline='') as csv_file:
    rows = sorted(csv.reader(csv_file, delimiter=DELIM), key=keyfunc)

it = map(lambda t: [", ".join(v[0].strip() for v in t[1]) + " "] + t[0],
            groupby(rows, key=keyfunc))

# newline='' stops the csv module from emitting blank lines on Windows.
with open(OUT_FILENAME, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=DELIM)
    for row in it:
        writer.writerow(row)

If you run it with

WITH_DATA_GENERATION=1 python3 group_by_trailing_py2.py

it generates My Product.grouped.csv with the content

111, 123 ; Info1 ; Description 1 ; Remarks1
156 ; Info2 ; Description 2 ; Remarks2
124 ; Info3 ; Description 3 ; Remarks3

Since you already have your own data, you would not set WITH_DATA_GENERATION, and you can delete the code between the '####...' comment lines.
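Sorting changes the order of the rows. If the order of first appearance matters, one alternative is to collect the numbers in a dictionary instead of using sorted plus groupby. This is a sketch of a different technique, not the approach of the answer above; the function name combine_rows is hypothetical, and it assumes Python 3.7+ where dictionaries keep insertion order.

```python
import csv
import io


def combine_rows(lines, delim=';'):
    """Group leading numbers by the remaining columns, in first-seen order."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for row in csv.reader(lines, delimiter=delim):
        if not row:  # skip blank lines
            continue
        key = tuple(col.strip() for col in row[1:])
        groups.setdefault(key, []).append(row[0].strip())
    return [[", ".join(numbers), *key] for key, numbers in groups.items()]


# Sample data copied from the question.
sample = io.StringIO(
    "111 ; Info1 ; Description 1 ; Remarks1\n"
    "123 ; Info1 ; Description 1 ; Remarks1\n"
    "156 ; Info2 ; Description 2 ; Remarks2\n"
    "124 ; Info3 ; Description 3 ; Remarks3\n"
)
combined = combine_rows(sample)
for row in combined:
    print(" ; ".join(row))
```

Note that this variant strips the whitespace around every column, so the padding around the separator is added back only when printing.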

Answer 1: (score: 0)

I rewrote the solution by decltype_auto to make it easier to reuse:

import csv
import io
from itertools import groupby


def drop_first(row):
    """Returns all but the first element."""
    return row[1:]

def make_line(group):
    """Create a text line from a group.

    Joins the grouped result with comma and adds the rest of
    the columns.
    """
    return [", ".join(val[0].strip() for val in group[1]) + " "] + group[0]

def open_path_or_fobj(fobj_or_path, mode='r'):
    """Open a file from a path or return the given file object."""
    if isinstance(fobj_or_path, str):
        return open(fobj_or_path, mode, newline='')
    return fobj_or_path

def make_combined(in_fobj_or_path, out_path, delim=';'):
    """Combine lines that differ only in the first column into one line."""
    with open_path_or_fobj(in_fobj_or_path) as csv_file:
        rows = sorted(csv.reader(csv_file, delimiter=delim), key=drop_first)

    it = map(make_line, groupby(rows, key=drop_first))

    with open(out_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=delim)
        for row in it:
            writer.writerow(row)

if __name__ == '__main__':

    def test_with_file():
        """Example for use with existing input file."""

        make_combined('My Product.csv', 'My Product.grouped.csv')

    def test_with_stringio():
        """Test with StringIO object as csv input."""

        data = '''111 ; Info1 ; Description 1 ; Remarks1
        123 ; Info1 ; Description 1 ; Remarks1
        156 ; Info2 ; Description 2 ; Remarks2
        124 ; Info3 ; Description 3 ; Remarks3'''

        fobj_in = io.StringIO(data)
        make_combined(fobj_in, 'result.txt')

        data2 = '''111 # Info1 # Description 1 # Remarks1
        123 # Info1 # Description 1 # Remarks1
        156 # Info2 # Description 2 # Remarks2'''

        fobj_in = io.StringIO(data2)
        make_combined(fobj_in, 'result_delim.txt', delim='#')

    # Actually run it with the file.
    test_with_file()

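I did a few things:

  1. Converted the lambda functions into regular functions with names and docstrings.
  2. Used io.StringIO to handle example data that is defined as a string in the source file.
  3. Used if __name__ == '__main__': so the file can be imported as a module and also run as a script.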
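The io.StringIO idea in point 2 can be taken one step further so that the output is also checked in memory, without touching the disk. The sketch below is a hypothetical variant, not the code above: the function combine_to and its parameters are assumptions, and it expects file objects that are already open on both sides.

```python
import csv
import io
from itertools import groupby


def drop_first(row):
    """Return all columns except the first."""
    return row[1:]


def combine_to(in_fobj, out_fobj, delim=';'):
    """Hypothetical variant that reads from and writes to open file objects."""
    # Drop empty rows so that row[0] below is always available.
    rows = sorted((row for row in csv.reader(in_fobj, delimiter=delim) if row),
                  key=drop_first)
    writer = csv.writer(out_fobj, delimiter=delim, lineterminator='\n')
    for rest, group in groupby(rows, key=drop_first):
        numbers = ", ".join(row[0].strip() for row in group)
        writer.writerow([numbers + " "] + rest)


source = io.StringIO(
    "111 ; Info1 ; Description 1 ; Remarks1\n"
    "123 ; Info1 ; Description 1 ; Remarks1\n"
    "156 ; Info2 ; Description 2 ; Remarks2\n"
)
target = io.StringIO()
combine_to(source, target)
result = target.getvalue()
print(result)
```

Passing file objects in both directions leaves it to the caller to decide whether the data comes from a real file, a string, or some other stream.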