Read a CSV and replace a column based on keyword lists

Date: 2016-09-16 20:47:52

Tags: python csv

I'm new to Python and would really appreciate help with how to solve this. Here is what I'm trying to do:

  1. Read a CSV file containing a list of transactions. Each row has 6 columns.
  2. For each row, compare the DESCRIPTION column against a keyword list to see whether any word in it matches a word in the keyword list.
    |Col0 | Col1 | Col2 | Col3 "DESCRIPTION" | Col4 | Col5 "CATEGORY"|

  3. If any word matches something in the keyword list, replace the CATEGORY column with the new entry that corresponds to that particular keyword list.

  4. Keep going through every row, comparing it against multiple keyword lists. Whenever there is a match, replace column 5 (CATEGORY) in that row with the corresponding value.
  5. Save the result to a new CSV file.
Here's what I have so far:

    [code not preserved]

    Is a list the right thing to use here? How do I do the comparison between the column and the keyword lists?
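The steps above can be sketched with nothing but the standard-library csv module. This is only a rough sketch under assumptions, not the asker's code: the category names and keywords in KEYWORDS are made-up placeholders, the names categorize and recategorize are hypothetical, keywords are assumed to be lowercase single words, and every row is assumed to have the 6 columns of the layout shown in the question.

```python
import csv

# Hypothetical keyword lists: category name -> lowercase keywords.
# These values are placeholders, not taken from the question.
KEYWORDS = {
    "Groceries": ["market", "grocery"],
    "Transport": ["taxi", "metro"],
}

DESCRIPTION_COL = 3  # Col3 "DESCRIPTION" in the question's layout
CATEGORY_COL = 5     # Col5 "CATEGORY" in the question's layout


def categorize(description, keywords=KEYWORDS):
    """Return the first category with a keyword among the description's words, else None."""
    words = set(description.lower().split())
    for category, keyword_list in keywords.items():
        if words.intersection(keyword_list):
            return category
    return None


def recategorize(in_path, out_path, keywords=KEYWORDS):
    """Copy in_path to out_path, replacing the category column wherever a keyword matches."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            category = categorize(row[DESCRIPTION_COL], keywords)
            if category is not None:
                row[CATEGORY_COL] = category
            writer.writerow(row)
```

A dict of lists keeps each category next to its keywords, so adding another keyword list is one more entry. Splitting the description into a set of words gives whole-word matching; a substring test such as `keyword in description` would be the looser alternative. Rows with no match are written back unchanged.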

1 answer:

Answer 0 (score: 0)

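I find the pandas library well suited to this kind of task. I'm sure the find_cat function could be sped up a bit, but I wanted to get across the idea of applying search & replace to a column.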

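import pandas as pd


def find_cat(desc, cat_dict):
    """Return the categories whose keywords occur in desc, joined by ';'."""
    cat_list = []
    for cat, words in cat_dict.items():
        # Substring match; each category is recorded at most once.
        if any(w in desc for w in words):
            cat_list.append(cat)
    # Return a string rather than a list so the CSV holds cat1, not ['cat1'].
    return ";".join(cat_list)


# Category name -> keywords that map to it.
cat_d = {
    "cat1": ["1_word_1", "1_word_2"],
    "cat2": ["2_word_1", "2_word_2"],
    "cat3": ["3_word_1", "3_word_2"]
}


df = pd.read_csv('in.csv')
# Overwrite the category column based on each row's description.
df["category"] = df["description"].apply(lambda desc: find_cat(desc, cat_d))
# The row index is written as the unnamed first column of out.csv.
df.to_csv('out.csv')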

where in.csv contains:

col1,col2,col3,col4,description,category
0,0,0,0,1_word_1,
0,0,0,0,1_word_2,
0,0,0,0,1_word_1,
0,0,0,0,3_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_2,
0,0,0,0,1_word_1,
0,0,0,0,2_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_2,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,2_word_2,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_1,
0,0,0,0,1_word_2,
0,0,0,0,1_word_1,
0,0,0,0,2_word_1,

and this produces out.csv:

,col1,col2,col3,col4,description,category
0,0,0,0,0,1_word_1,cat1
1,0,0,0,0,1_word_2,cat1
2,0,0,0,0,1_word_1,cat1
3,0,0,0,0,3_word_1,cat3
4,0,0,0,0,1_word_1,cat1
5,0,0,0,0,1_word_1,cat1
6,0,0,0,0,1_word_2,cat1
7,0,0,0,0,1_word_1,cat1
8,0,0,0,0,2_word_1,cat2
9,0,0,0,0,1_word_1,cat1
10,0,0,0,0,1_word_2,cat1
11,0,0,0,0,1_word_1,cat1
12,0,0,0,0,1_word_1,cat1
13,0,0,0,0,1_word_1,cat1
14,0,0,0,0,2_word_2,cat2
15,0,0,0,0,1_word_1,cat1
16,0,0,0,0,1_word_1,cat1
17,0,0,0,0,1_word_1,cat1
18,0,0,0,0,1_word_1,cat1
19,0,0,0,0,1_word_2,cat1
20,0,0,0,0,1_word_1,cat1
21,0,0,0,0,2_word_1,cat2
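
On the remark that find_cat could be sped up: one possible direction, sketched here under assumptions rather than measured, is to let pandas do the matching per category with Series.str.contains instead of calling a Python function for every row. The helper name add_categories and the ";" separator are hypothetical choices for this sketch. Keywords are treated as literal substrings, and missing descriptions are treated as empty strings.

```python
import re

import pandas as pd

cat_d = {
    "cat1": ["1_word_1", "1_word_2"],
    "cat2": ["2_word_1", "2_word_2"],
    "cat3": ["3_word_1", "3_word_2"],
}


def add_categories(df, cat_dict, sep=";"):
    """Return a copy of df whose category column lists every matching category."""
    out = df.copy()
    # Treat missing descriptions as empty strings so matching never fails.
    desc = out["description"].fillna("").astype(str)
    category = pd.Series("", index=out.index)
    for cat, words in cat_dict.items():
        # One literal-substring pattern per category, e.g. "1_word_1|1_word_2".
        pattern = "|".join(re.escape(w) for w in words)
        hit = desc.str.contains(pattern, regex=True)
        # Rows that already have a category get the separator first.
        prefix = category.where(category == "", category + sep)
        # Append this category only on the rows that matched.
        category = category.where(~hit, prefix + cat)
    out["category"] = category
    return out
```

It could be used as `add_categories(pd.read_csv('in.csv'), cat_d).to_csv('out.csv', index=False)`. Passing index=False leaves out the unnamed leading index column that appears in the out.csv above.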