Processing data from a CSV file and a Python dictionary

Time: 2015-09-08 06:17:50

Tags: python csv dictionary

I have a CSV file, file1.csv, whose sample structure is as follows (the state column holds abbreviations of US states):

companyid,state,amount
A,AL,609
A,AL,589
A,AL,915
A,AL,344
A,AL,813
A,AL,825
A,AL,825
A,AL,219
A,AL,778
A,AL,145
A,AL,983
A,AL,621
A,AR,339
A,AR,269

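Please note that the data above covers multiple companyid's (records for company A, followed by records for company B, and so on), and within each company there are multiple states (above you can see the records for state AL, followed by those for AR, for company A). For each companyid and each state in the data file there are 12 records.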

I also have a Python dictionary, dict1, with the following structure:

{'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264'],.....}

Here the key is the state name, and the value for that key is a list of 12 numbers. Apart from this, I have a function func1(list1,list2) that takes two list arguments.

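What I want to do is, for each companyid in the CSV file, and then for each state within that companyid, build two lists: list1 will hold the 12 records from the CSV file (note that for each companyid, and for each state within that companyid, there are exactly 12 records in the CSV file), and list2 will hold the 12 records for that state from the dictionary dict1 (the state needs to be mapped). These two lists need to be passed to the function func1() each time, so that func1() is called once for each company and for each distinct state within that company.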

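One thing to note: the state in the CSV file is in abbreviated form, while in the dictionary dict1 it is spelled out in full. To handle that, I created a separate dictionary, states, with the following structure:

states = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
          .
          .
          .

I am having trouble figuring out how to do this. Can anyone help me?

Note: the structure I want is a nested loop: for each companyid, and for each state within that companyid, call func1(list1, list2) once.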

2 answers:

Answer 0: (score: 1)

The first task is to collect your CSV data in a format that is easy to reference by company, and then by state:

import csv

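company_data = {}  # maps companyid -> {state abbreviation -> list of amounts}

with open('file1.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        # setdefault returns the existing entry, inserting the default first if needed
        company_states = company_data.setdefault(row[0], {})
        state_data = company_states.setdefault(row[1], [])
        # state_data is the list stored in the dictionary, so appending updates it in place
        state_data.append(row[2])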

At the end of the loop above, our dictionary looks like this:

>>> company_data['A']['AL']
['609','589','915',...,'621']
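
Since the question expects exactly 12 records per company and state, a quick sanity check on this structure can flag groups that fall short. The following is a minimal sketch for Python 3; the helper name find_incomplete_groups and the sample data are hypothetical and not part of the original answer.

```python
def find_incomplete_groups(company_data, expected=12):
    """Return (company, state, count) for every group whose size differs from expected."""
    problems = []
    for company, by_state in sorted(company_data.items()):
        for state, amounts in sorted(by_state.items()):
            if len(amounts) != expected:
                problems.append((company, state, len(amounts)))
    return problems


# Hypothetical data shaped like company_data: AL is complete, AR is short.
sample = {
    'A': {
        'AL': ['609', '589', '915', '344', '813', '825',
               '825', '219', '778', '145', '983', '621'],
        'AR': ['339', '269'],
    },
}

print(find_incomplete_groups(sample))  # [('A', 'AR', 2)]
```

Running a check like this before calling func1() makes it easier to tell a data problem apart from a bug in the grouping code.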

Next, we need to pull the numbers out of the other dictionary to pass to our function.

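# items() works in both Python 2 and Python 3 (iteritems() exists only in Python 2)
for company, data in company_data.items():
    # data is now the inner dictionary: state abbreviation -> list of amounts
    for state_abbrev, values in data.items():
        # map the abbreviation to the full state name, then look up its 12 numbers
        func1(values, dict1[states[state_abbrev]])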
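
Note that this lookup raises KeyError when an abbreviation is missing from states or a state name is missing from dict1. Below is a hedged Python 3 sketch of the whole flow that skips such groups instead. The in-memory CSV text, the shortened dictionaries with dummy values, and the helper names load_company_data and call_per_group are assumptions made for illustration, and func1 here is a stand-in that only records its arguments.

```python
import csv
import io


def load_company_data(csv_file):
    """Group amounts by companyid, then by state abbreviation."""
    company_data = {}
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for company_id, state_abbrev, amount in reader:
        company_data.setdefault(company_id, {}).setdefault(state_abbrev, []).append(amount)
    return company_data


def call_per_group(company_data, states, dict1, func):
    """Call func(list1, list2) once per company/state; return the groups that were skipped."""
    skipped = []
    for company, by_state in company_data.items():
        for state_abbrev, list1 in by_state.items():
            # .get() returns None instead of raising when a key is missing
            list2 = dict1.get(states.get(state_abbrev))
            if list2 is None:
                skipped.append((company, state_abbrev))
                continue
            func(list1, list2)
    return skipped


# Hypothetical, shortened inputs (2 rows for AL instead of 12, dummy dict1 values).
SAMPLE_CSV = "companyid,state,amount\nA,AL,609\nA,AL,589\nA,AR,339\n"
states = {'AL': 'Alabama', 'AR': 'Arkansas'}
dict1 = {'Alabama': ['1', '2', '3']}

calls = []


def func1(list1, list2):
    # Stand-in for the real func1: just remember what it was called with.
    calls.append((list1, list2))


company_data = load_company_data(io.StringIO(SAMPLE_CSV))
skipped = call_per_group(company_data, states, dict1, func1)
print(calls)    # [(['609', '589'], ['1', '2', '3'])]
print(skipped)  # [('A', 'AR')]
```

Returning the skipped groups rather than silently dropping them is a design choice; raising an error would be equally reasonable if missing states should never occur.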

Answer 1: (score: 0)

This can be done using the Python csv library. I added some dummy data for Alabama to show some suitable output:

import csv, itertools

dict1 = {'Alabama' : ['1','2','3'], 'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264']}

states = {
    'AK': 'Alaska',
    'AL': 'Alabama',
    'AR': 'Arkansas',
    'AS': 'American Samoa',
    'AZ': 'Arizona',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut',
    'DC': 'District of Columbia',
    'DE': 'Delaware',
    'FL': 'Florida',
    'GA': 'Georgia',
    'GU': 'Guam',
    'HI': 'Hawaii',
    'IA': 'Iowa',
    'ID': 'Idaho',
    'IL': 'Illinois'}

def func1(list1, list2):
    print list1
    print list2
    print

with open('file1.csv', 'r') as f_file1:
    csv_file1 = csv.reader(f_file1)
    header = next(csv_file1)

    for list1 in iter(lambda: list(itertools.islice(csv_file1, 12)), []):
        list2 = [[company_id, dict1.get(states.get(state, '<unknown>'), ['<unknown>']), amount] for company_id, state, amount in list1]
        func1(list1, list2)

This displays the following output from func1():

[['A', 'AL', '609'], ['A', 'AL', '589'], ['A', 'AL', '915'], ['A', 'AL', '344'], ['A', 'AL', '813'], ['A', 'AL', '825'], ['A', 'AL', '825'], ['A', 'AL', '219'], ['A', 'AL', '778'], ['A', 'AL', '145'], ['A', 'AL', '983'], ['A', 'AL', '621']]
[['A', ['1', '2', '3'], '609'], ['A', ['1', '2', '3'], '589'], ['A', ['1', '2', '3'], '915'], ['A', ['1', '2', '3'], '344'], ['A', ['1', '2', '3'], '813'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '219'], ['A', ['1', '2', '3'], '778'], ['A', ['1', '2', '3'], '145'], ['A', ['1', '2', '3'], '983'], ['A', ['1', '2', '3'], '621']]

[['A', 'AR', '339'], ['A', 'AR', '269']]
[['A', ['<unknown>'], '339'], ['A', ['<unknown>'], '269']]

If a state or its entry in dict1 is missing, the script gives you <unknown>. Tested using Python 2.7.
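
Reading fixed chunks of 12 rows depends on every group having exactly 12 records. One alternative, sketched here for Python 3, is itertools.groupby keyed on (companyid, state), which handles any group size as long as rows of the same group are adjacent, as they are in the sample file. The function name iter_groups and the sample rows are hypothetical, and list2 here is simply the dict1 list for the state, which is one reading of what the question asks for.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def iter_groups(csv_file):
    """Yield (companyid, state, amounts) for each run of adjacent rows sharing companyid and state."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for (company_id, state), rows in groupby(reader, key=itemgetter(0, 1)):
        yield company_id, state, [row[2] for row in rows]


# Hypothetical, shortened inputs with dummy dict1 values.
SAMPLE_CSV = "companyid,state,amount\nA,AL,609\nA,AL,589\nA,AR,339\nB,AL,100\n"
states = {'AL': 'Alabama', 'AR': 'Arkansas'}
dict1 = {'Alabama': ['1', '2', '3']}

results = []
for company_id, state, list1 in iter_groups(io.StringIO(SAMPLE_CSV)):
    # Same fallback as the script above: <unknown> when the state or its entry is missing.
    list2 = dict1.get(states.get(state, '<unknown>'), ['<unknown>'])
    results.append((company_id, state, list1, list2))

for item in results:
    print(item)
```

If the file were not sorted by company and state, groupby would split a group into several runs, so the rows would need to be sorted first or collected in a dictionary instead.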