pandas cannot read column names and shows _0, _1 instead

Time: 2019-02-03 23:50:25

Tags: python python-3.x pandas

While reading a CSV file, I tried to get the list of column names as follows:

>>> import pandas as pd
>>> df_attr = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", nrows=1)
>>> cols = list(df_attr)
>>> print(cols)

The output, i.e. the list of columns, is:

['Response ID', 'Group Context', 'name', 'roh', 'sex', 'age', 'agemonth', 'ms', 'married_in_the_last_1', 'aadhar_yn', 'aadhar_number', 'aadhar_picture', 'post_acc', 'life_ins', 'health_ins', 'asso_memb_Youth.club', 'asso_memb_cultural.group', 'asso_memb_JFM', 'asso_memb_Association.of.the.users.of.drinking.water', 'asso_memb_SHG', 'asso_memb_Semi.governmental.organizations', 'asso_memb_Farmers.association', 'asso_memb_Youth.association', 'asso_memb_Women.s.association', 'asso_memb_nothing', 'asso_memb_Semi.governmental.organizations.1', 'asso_memb_Farmers.association.1', 'asso_memb_Women.s.association.1', 'asso_memb_Youth.association.1', 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT', 'asso_memb_Association.of.the.users.of.drinking.water.1', 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT', 'asso_memb_cultural.group.1', 'asso_memb_Youth.club.1', 'asso_memb_nothing.1', 'बँकिंग', 'bank_acc', 'ac_linked_aadhar', 'jdy_yn', 'शिक्षण', 'edu_high', 'edu_informal', 'edu_inst', 'comp_lit', 'कौशल्य.विकासासंबंधी.प्रश्न.फक्त.१८.३५.वयोगटातील.लोक्काना.विचारावे', 'any_skills_development_training', 'receive_skill_training_from', 'member_get_training_from', 'व्यवसाय', 'occ', 'other', 'require_assistance_in_receiving_employment', 'mgnrega_yn', 'mgnregaapp_yn', 'mgnregawork_yn', 'mgnrega_days', 'scheme_memb_Old Pension Scheme', 'scheme_memb_Janani Suraksha Yojana', 'scheme_memb_Disability Benefits', 'scheme_memb_Scholarship', 'scheme_memb_Widow Pension', 'vill_name', 'census_village_sd_2011', 'census_district_2011', 'vill_name_taluka_code', 'census_subdistrict_2011', 'subdistrict_code', 'vill_name_gp_code', 'vill_name_taluka_name', 'district_code', 'vill_name_gp_name', 'vill_name_village_code', 'location_latitude', 'location_longitude', 'location_accuracy', 'hoh', 'contact', 'informant', 'rh', 'rhoth', 'hh_occu', 'religion', 'socialgrp', 'sc_health_id', 'census_country', 'state_name', 'state_code', 'age_married', 'village_code_census2011_raw', 'phase']

and I read the rows with:

>>> df = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", chunksize=2)
>>> for d in df:
...     d = d.to_dict(orient='records')
...     for r in d:
...         print(r)
...     import sys
...     sys.exit()

The output is:

{'_0': '74a7c6f8-94f3-4882-8ad7-9199313a1a51', '_1': 1, 'name': 'Suresh Kautik Patil', 'roh': 'Self', 'sex': 'Male', 'age': 62, 'agemonth': 2, 'ms': 'Married', 'married_in_the_last_1': 'No', 'aadhar_yn': 1, 'aadhar_number': 467172934356, 'aadhar_picture': 'Https://Collect-V2-Production.s3.Ap-South-1.Amazonaws.com/Nzrzdt5akq7uutju90bf%2Fhxbrh6bz57ihvawa3e4n%2Fze2gzezsov7polixljkg%2Fece8f806-3633-445D-8183-336460Cc6207', 'post_acc': 0, 'life_ins': 0, 'health_ins': 0, '_15': 0, '_16': 0, 'asso_memb_JFM': 0, '_18': 0, 'asso_memb_SHG': 0, '_20': 0, '_21': 0, '_22': 0, '_23': 0, 'asso_memb_nothing': 0, '_25': 0, '_26': 0, '_27': 0, '_28': 0, 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT': 0, '_30': 0, 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT': 0, '_32': 0, '_33': 0, '_34': 1, 'बँकिंग': nan, 'bank_acc': 1, 'ac_linked_aadhar': 1, 'jdy_yn': 1, 'शिक्षण': nan, 'edu_high': 'Secondary', 'edu_informal': 'Other', 'edu_inst': 'No Information', 'comp_lit': 'No', '_44': nan, 'any_skills_development_training': nan, 'receive_skill_training_from': nan, 'member_get_training_from': nan, 'व्यवसाय': nan, 'occ': 'Labourers', 'other': nan, 'require_assistance_in_receiving_employment': 'No', 'mgnrega_yn': 0, 'mgnregaapp_yn': nan, 'mgnregawork_yn': nan, 'mgnrega_days': nan, '_56': 'No', '_57': nan, '_58': nan, 'scheme_memb_Scholarship': nan, '_60': nan, 'vill_name': 'Cc2bf74e-5Baa-4A23-Ad9d-21Fef4517f41', 'census_village_sd_2011': 'Shindgavhan', 'census_district_2011': 'Nandurbar', 'vill_name_taluka_code': 3954, 'census_subdistrict_2011': 'Nandurbar', 'subdistrict_code': 3954, 'vill_name_gp_code': 182276, 'vill_name_taluka_name': 'Nandurbar', 'district_code': 497, 'vill_name_gp_name': 'Shidgavahan', 'vill_name_village_code': 525705, 'location_latitude': 21.4179219, 'location_longitude': 74.354738, 'location_accuracy': 3, 'hoh': 'Suresh Kautik Patil', 'contact': 'In(+91)-9374060682', 'informant': 'Suresh Kautik Patil', 'rh': 'Self', 'rhoth': nan, 'hh_occu': 'Labourers', 'religion': 'Hindu', 'socialgrp': 
'OBC', 'sc_health_id': 1, 'census_country': 'India', 'state_name': 'Maharashtra', 'state_code': 27, 'age_married': '<21 (For Boys)', 'village_code_census2011_raw': 525705, 'phase': 'Phase 3'}

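As you can see, when the rows are read this way, the column name Response ID is missing (it shows up as _0), and several other columns are replaced by names such as _1 and _15. Note that df.iterrows() gives me all the correct column names.

The first few rows of my CSV file (including the header) are here, so this question can be treated as an MCVE.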

1 Answer:

Answer 0 (score: 0)

It turns out that df.to_dict('record') adds underscore to column names that are integer strings, and it is broken when 255+ columns are present.
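The `_0`, `_1`, `_15` pattern in the question matches what Python's `collections.namedtuple` produces with `rename=True`: any field name that is not a valid identifier is replaced by an underscore plus its position. The sketch below only demonstrates that standard-library behaviour. That the affected pandas release built its records on namedtuples is an assumption here, not something stated in the question or answer, and the four column names are a small sample taken from the question.

```python
from collections import namedtuple

# Sample of column names from the question; the first, second and fourth
# are not valid Python identifiers (space or dot in the name).
columns = ["Response ID", "Group Context", "name", "asso_memb_Youth.club"]

# rename=True replaces every invalid field name with "_<position>".
Row = namedtuple("Row", columns, rename=True)
renamed_fields = Row._fields

print(renamed_fields)
```

Valid identifiers such as `name` are left alone, which is consistent with the mixed output shown in the question.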

Fortunately, both issues have now been fixed in the 0.24.1 bug-fix release.
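If upgrading is not an option, one possible workaround is to build the row dictionaries by hand from the original column labels, so no renaming can occur. This is a minimal sketch, not part of the original answer: `to_records` is a hypothetical helper name and the small frame is made-up sample data.

```python
import pandas as pd


def to_records(frame):
    """Return one dict per row, keyed by the original column labels."""
    cols = list(frame.columns)
    # name=None makes itertuples yield plain tuples, so column labels
    # never pass through namedtuple field renaming.
    return [dict(zip(cols, row)) for row in frame.itertuples(index=False, name=None)]


# Made-up sample data using column names from the question.
sample = pd.DataFrame(
    {
        "Response ID": ["id-1", "id-2"],
        "Group Context": [1, 1],
        "name": ["A", "B"],
    }
)

records = to_records(sample)
for record in records:
    print(record)
```

With a chunked reader, as in the question, the same helper could be called on each chunk in place of `d.to_dict(orient='records')`.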