Saving an entire column in pandas via to_csv

Time: 2015-02-27 10:38:31

Tags: python csv pandas

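I have a 2-column DataFrame with the values shown below. Unfortunately, because the list of values is too long, it gets cut off with , ...] when I save it to CSV with to_csv. How can I keep the entire list when saving?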

  

    attribute_name  list_of_values
    attribute1      [1320698,1320699,1096323,
                    1320690,1839190,1091359,1325750,1569072,1829679,142100,1320163,
                    1829673,588914,418137,757085,588910,1321158,1073897,1823533,
                    1823535,1091363,1383908,1834826,36191,1829641,767536,1829597,
                    1829591,1326727,1834700,1317721,1317802,1834838,52799,1383915,
                    1320042,1829654,1829655,1829658,647089,1829581,1829586,1829587,
                    1321116,1829585,1829588,18379799,1588509,1834471,1793632,
                    1327850,1793599,14566868,1315869,1793605,1321236,1829579,
                    1829577,1793609,1829571,1829570,1320139,777057,1829671,1829566,
                    1831047,1829567,588927,60484,1793596,182966,1839580,1829569,
                    1793615,1323529,1793619,1834758,1612974,1320007,1839780,
                    1291475,1834835,1834453,1823663,418112,1092106,18296689,1829688,
                    1793606,647050,1834742,1839551,1839553,1834746,1839556,1834745,
                    1575978,1834749,1320711,1317910,...]

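I save it with df.to_csv(loc,index=False,header=False,sep='\t',mode='a',encoding='utf8'). I tried the display options described at http://pandas.pydata.org/pandas-docs/dev/options.html, namely pd.set_option('max_colwidth',20000000000), but I think this does not work because it only applies to display mode, not to dumping to CSV.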

What else can I set so that the entire contents of the list are preserved?
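One way to sidestep the question of which option governs the output is to turn each list into a plain string yourself before calling to_csv, so that pandas only ever writes text. The following is a minimal sketch, not a confirmed fix for the pandas version used here. It assumes JSON as the string format, a generated 155-item list standing in for the real values, and an in-memory buffer standing in for the real file.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the long list in the question (155 items).
values = [str(n) for n in range(155)]

df = pd.DataFrame({
    "attribute_name": ["attribute1"],
    "list_of_values": [values],
})

# Serialise each list to a JSON string so the cell is ordinary text.
df["list_of_values"] = df["list_of_values"].apply(json.dumps)

buffer = io.StringIO()  # in-memory stand-in for the real output file
df.to_csv(buffer, index=False, sep="\t")

# Read the data back and decode the JSON strings into lists again.
buffer.seek(0)
restored = pd.read_csv(buffer, sep="\t")
restored["list_of_values"] = restored["list_of_values"].apply(json.loads)

print(len(restored["list_of_values"][0]))
```

The cost of this approach is that every reader of the file has to know the column holds JSON and decode it after loading.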

Edit: try creating a DataFrame from this original data. After saving, it gives you the truncated data shown above.

    import pandas as pd

    pd.options.display.multi_sparse = False
    pd.set_option('max_colwidth', 2000000000000000000)
    headers = ["attribute_name", "list_of_values"]
    file_name = '/home/ekta/abcd.csv'
    data = ['attribute1', ['1320698', '1320699', '1096323', '1320690', '1839190', '1091359', '1325750', '1569072', '1829679', '142100', '1320163', '1829673', '588914', '418137', '757085', '588910', '1321158', '1073897', '1823533', '1823535', '1091363', '1383908', '1834826', '36191', '1829641', '767536', '1829597', '1829591', '1326727', '1834700', '1317721', '1317802', '1834838', '52799', '1383915', '1320042', '1829654', '1829655', '1829658', '647089', '1829581', '1829586', '1829587', '1321116', '1829585', '1829588', '1839799', '1588509', '1834471', '1793632', '1327850', '1793599', '1456968', '1315869', '1793605', '1321236', '1829579', '1829577', '1793609', '1829571', '1829570', '1320139', '777057', '1829671', '1829566', '1831047', '1829567', '588927', '60484', '1793596', '1829634', '1839580', '1829569', '1793615', '1323529', '1793619', '1834758', '1612974', '1320007', '1839780', '1291475', '1834835', '1834453', '1823663', '418112', '1092106', '1829689', '1829688', '1793606', '647050', '1834742', '1839551', '1839553', '1834746', '1839556', '1834745', '1575978', '1834749', '1320711', '1317910', '1829700', '1839791', '1839796', '1320019', '1829494', '437131', '1829696', '1839576', '721318', '1829699', '1838874', '1315822', '647049', '1325775', '1320708', '133913', '835588', '1839564', '1320700', '1320707', '1839563', '1834737', '1834736', '1834734', '1823669', '1321159', '1320577', '1839768', '1823665', '1838602', '1823667', '1321099', '1753590', '1753593', '1320688', '1839583', '1326633', '1320681', '1793646', '1323683', '1091348', '982081', '1793648', '1478516', '1317650', '1829663', '1829667', '1829666', '1793640', '1839577', '1315855', '1317796', '1839775', '1321163', '1793642']]


    def write_file(data, flag, headers, file_name):
        # Build a one-row DataFrame and append it to the file
        print(" \n \n data", data)
        df = pd.DataFrame(data).T
        print("df \n", df)
        loc = file_name
        #loc="%s%s_%s"%(args.base_path,args.merchant_domain,file_name)
        if flag:
            # First write: include the header row
            df.to_csv(loc, index=False, header=headers, sep='\t', mode='a', encoding='utf8')
        else:
            # Later writes: append the data row only
            df.to_csv(loc, index=False, header=False, sep='\t', mode='a', encoding='utf8')
        return loc

    # I call the function above with this data & headers. I pass flag as True the first
    # time around, after which I keep appending with flag=False.
    write_file(data, True, headers, file_name)

Debugging: the original list has 155 items, while the truncated list saved by to_csv has 100 data points.
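The 155-versus-100 comparison can be automated with a small check on the text of the saved cell. The sketch below is only a diagnostic aid: inspect_cell is a hypothetical helper, and the sample data is generated rather than taken from the real file. As a possible lead rather than a confirmed cause, pandas has a display option called display.max_seq_items whose default is 100, which matches the count seen here. Whether that option affects to_csv in this pandas version is an assumption the post does not confirm.

```python
def inspect_cell(cell_text):
    """Return (item_count, was_truncated) for a bracketed, comma-separated cell.

    Hypothetical helper: it treats a trailing "..." item as the truncation marker.
    """
    inner = cell_text.strip().strip("[]")
    items = [part.strip() for part in inner.split(",") if part.strip()]
    truncated = bool(items) and items[-1] == "..."
    if truncated:
        items = items[:-1]  # the marker itself is not a data point
    return len(items), truncated


# Generated stand-ins for the original list and for the truncated cell on disk.
original = [str(n) for n in range(155)]
truncated_cell = "[" + ",".join(original[:100]) + ",...]"

print(inspect_cell(truncated_cell))   # (100, True)
print(inspect_cell(str(original)))    # (155, False)
```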

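Purpose of loc and flag: loc is the file location, and flag indicates whether I am writing the first row or the second row onwards. If the first row has already been written, I do not need to write the header again.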
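The flag described above has to be tracked by the caller. As an alternative sketch, the header decision can be derived from the file itself: write the header only when the file is missing or empty. This uses the standard csv module rather than to_csv. append_row, the temporary path, and the second row are all hypothetical names and data introduced for illustration.

```python
import csv
import os
import tempfile


def append_row(path, row, headers):
    """Append one tab-separated row, writing the header only for a new or empty file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if write_header:
            writer.writerow(headers)
        writer.writerow(row)
    return path


# Usage with a throwaway file, so nothing outside the temp directory is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "abcd.csv")
headers = ["attribute_name", "list_of_values"]
append_row(demo_path, ["attribute1", "1320698,1320699"], headers)
append_row(demo_path, ["attribute2", "1096323"], headers)  # made-up second row

with open(demo_path, encoding="utf8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

Checking the file instead of passing a flag means repeated calls behave the same way across separate runs of the script, at the price of one extra filesystem check per write.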

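Here is how I solved it. The main diagnosis is that I could not store the entire list I passed in, even when I treated it as a dict object, possibly because of the way pandas handles column length, but this is only a diagnosis. Instead of using to_csv (pandas), I write the entire list out myself as a plain file and then read it back with pandas, and in that case I can recover the whole list.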

import pandas as pd
# Note that I changed my headers from the initial format as a list
headers="attribute_name\tlist_of_values"

data = ['attribute1',['1320698', '1320699', '1096323', '1320690', '1839190', '1091359', '1325750', '1569072', '1829679', '142100', '1320163', '1829673', '588914', '418137', '757085', '588910', '1321158', '1073897', '1823533', '1823535', '1091363', '1383908', '1834826', '36191', '1829641', '767536', '1829597', '1829591', '1326727', '1834700', '1317721', '1317802', '1834838', '52799', '1383915', '1320042', '1829654', '1829655', '1829658', '647089', '1829581', '1829586', '1829587', '1321116', '1829585', '1829588', '1839799', '1588509', '1834471', '1793632', '1327850', '1793599', '1456968', '1315869', '1793605', '1321236', '1829579', '1829577', '1793609', '1829571', '1829570', '1320139', '777057', '1829671', '1829566', '1831047', '1829567', '588927', '60484', '1793596', '1829634', '1839580', '1829569', '1793615', '1323529', '1793619', '1834758', '1612974', '1320007', '1839780', '1291475', '1834835', '1834453', '1823663', '418112', '1092106', '1829689', '1829688', '1793606', '647050', '1834742', '1839551', '1839553', '1834746', '1839556', '1834745', '1575978', '1834749', '1320711', '1317910', '1829700', '1839791', '1839796', '1320019', '1829494', '437131', '1829696', '1839576', '721318', '1829699', '1838874', '1315822', '647049', '1325775', '1320708', '133913', '835588', '1839564', '1320700', '1320707', '1839563', '1834737', '1834736', '1834734', '1823669', '1321159', '1320577', '1839768', '1823665', '1838602', '1823667', '1321099', '1753590', '1753593', '1320688', '1839583', '1326633', '1320681', '1793646', '1323683', '1091348', '982081', '1793648', '1478516', '1317650', '1829663', '1829667', '1829666', '1793640', '1839577', '1315855', '1317796', '1839775', '1321163', '1793642']]
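import ast

loc = '/home/ekta/abcd.csv'  # same path as file_name in the question
flag = True
# write to a file
with open(loc, 'a') as f:
    if flag:
        f.write(headers + "\n")
        flag = False
    # Explicitly writing a tab-separated file
    f.write(str(data[0]) + "\t" + str(data[1]) + "\n")

# read the file & confirm
df = pd.read_csv(loc, sep='\t', header='infer')
print(df['list_of_values'].iloc[0])
# The cell comes back as a single string, so parse it into a list before counting
print(len(ast.literal_eval(df['list_of_values'].iloc[0])))
# Yay!! 155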

Thanks to @paul for diagnosing the problem and pointing me in this direction.

0 answers:

No answers