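I have a 2-column DataFrame with the values shown below. Unfortunately, because the list of values is so long, it gets cut off with ", ...]" when I save it to CSV with to_csv. How can I keep the entire list when saving?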
attribute_name list_of_values attribute1 [1320698,1320699,1096323, 1320690,1839190,1091359,1325750,1569072,1829679,142100,1320163, 1829673,588914,418137,757085,588910,1321158,1073897,1823533, 1823535,1091363,1383908,1834826,36191,1829641,767536,1829597, 1829591,1326727,1834700,1317721,1317802,1834838,52799,1383915, 1320042,1829654,1829655,1829658,647089,1829581,1829586,1829587, 1321116,1829585,1829588,18379799,1588509,1834471,1793632, 1327850,1793599,14566868,1315869,1793605,1321236,1829579, 1829577,1793609,1829571,1829570,1320139,777057,1829671,1829566, 1831047,1829567,588927,60484,1793596,182966,1839580,1829569, 1793615,1323529,1793619,1834758,1612974,1320007,1839780, 1291475,1834835,1834453,1823663,418112,1092106,18296689,1829688, 1793606,647050,1834742,1839551,1839553,1834746,1839556,1834745, 1575978,1834749,1320711,1317910,...]
df.to_csv(loc,index=False,header=False,sep='\t',mode='a',encoding='utf8')
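I tried the display options described at http://pandas.pydata.org/pandas-docs/dev/options.html, including pd.set_option('max_colwidth',20000000000), but I think this does not work because it only applies to display mode, not to dumping to CSV.
What else can I set so that the entire content of the list is preserved?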
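To illustrate the point about display mode, here is a minimal sketch showing that max_colwidth changes what is printed but not what to_csv writes for a plain string cell. The column name `col` and the 500-character value are hypothetical stand-ins, not the data from this question, and the sketch does not reproduce the list truncation itself.

```python
import io
import pandas as pd

long_text = "x" * 500  # hypothetical stand-in for a very long cell value
df = pd.DataFrame({"col": [long_text]})

# Deliberately shrink the display width so the truncation is obvious.
with pd.option_context("display.max_colwidth", 20):
    shown = repr(df)             # the text that print(df) would show
    buf = io.StringIO()
    df.to_csv(buf, index=False)  # the text that would land in the CSV file
    written = buf.getvalue()

print(long_text in shown)    # is the full string in the printed frame?
print(long_text in written)  # is the full string in the CSV text?
```

The printed frame shortens the string while the CSV text keeps it, which is consistent with max_colwidth being a display-only setting.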
Edit: Try creating a DataFrame from this original data. After saving it, you will get the distorted data described above.
import pandas as pd
pd.options.display.multi_sparse = False
pd.set_option('max_colwidth',2000000000000000000)
headers=["attribute_name", "list_of_values"]
file_name='/home/ekta/abcd.csv'
data = ['attribute1', ['1320698', '1320699', '1096323', '1320690', '1839190', '1091359', '1325750', '1569072', '1829679', '142100', '1320163', '1829673', '588914', '418137', '757085', '588910', '1321158', '1073897', '1823533', '1823535', '1091363', '1383908', '1834826', '36191', '1829641', '767536', '1829597', '1829591', '1326727', '1834700', '1317721', '1317802', '1834838', '52799', '1383915', '1320042', '1829654', '1829655', '1829658', '647089', '1829581', '1829586', '1829587', '1321116', '1829585', '1829588', '1839799', '1588509', '1834471', '1793632', '1327850', '1793599', '1456968', '1315869', '1793605', '1321236', '1829579', '1829577', '1793609', '1829571', '1829570', '1320139', '777057', '1829671', '1829566', '1831047', '1829567', '588927', '60484', '1793596', '1829634', '1839580', '1829569', '1793615', '1323529', '1793619', '1834758', '1612974', '1320007', '1839780', '1291475', '1834835', '1834453', '1823663', '418112', '1092106', '1829689', '1829688', '1793606', '647050', '1834742', '1839551', '1839553', '1834746', '1839556', '1834745', '1575978', '1834749', '1320711', '1317910', '1829700', '1839791', '1839796', '1320019', '1829494', '437131', '1829696', '1839576', '721318', '1829699', '1838874', '1315822', '647049', '1325775', '1320708', '133913', '835588', '1839564', '1320700', '1320707', '1839563', '1834737', '1834736', '1834734', '1823669', '1321159', '1320577', '1839768', '1823665', '1838602', '1823667', '1321099', '1753590', '1753593', '1320688', '1839583', '1326633', '1320681', '1793646', '1323683', '1091348', '982081', '1793648', '1478516', '1317650', '1829663', '1829667', '1829666', '1793640', '1839577', '1315855', '1317796', '1839775', '1321163', '1793642']]
def write_file(data, flag, headers, file_name):
    # build a one-row df from the data and append it to the file
    print("\n\ndata", data)
    df = pd.DataFrame(data).T
    print("df\n", df)
    loc = file_name
    #loc="%s%s_%s"%(args.base_path,args.merchant_domain,file_name)
    if flag:
        # first write: include the header row
        df.to_csv(loc, index=False, header=headers, sep='\t', mode='a', encoding='utf8')
    else:
        # later writes: append the data row only
        df.to_csv(loc, index=False, header=False, sep='\t', mode='a', encoding='utf8')
    return loc

# I call the function above with this data & headers. I pass flag as True the
# 1st time around, after which I keep appending with flag=False.
write_file(data, True, headers, file_name)
Debugging: the original list has 155 items; the distorted list saved by to_csv has 100 data points.
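One thing worth checking, as an assumption rather than a confirmed diagnosis: the count of 100 coincides with the documented default of the pandas option display.max_seq_items, which caps how many items of a sequence are pretty-printed and marks the omission with "...". Whether to_csv consults this option depends on the pandas version, so the sketch below only shows how to inspect the option and lift it temporarily.

```python
import pandas as pd

# Documented default cap on items shown when pandas pretty-prints a sequence.
cap = pd.get_option("display.max_seq_items")
print(cap)

# Lift the cap for a block of code; None means "no limit".
with pd.option_context("display.max_seq_items", None):
    inside = pd.get_option("display.max_seq_items")
    # A df.to_csv(...) call would go here. Assumption: this only helps if
    # the installed pandas version applies this option when writing.
    print(inside)
```

If the saved list is still truncated with the cap lifted, this option is not the cause in your version.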
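Purpose of loc and flag: loc is the file location, and flag indicates whether I am writing the 1st row or the 2nd row onwards. If the 1st row has already been written, I don't need to write the header again.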
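Here is how I solved it. The main diagnosis was that I could not store the entire list I passed in, even when I treated it as a dict object, probably because of the way pandas handles column length, but that is only a diagnosis. By writing the file myself instead of using pandas' to_csv, I get the whole list back: I write it out as a plain tab-separated file and then read it back with pandas, in which case I can recover the entire content.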
import pandas as pd
# Note that I changed my headers from the initial format as a list
headers="attribute_name\tlist_of_values"
data = ['attribute1',['1320698', '1320699', '1096323', '1320690', '1839190', '1091359', '1325750', '1569072', '1829679', '142100', '1320163', '1829673', '588914', '418137', '757085', '588910', '1321158', '1073897', '1823533', '1823535', '1091363', '1383908', '1834826', '36191', '1829641', '767536', '1829597', '1829591', '1326727', '1834700', '1317721', '1317802', '1834838', '52799', '1383915', '1320042', '1829654', '1829655', '1829658', '647089', '1829581', '1829586', '1829587', '1321116', '1829585', '1829588', '1839799', '1588509', '1834471', '1793632', '1327850', '1793599', '1456968', '1315869', '1793605', '1321236', '1829579', '1829577', '1793609', '1829571', '1829570', '1320139', '777057', '1829671', '1829566', '1831047', '1829567', '588927', '60484', '1793596', '1829634', '1839580', '1829569', '1793615', '1323529', '1793619', '1834758', '1612974', '1320007', '1839780', '1291475', '1834835', '1834453', '1823663', '418112', '1092106', '1829689', '1829688', '1793606', '647050', '1834742', '1839551', '1839553', '1834746', '1839556', '1834745', '1575978', '1834749', '1320711', '1317910', '1829700', '1839791', '1839796', '1320019', '1829494', '437131', '1829696', '1839576', '721318', '1829699', '1838874', '1315822', '647049', '1325775', '1320708', '133913', '835588', '1839564', '1320700', '1320707', '1839563', '1834737', '1834736', '1834734', '1823669', '1321159', '1320577', '1839768', '1823665', '1838602', '1823667', '1321099', '1753590', '1753593', '1320688', '1839583', '1326633', '1320681', '1793646', '1323683', '1091348', '982081', '1793648', '1478516', '1317650', '1829663', '1829667', '1829666', '1793640', '1839577', '1315855', '1317796', '1839775', '1321163', '1793642']]
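import ast  # parses the list back from its string form after reading
flag = True
loc = '/home/ekta/abcd.csv'  # same path as file_name in the question
# write to a file
with open(loc, 'a') as f:
    if flag:
        f.write(headers + "\n")
        flag = False
    # Explicitly write a tab-separated line
    f.write(str(data[0]) + "\t" + str(data[1]) + "\n")
# read the file & confirm
df = pd.read_csv(loc, sep='\t', header='infer')
# read_csv returns the cell as a string, so convert it back to a list
values = ast.literal_eval(df['list_of_values'].iloc[0])
print(values)
print(len(values))
# Yay!! 155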
Thanks to @paul for diagnosing the problem and pointing me in this direction.
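A possible variation on the approach above, sketched under my own assumptions rather than taken from the answer: serialize the list column to a JSON string before calling to_csv, so pandas only ever writes a plain string, then parse it back after reading. The helper name `lists_to_json_column` and the generated sample values are hypothetical, and an in-memory buffer stands in for the real file.

```python
import io
import json
import pandas as pd

def lists_to_json_column(df, column):
    """Return a copy of df with the list-valued column stored as JSON text."""
    out = df.copy()
    out[column] = out[column].apply(json.dumps)
    return out

# Hypothetical sample: one row whose cell holds a 155-item list of strings.
values = [str(n) for n in range(155)]
df = pd.DataFrame({"attribute_name": ["attribute1"],
                   "list_of_values": [values]})

# Write to an in-memory buffer instead of a real path.
buf = io.StringIO()
lists_to_json_column(df, "list_of_values").to_csv(buf, index=False, sep="\t")

# Read it back and turn the JSON text into a list again.
buf.seek(0)
restored = pd.read_csv(buf, sep="\t")
restored_values = json.loads(restored["list_of_values"].iloc[0])
print(len(restored_values))
```

The design choice here is to keep to_csv in the pipeline while making the cell an ordinary string, so no sequence formatting is involved when the file is written.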