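I have a large CSV data file that I want to split by columns. That is, some specified columns go into one part and some other columns go into another part. I also want to be able to create more than 2 parts. How can I do this in Python? Also, is there a Python library that handles many data formats?
Input format: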
policyID statecode county eq_site_limit hu_site_limit fl_site_limit fr_site_limit tiv_2011 tiv_2012 eq_site_deductible hu_site_deductible fl_site_deductible fr_site_deductible point_latitude point_longitude line construction point_granularity
119736 FL CLAY COUNTY 498960 498960 498960 498960 498960 792148.9 0 9979.2 0 0 30.102261 -81.711777 Residential Masonry 1
448094 FL CLAY COUNTY 1322376.3 1322376.3 1322376.3 1322376.3 1322376.3 1438163.57 0 0 0 0 30.063936 -81.707664 Residential Masonry 3
206893 FL CLAY COUNTY 190724.4 190724.4 190724.4 190724.4 190724.4 192476.78 0 0 0 0 30.089579 -81.700455 Residential Wood 1
333743 FL CLAY COUNTY 0 79520.76 0 0 79520.76 86854.48 0 0 0 0 30.063236 -81.707703 Residential Wood 3
172534 FL CLAY COUNTY 0 254281.5 0 254281.5 254281.5 246144.49 0 0 0 0 30.060614 -81.702675 Residential Wood 1
Input columns:
policyID statecode county eq_site_limit hu_site_limit fl_site_limit fr_site_limit tiv_2011 tiv_2012 eq_site_deductible hu_site_deductible fl_site_deductible fr_site_deductible point_latitude point_longitude line construction point_granularity
Output columns:
Part A: ['policyID', 'statecode', 'county', 'eq_site_limit', 'hu_site_limit']
Part B: ['fl_site_limit', 'fr_site_limit', 'tiv_2011', 'tiv_2012', 'eq_site_deductible', 'hu_site_deductible', 'fl_site_deductible', 'fr_site_deductible', 'point_latitude', 'point_longitude', 'line', 'construction', 'point_granularity']
Code:
import csv
import pandas as pd
df = pd.read_csv("FL_insurance_sample.csv")
cl_list = list(df.columns.values)
a = cl_list[:5]
b = cl_list[5:]
with open('data1.csv', 'w') as datafile:
    for x in a:
        saved_column = df[x]
        datafile.write(saved_column)  # raises TypeError: write() expects a str, not a pandas Series
with open('data2.csv', 'w') as datafile:
    for x in b:
        saved_column = df[x]
        datafile.write(saved_column)
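Answer 0 (score: 2)
I assume you want to split specific columns of the original DataFrame into new DataFrames and then write each one to CSV.
Let me know if that assumption is wrong, since the answer relies on it.
OK, so first read the CSV into a pandas DataFrame (df):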
import pandas as pd
df = pd.read_csv("FL_insurance_sample.csv")
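Because the question mentions a large file, it may be worth noting that read_csv can parse only the columns a part needs through its usecols parameter. The sketch below assumes a comma-separated file; the small inline sample (built from the first rows and columns shown in the question) stands in for FL_insurance_sample.csv, and part_a_cols is a hypothetical name.

```python
import io

import pandas as pd

# Inline sample standing in for FL_insurance_sample.csv (assumed comma-separated).
csv_text = """policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit
119736,FL,CLAY COUNTY,498960,498960,498960
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3
"""

# Hypothetical list of the columns that make up Part A.
part_a_cols = ["policyID", "statecode", "county", "eq_site_limit", "hu_site_limit"]

# usecols makes read_csv parse only these columns, which saves memory on a large file.
part_a = pd.read_csv(io.StringIO(csv_text), usecols=part_a_cols)
print(part_a.columns.tolist())
```

usecols returns the columns in file order, not in the order of the list; index the result with the list (part_a[part_a_cols]) if the order matters.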
Then create a new DataFrame containing the columns you need (Part A here):
>>> part_A = df.filter(['policyID', 'statecode', 'county', 'eq_site_limit', 'hu_site_limit'], axis=1)
>>> part_A
policyID statecode county eq_site_limit hu_site_limit
0 NaN NaN NaN NaN NaN
1 119736.0 FL CLAY COUNTY 498960.0 498960.00
2 448094.0 FL CLAY COUNTY 1322376.3 1322376.30
3 206893.0 FL CLAY COUNTY 190724.4 190724.40
4 333743.0 FL CLAY COUNTY 0.0 79520.76
5 172534.0 FL CLAY COUNTY 0.0 254281.50
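One detail of filter() worth knowing: it keeps only the labels that actually exist, so a misspelled column name is dropped silently, whereas selecting with df[[...]] raises a KeyError. A minimal sketch on a two-column toy frame (the misspelled name "countyy" is deliberate and hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"policyID": [119736, 448094], "statecode": ["FL", "FL"]})
wanted = ["policyID", "statecode", "countyy"]  # "countyy" is a deliberate typo

# filter() keeps only the labels that exist, so the typo is silently ignored.
filtered = df.filter(wanted, axis=1)
print(filtered.columns.tolist())  # ['policyID', 'statecode']

# Plain column selection raises instead, which surfaces the typo immediately.
raised = False
try:
    df[wanted]
except KeyError:
    raised = True
print(raised)  # True
```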
Write the part_A DataFrame to CSV:
>>> part_A.to_csv("part_A.csv", index=False, encoding='utf-8')
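The index=False argument matters here: by default to_csv also writes the row index as an unnamed first column. Calling to_csv() without a path returns the CSV text as a string, which makes the difference easy to see in a small sketch (toy data assumed):

```python
import pandas as pd

part_A = pd.DataFrame({"policyID": [119736, 448094], "statecode": ["FL", "FL"]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = part_A.to_csv()               # first column is the unnamed row index
without_index = part_A.to_csv(index=False)  # only the data columns

print(with_index)
print(without_index)
```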
Similarly, create a new DataFrame for Part B:
>>> part_B = df.filter(['fl_site_limit', 'fr_site_limit', 'tiv_2011', 'tiv_2012', 'eq_site_deductible', 'hu_site_deductible', 'fl_site_deductible', 'fr_site_deductible', 'point_latitude', 'point_longitude', 'line', 'construction', 'point_granularity'], axis=1)
Then write part_B to CSV:
>>> part_B.to_csv("part_B.csv", index=False, encoding='utf-8')
In this way you can split the columns however you need and write each part to its own CSV file.
Answer 1 (score: 1)
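For a file too large to load at once, the same idea can be applied chunk by chunk with read_csv's chunksize parameter, appending each chunk to one output file per part. This is a sketch under assumptions: split_csv_by_columns and the part names are hypothetical, the input is comma-separated, and the inline sample replaces the real file. The parts dictionary also shows how to produce more than two parts.

```python
import io
import os
import tempfile

import pandas as pd


def split_csv_by_columns(source, parts, out_dir, chunksize=100_000):
    """Stream `source` in chunks and write one CSV per entry in `parts`.

    `parts` maps an output name to the list of columns that file should contain.
    """
    paths = {name: os.path.join(out_dir, f"{name}.csv") for name in parts}
    for i, chunk in enumerate(pd.read_csv(source, chunksize=chunksize)):
        first = i == 0
        for name, cols in parts.items():
            # First chunk creates the file with a header; later chunks append rows only.
            chunk[cols].to_csv(paths[name], mode="w" if first else "a",
                               header=first, index=False)
    return paths


# Inline sample standing in for FL_insurance_sample.csv.
csv_text = """policyID,statecode,county,eq_site_limit,hu_site_limit
119736,FL,CLAY COUNTY,498960,498960
448094,FL,CLAY COUNTY,1322376.3,1322376.3
206893,FL,CLAY COUNTY,190724.4,190724.4
"""

# Hypothetical three-way split.
parts = {
    "part_A": ["policyID", "statecode"],
    "part_B": ["county", "eq_site_limit"],
    "part_C": ["hu_site_limit"],
}

out_dir = tempfile.mkdtemp()
# chunksize=2 is deliberately tiny so the three sample rows arrive in two chunks.
paths = split_csv_by_columns(io.StringIO(csv_text), parts, out_dir, chunksize=2)
```

On the real data the call would look like split_csv_by_columns("FL_insurance_sample.csv", parts, "."), with a chunksize chosen to fit in memory.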
To write any list of columns to a CSV file, use the to_csv() method:
import pandas as pd
df = pd.read_csv("FL_insurance_sample.csv")
df.iloc[:,:5].to_csv("data1.csv")
df.iloc[:,5:].to_csv("data2.csv")
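The same positional slicing extends to more than two parts by listing cut points. A minimal sketch, assuming hypothetical boundaries and a one-row toy frame with the first six columns from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [[119736, "FL", "CLAY COUNTY", 498960, 498960, 498960]],
    columns=["policyID", "statecode", "county",
             "eq_site_limit", "hu_site_limit", "fl_site_limit"],
)

# Hypothetical cut points: columns [0, 2), [2, 4) and [4, end) become three parts.
bounds = [0, 2, 4, df.shape[1]]
parts = [df.iloc[:, start:stop] for start, stop in zip(bounds, bounds[1:])]

# Write each part to data1.csv, data2.csv, data3.csv in the working directory.
for n, part in enumerate(parts, start=1):
    part.to_csv(f"data{n}.csv", index=False)
```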
If you want to pass the column lists a and b from the question directly:
df[a].to_csv("data1.csv")
df[b].to_csv("data2.csv")
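When the parts are defined by name rather than position, a dictionary from part name to column list scales to any number of parts, and the leftover columns can be collected automatically, much like Part B in the question. A sketch with hypothetical part names and a one-row toy frame:

```python
import pandas as pd

df = pd.DataFrame(
    [[119736, "FL", "CLAY COUNTY", 498960, 498960, 30.102261, -81.711777]],
    columns=["policyID", "statecode", "county", "eq_site_limit",
             "hu_site_limit", "point_latitude", "point_longitude"],
)

# Hypothetical named parts; any column not listed ends up in a final "rest" part.
parts = {
    "part_A": ["policyID", "statecode", "county"],
    "part_B": ["point_latitude", "point_longitude"],
}
assigned = [col for cols in parts.values() for col in cols]
parts["rest"] = [col for col in df.columns if col not in assigned]

# One CSV per part, named after its key (part_A.csv, part_B.csv, rest.csv).
for name, cols in parts.items():
    df[cols].to_csv(f"{name}.csv", index=False)
```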