Question

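I have a large CSV data file that I want to split by column: some specified columns go into one part, and some other columns go into another. I'd also like to be able to create more than two parts. How can I do this in Python? Also, is there a Python library for handling many data formats?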

Input format:

policyID statecode county eq_site_limit hu_site_limit fl_site_limit fr_site_limit tiv_2011 tiv_2012 eq_site_deductible hu_site_deductible fl_site_deductible fr_site_deductible point_latitude point_longitude line construction point_granularity

119736 FL CLAY COUNTY 498960 498960 498960 498960 498960 792148.9 0 9979.2 0 0 30.102261 -81.711777 Residential Masonry 1
448094 FL CLAY COUNTY 1322376.3 1322376.3 1322376.3 1322376.3 1322376.3 1438163.57 0 0 0 0 30.063936 -81.707664 Residential Masonry 3
206893 FL CLAY COUNTY 190724.4 190724.4 190724.4 190724.4 190724.4 192476.78 0 0 0 0 30.089579 -81.700455 Residential Wood 1
333743 FL CLAY COUNTY 0 79520.76 0 0 79520.76 86854.48 0 0 0 0 30.063236 -81.707703 Residential Wood 3
172534 FL CLAY COUNTY 0 254281.5 0 254281.5 254281.5 246144.49 0 0 0 0 30.060614 -81.702675 Residential Wood 1

Input columns:

policyID statecode county eq_site_limit hu_site_limit fl_site_limit fr_site_limit tiv_2011 tiv_2012 eq_site_deductible hu_site_deductible fl_site_deductible fr_site_deductible point_latitude point_longitude line construction point_granularity

Output columns:

Part A: ['policyID', 'statecode', 'county', 'eq_site_limit', 'hu_site_limit']

Part B: ['fl_site_limit', 'fr_site_limit', 'tiv_2011', 'tiv_2012', 'eq_site_deductible', 'hu_site_deductible', 'fl_site_deductible', 'fr_site_deductible', 'point_latitude', 'point_longitude', 'line', 'construction', 'point_granularity']

Code:

import csv
import pandas as pd

df = pd.read_csv("FL_insurance_sample.csv")
cl_list = list(df.columns.values)
a = cl_list[:5]
b = cl_list[5:]

with open('data1.csv', 'w') as datafile:
    for x in a:
        saved_column = df[x]
        datafile.write(saved_column)

with open('data2.csv', 'w') as datafile:
    for x in b:
        saved_column = df[x]
        datafile.write(saved_column)

Answer 1

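I'm assuming you want to split specific columns from the original DataFrame into new DataFrames and then write each one to CSV.
If that assumption is wrong, let me know, since the answer is based on it.

First, read the CSV into a pandas DataFrame (df):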

import csv
import pandas as pd

df = pd.read_csv("FL_insurance_sample.csv")

Then create a new df with the columns you need (here, part A):

>>> part_A = df.filter(['policyID', 'statecode', 'county', 'eq_site_limit', 'hu_site_limit'], axis=1)

>>> part_A
   policyID statecode       county  eq_site_limit  hu_site_limit
0       NaN       NaN          NaN            NaN            NaN
1  119736.0        FL  CLAY COUNTY       498960.0      498960.00
2  448094.0        FL  CLAY COUNTY      1322376.3     1322376.30
3  206893.0        FL  CLAY COUNTY       190724.4      190724.40
4  333743.0        FL  CLAY COUNTY            0.0       79520.76
5  172534.0        FL  CLAY COUNTY            0.0      254281.50
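
One thing to keep in mind with `filter`: when given a list of items, it keeps only the labels that exist and quietly ignores the rest, whereas indexing with a list raises on a missing name. A small sketch to show the difference; the two-row frame and the misspelled name `countyy` are made up for illustration and are not part of the original data:

```python
import pandas as pd

# Tiny stand-in frame; the real data would come from pd.read_csv(...)
df = pd.DataFrame({
    "policyID": [119736, 448094],
    "statecode": ["FL", "FL"],
    "county": ["CLAY COUNTY", "CLAY COUNTY"],
})

# 'countyy' is a deliberate typo (hypothetical, for illustration only)
wanted = ["policyID", "countyy"]

# filter() keeps the labels that exist and drops the unknown one silently
filtered = df.filter(items=wanted, axis=1)
print(list(filtered.columns))

# Indexing with a list raises KeyError if any label is missing
try:
    df[wanted]
    missing_detected = False
except KeyError:
    missing_detected = True
print(missing_detected)
```

If a typo in a column name should fail loudly rather than produce a narrower file, `df[wanted]` is probably the safer choice.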

Write the part_A df to CSV:

>>> part_A.to_csv("part_A.csv", index=False, encoding='utf-8')

Likewise, create a new df for part_B:

>>> part_B = df.filter(['fl_site_limit', 'fr_site_limit', 'tiv_2011', 'tiv_2012', 'eq_site_deductible', 'hu_site_deductible', 'fl_site_deductible', 'fr_site_deductible', 'point_latitude', 'point_longitude', 'line', 'construction', 'point_granularity'], axis=1)

Then write the part_B df to CSV:

>>> part_B.to_csv("part_B.csv", index=False, encoding='utf-8')

So you can split the columns however you need and then write each part to CSV.
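
Since the question also asks about more than two parts, the same idea can be wrapped in a loop. This is only a sketch: the helper name `split_columns`, the `part_<name>.csv` naming scheme, and the small sample frame are my own assumptions, not anything from the original file.

```python
import os
import tempfile

import pandas as pd


def split_columns(df, parts, out_dir):
    """Write one CSV per entry in `parts` (part name -> list of columns).

    Hypothetical helper for illustration; returns the paths it wrote.
    """
    written = {}
    for name, columns in parts.items():
        path = os.path.join(out_dir, f"part_{name}.csv")
        # df[columns] raises KeyError if a column name is misspelled
        df[columns].to_csv(path, index=False, encoding="utf-8")
        written[name] = path
    return written


# Small stand-in for the real file (made-up subset of the columns)
df = pd.DataFrame({
    "policyID": [119736, 448094],
    "statecode": ["FL", "FL"],
    "county": ["CLAY COUNTY", "CLAY COUNTY"],
    "tiv_2011": [498960.0, 1322376.3],
    "tiv_2012": [792148.9, 1438163.57],
    "line": ["Residential", "Residential"],
})

# Any number of parts works; here there are three
parts = {
    "A": ["policyID", "statecode", "county"],
    "B": ["tiv_2011", "tiv_2012"],
    "C": ["line"],
}

out_dir = tempfile.mkdtemp()
paths = split_columns(df, parts, out_dir)
part_b = pd.read_csv(paths["B"])
print(list(part_b.columns))
```

Adding a fourth part is then just one more entry in the `parts` dictionary.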

Answer 2

To write any list of columns to a CSV file, use the to_csv() method:

import pandas as pd

df = pd.read_csv("FL_insurance_sample.csv")

df.iloc[:,:5].to_csv("data1.csv")
df.iloc[:,5:].to_csv("data2.csv")
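
Note that, called like this, `to_csv` also writes the row index as an extra unnamed first column. If you only want the data columns, pass `index=False`. A quick sketch with a made-up two-row frame to show the difference:

```python
import pandas as pd

# Made-up sample; stands in for the frame read from the real file
df = pd.DataFrame({"policyID": [119736, 448094], "statecode": ["FL", "FL"]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv().splitlines()
without_index = df.to_csv(index=False).splitlines()

print(with_index[0])      # header begins with an empty name for the index column
print(without_index[0])   # header contains only the data columns
```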

If you want to pass the lists directly:

df[a].to_csv("data1.csv")
df[b].to_csv("data2.csv")
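
Because the question mentions a large file, it may be worth not loading everything at once. One possible approach is the `chunksize` option of `read_csv`, appending each chunk to the output files. This is a rough sketch, not a tuned solution: the helper name `split_in_chunks`, the default chunk size, and the small demo file are all assumptions of mine.

```python
import os
import tempfile

import pandas as pd


def split_in_chunks(src, parts, chunksize=100_000):
    """Stream `src` and append each column group to its own CSV.

    `parts` maps an output path to the list of columns it should hold.
    Hypothetical helper for illustration.
    """
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        for out_path, columns in parts.items():
            chunk[columns].to_csv(
                out_path,
                mode="w" if first else "a",  # overwrite once, then append
                header=first,                # write the header only once
                index=False,
            )
        first = False


# Demo on a tiny made-up file so the sketch can be run as-is
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "sample.csv")
pd.DataFrame({
    "policyID": [1, 2, 3, 4, 5],
    "statecode": ["FL"] * 5,
    "tiv_2011": [10.0, 20.0, 30.0, 40.0, 50.0],
}).to_csv(src, index=False)

out1 = os.path.join(tmp, "data1.csv")
out2 = os.path.join(tmp, "data2.csv")
split_in_chunks(src, {out1: ["policyID", "statecode"], out2: ["tiv_2011"]}, chunksize=2)

data1 = pd.read_csv(out1)
data2 = pd.read_csv(out2)
print(data1.shape, data2.shape)
```

With a chunk size of 2 the five demo rows are processed in three passes, and only one chunk is held in memory at a time.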

How do I split a CSV into multiple parts by column name?
