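I have a CSV file that needs to be split into two CSV files (file1.csv and file2.csv). The split should be based on the "Name" column: 70% of the rows go to file1.csv and the remaining 30% go to file2.csv. For example, suppose there are 10 rows with the name "AAA". 70% of 10 rows means the first 7 "AAA" rows are written to file1.csv and the following 3 rows to file2.csv. The same needs to happen for every name in the "Name" column. If the result is a decimal, for example 0.7 x 9 rows = 6.3, then the first 6 rows (rounded) go to file1.csv and the remaining 3 rows go to file2.csv. How can I do this with Python code? Thanks. https://fil.email/FPYB1RWd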
Answer 0 (score: 1)
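Read the whole CSV file and store its contents in a list. Then collect rows that belong together (same name) in a temporary list. Once a group is complete, take 70% of it and write that to one file, then write the remaining rows to the other file.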
# Read the whole file and split it into a header and data rows.
with open('file.csv') as file:
    csv_data = file.read().splitlines()
header = csv_data[0]
rows = [line for line in csv_data[1:] if line]  # skip blank lines

def write_group(group, file1, file2):
    # Send the first 70% (rounded) of a group to file1, the rest to file2.
    line_count = round(0.7 * len(group))
    for line in group[:line_count]:
        file1.write(line + '\n')
    for line in group[line_count:]:
        file2.write(line + '\n')

with open('file1.csv', 'w') as file1, open('file2.csv', 'w') as file2:
    file1.write(header + '\n')
    file2.write(header + '\n')
    temp_list = []  # consecutive rows that share the same name
    for i in rows:
        # The name is taken from the first column; a new name closes the group.
        if temp_list and i.split(',')[0] != temp_list[0].split(',')[0]:
            write_group(temp_list, file1, file2)
            temp_list = []
        temp_list.append(i)
    if temp_list:  # write out the last group
        write_group(temp_list, file1, file2)

Note: this code assumes that rows with the same name are adjacent in the file and that the name is the first column. If a name has only one row, that row goes to file1.csv and nothing is added to file2.csv for it. Edit the code if you want to change this behavior.
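The grouping step described in this answer can also be sketched with the standard csv module and itertools.groupby. This is only a sketch under some assumptions that are not in the original post: the function names split_count and split_csv_by_name, the percent parameter, and the choice of integer arithmetic that rounds halves up are all illustrative. It also assumes the first row is a header, that there are no blank lines, and that rows with the same name are adjacent.

```python
import csv
from itertools import groupby


def split_count(n, percent=70):
    """Number of rows of an n-row group sent to the first file (rounds half up)."""
    return (n * percent + 50) // 100


def split_csv_by_name(src, dst1, dst2, column=0, percent=70):
    """Write the first `percent` of each run of equal names to dst1, the rest to dst2."""
    with open(src, newline="") as fin, \
         open(dst1, "w", newline="") as f1, \
         open(dst2, "w", newline="") as f2:
        reader = csv.reader(fin)
        out1, out2 = csv.writer(f1), csv.writer(f2)
        header = next(reader)  # assumption: the first row is a header
        out1.writerow(header)
        out2.writerow(header)
        # groupby only merges adjacent rows, so equal names must be contiguous.
        for _, group in groupby(reader, key=lambda row: row[column]):
            rows = list(group)
            cut = split_count(len(rows), percent)
            out1.writerows(rows[:cut])
            out2.writerows(rows[cut:])
```

Calling split_csv_by_name('file.csv', 'file1.csv', 'file2.csv') would overwrite the two output files on each run instead of appending to them. If names are not contiguous in the input, sort the rows by name first or use a dictionary-based grouping.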
Answer 1 (score: 0)
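It's simple: build a dictionary of lists of records, loop over those lists, and write each individual record to one of the two output files, following the simple rule specified by the OP.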
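from csv import reader, writer

# The file names are left elided here; newline='' is what the csv module recommends.
inp0 = reader(open(..., newline=''))
out1 = writer(open(..., 'w', newline=''))
out2 = writer(open(..., 'w', newline=''))

column, d = 0, {}  # column: index of the "Name" column
# Collect the records of each name in a list, keyed by name.
# A header row, if present, is treated like any other record.
for rec in inp0:
    d.setdefault(rec[column], []).append(rec)
# For each name, the first 70% (rounded) go to out1, the rest to out2.
for recs in d.values():
    limit = round(0.7*len(recs))
    for n, rec in enumerate(recs):
        (out1 if n < limit else out2).writerow(rec)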
Here is a check of this approach using an IPython session (carefully edited to reduce whitespace) and some artificial data:
17:22:~ $ ipython
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from csv import reader, writer
   ...: from random import randrange, seed
   ...: seed(20190712)
In [2]: data = [','.join(str(randrange(10)) for _ in range(4)) for _ in range(200)]
In [3]: inf = reader(data)
In [4]: of1 = writer(open('dele1', 'w')); of2 = writer(open('dele2', 'w'))
In [5]: d = {}
In [6]: for record in inf:
   ...:     d.setdefault(record[0], []).append(record)
   ...: for key, records in d.items():
   ...:     l1 = round(0.7*len(records))
   ...:     for n, record in enumerate(records):
   ...:         (of1 if n<l1 else of2).writerow(record)
In [7]: Ctrl-D
Do you really want to exit ([y]/n)?
17:23:~ $ wc -l dele?
140 dele1
60 dele2
200 total
17:24:~ $ rm dele?
17:24:~ $
As you can see, the first file gets 70% of the original records and the second gets the remaining 30%.
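The line counts confirm the overall ratio; to check the rule for each name separately, the same dictionary idea can be wrapped in a function that works on in-memory records. This is a sketch only: the name split_records, the percent parameter, and the integer rounding (halves round up) are assumptions for illustration, not part of the answer above.

```python
from collections import defaultdict


def split_records(records, column=0, percent=70):
    """Split records per key: the first `percent` of each group, then the rest."""
    groups = defaultdict(list)
    for rec in records:
        # Records with the same key need not be adjacent in the input.
        groups[rec[column]].append(rec)
    first, rest = [], []
    for recs in groups.values():
        cut = (len(recs) * percent + 50) // 100  # rounds half up
        first.extend(recs[:cut])
        rest.extend(recs[cut:])
    return first, rest


# Ten "AAA" rows and nine "BBB" rows, as in the question.
records = [("AAA", i) for i in range(10)] + [("BBB", i) for i in range(9)]
first, rest = split_records(records)
print(len(first), len(rest))  # 13 6
```

Because the groups are built in a dictionary, the output is ordered by first appearance of each name rather than by the original row order.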
Answer 2 (score: -1)
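import pandas as pd

def split_csv(file_name, split_percent=0.7):
    """
    Split the CSV file and save the two parts as file1.csv and file2.csv.
    Pass your original file name as file_name.
    You can change split_percent to get different file sizes.
    """
    df = pd.read_csv(file_name)
    # Note: this splits the file as a whole, not separately for each name.
    df_length = int(len(df) * split_percent)
    df1 = df.iloc[:df_length, :]
    df2 = df.iloc[df_length:, :]
    # index=False keeps the pandas row index out of the output files.
    df1.to_csv("file1.csv", index=False)
    df2.to_csv("file2.csv", index=False)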
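The question asks for the 70/30 split to be applied to each name separately. A minimal pandas sketch of that variant follows; the function name split_by_name, the default column label "Name", and the percent parameter are assumptions for illustration. It numbers each row within its group with cumcount and compares that position against the per-group cutoff, using integer arithmetic that rounds halves up.

```python
import pandas as pd


def split_by_name(df, column="Name", percent=70):
    """Return (first, rest): the first `percent` of each name's rows, and the remainder."""
    grouped = df.groupby(column)
    # Position of each row within its own name group: 0, 1, 2, ...
    position = grouped.cumcount()
    # Size of the group each row belongs to, aligned with df's index.
    group_size = grouped[column].transform("size")
    # Rows to keep per group, rounded half up with integer arithmetic.
    keep = (group_size * percent + 50) // 100
    mask = position < keep
    return df[mask], df[~mask]


df = pd.DataFrame({"Name": ["AAA"] * 10 + ["BBB"] * 9, "Value": range(19)})
first, rest = split_by_name(df)
print(len(first), len(rest))  # 13 6
```

The two resulting frames can then be saved with first.to_csv("file1.csv", index=False) and rest.to_csv("file2.csv", index=False). Rows keep their original order, and names do not have to be adjacent in the input.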