Question

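I have two CSV files, each containing different columns that I want to merge into one database table. I set up a table containing all the columns from both files, but when I import both files with LOAD DATA INFILE, the rows are not merged (i.e. data file 1 filling columns 1-6 and data file 2 filling columns 7-10). Instead I get a table with twice as many rows (one row per record in each CSV), with NULLs filling in the columns that were not present in the source CSV.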

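I know I can fix this by merging the CSVs somehow, by importing with overwrite enabled, or by merging the data inside the database. Which approach would be the most efficient for me?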

Answer 1

The best way to do this is with a script. CSV import scripts are usually written in a scripting language such as Python, Ruby, or PHP.

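You just need the importer for the second CSV to perform updates on the records created from the first CSV, so the script will really only be 5 to 10 lines. If you provide a sample record from each CSV, I'd be happy to write one for you.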
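The update-based import described above could be sketched roughly as follows. This is only an illustration under several assumptions: both CSVs have a header row and share a key column (called id here), the table name merged and all column names are made up, and Python's built-in sqlite3 module is used as a stand-in for MySQL so that the sketch runs anywhere. With MySQL you would use a MySQL driver instead, and its placeholder style will likely differ.

```python
import csv
import sqlite3


def import_two_csvs(conn, path1, path2):
    """Insert rows from the first CSV, then update them from the second.

    Assumes (hypothetically) that file 1 has the columns id,name,email and
    file 2 has the columns id,city,phone, both with a header row.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS merged "
        "(id TEXT PRIMARY KEY, name TEXT, email TEXT, city TEXT, phone TEXT)"
    )

    # First file: create one row per record; the file-2 columns stay NULL.
    with open(path1, newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "INSERT INTO merged (id, name, email) VALUES (?, ?, ?)",
                (row["id"], row["name"], row["email"]),
            )

    # Second file: fill in the remaining columns of the existing rows.
    with open(path2, newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "UPDATE merged SET city = ?, phone = ? WHERE id = ?",
                (row["city"], row["phone"], row["id"]),
            )

    conn.commit()
```

Because the second pass matches on the key column rather than on line position, it does not matter whether the two files list their records in the same order.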

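Edit: Here is a Python script that combines the files, adding a semicolon between each line of file1 and the corresponding line of file2. This basically does what the Linux paste command does.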

# Read both inputs; line N of file1 is paired with line N of file2.
with open('file1.txt') as f1, open('file2.txt') as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

if len(lines1) != len(lines2):
    raise Exception("Files need to be the same length, but file1 is %s lines long and file2 is %s lines long" % (len(lines1), len(lines2)))

# Join corresponding lines with a semicolon and write them out.
with open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(lines1, lines2):
        outfile.write(line1.strip() + ";" + line2.strip() + "\n")

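You can run it by saving it as combine.py and typing python combine.py. The folder you put it in should contain file1.txt and file2.txt; outfile.txt is created there (or overwritten if it already exists).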

Answer 2

Combine the two CSVs into one.

If you are on a Linux platform, use the paste command to join two or more files.

PASTE(1)

NAME
       paste - merge lines of files

SYNOPSIS
       paste [OPTION]... [FILE]...

DESCRIPTION
       Write lines consisting of the sequentially corresponding lines from 
       each FILE, separated by TABs, to standard output.  
       With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are mandatory for short options too.

       -d, --delimiters=LIST
              reuse characters from LIST instead of TABs

       -s, --serial
              paste one file at a time instead of in parallel

       --help display this help and exit

       --version
              output version information and exit

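For example (paste separates the joined lines with TABs by default, so pass -d ',' to keep the result comma-separated):

paste -d ',' file1.csv file2.csv > file3.csv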

Answer 3

I would take a look at Perl and the Text::CSV module. One issue you need to consider is whether the data is in the same order in both files.
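If the order cannot be relied on, the files can be joined on a shared key instead of by line position. The following is a minimal sketch of that idea, written with Python's standard csv module rather than Perl's Text::CSV. It assumes both files have a header row and a common key column; the function name join_csvs_by_key and the default key name id are hypothetical.

```python
import csv


def join_csvs_by_key(path1, path2, out_path, key="id"):
    """Write one merged row per record, matching rows on `key`, not position."""
    # Index the second file by its key column so row order no longer matters.
    with open(path2, newline="") as f2:
        reader2 = csv.DictReader(f2)
        extra_fields = [name for name in reader2.fieldnames if name != key]
        rows2 = {row[key]: row for row in reader2}

    with open(path1, newline="") as f1, open(out_path, "w", newline="") as out:
        reader1 = csv.DictReader(f1)
        writer = csv.DictWriter(out, fieldnames=reader1.fieldnames + extra_fields)
        writer.writeheader()
        for row in reader1:
            match = rows2.get(row[key], {})
            # Records missing from the second file get empty strings.
            for name in extra_fields:
                row[name] = match.get(name, "")
            writer.writerow(row)
```

The resulting file contains every column once, so it can be loaded with a single LOAD DATA INFILE. Using a CSV parser also keeps quoted fields that contain the delimiter intact, which plain line concatenation does not guarantee.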

Using LOAD DATA INFILE to load two CSV files into the same rows
