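I want to concatenate CSV files, three of them. They have the same number of columns, so the desired output will have the same columns as well.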
file1.csv
id ocr raw_value manual_raw_value
2a909d6e-5eb2-49a1-b6e8-171bf01dafdc ABBYY 7,20 7,20
0c93bc55-2c42-4afb-8428-c12736b0a86e ABBYY 44 44
12973bd0-72c7-4333-ac8d-ab0f18e33cb0 OMNIPAGE 4578 4578
2ccbeb11-ed0d-49a3-a321-a1c583764e3d ABBYY 1 1
f78636d7-c22f-4a85-bb34-fd8757352ec5 ABBYY 8040.56 8040.56
2d1c5869-1f87-4b47-bbf4-c2f31b122c0b OMNIPAGE 6 6
2d914f73-39d9-4108-8467-4d7a28933aa6 OMNIPAGE 0 0
file2
id ocr raw_value manual_raw_value
bfa6c9f1-89c0-486b-a85c-a97a370a92c4 OMNIPAGE 35470 35470
213e1e1e-29df-44c2-acee-79f7fefa7ba9 OMNIPAGE Echeance Echéance
1ebecadc-056b-41c8-8426-446fff6bad71 OMNIPAGE Etoblissemenls Etablissements
35c1b736-f504-487d-a531-3b133045139e OMNIPAGE : :
009ee382-c1f2-4194-92e1-9b9fffd387d2 OMNIPAGE 1087 1087
35dd36a6-7c9c-4f81-b3f7-db6533bd159f OMNIPAGE 1 1
218f6860-b6aa-4bba-b64a-c2cc0b40812b OMNIPAGE HT HT
file3
id ocr raw_value manual_raw_value
4a82a357-99e7-49e6-85b6-b2f6a27b8d5f OMNIPAGE Terms Terms
8b549fef-0cda-4af5-8239-35153c33ffbc OMNIPAGE price price
52ffe66a-b1ab-4b22-9b26-c298d53c951c OMNIPAGE Renseignements Renseignements
507a0d96-9481-4b3f-8c35-f16588bedc0b OMNIPAGE pour pour
52e171dc-8d22-4162-b748-692b2fc11659 OMNIPAGE Client Client
c40a7e9f-1ec4-4cac-87e8-02ed0f335fe9 OMNIPAGE 5 5
4a936ed7-c082-4f46-9fa1-761a1525e2df OMNIPAGE SAS SAS
4b78130e-b099-400c-b7bf-6470e0519783 OMNIPAGE des des
4d5c6297-1c79-42f9-b4ea-929a9abfb3f7 OMNIPAGE 431 431
829d8bf5-b251-4bb1-82d8-0e912ab64e8e OMNIPAGE 59 59
Desired output
id ocr raw_value manual_raw_value
2a909d6e-5eb2-49a1-b6e8-171bf01dafdc ABBYY 7,20 7,20
0c93bc55-2c42-4afb-8428-c12736b0a86e ABBYY 44 44
12973bd0-72c7-4333-ac8d-ab0f18e33cb0 OMNIPAGE 4578 4578
2ccbeb11-ed0d-49a3-a321-a1c583764e3d ABBYY 1 1
f78636d7-c22f-4a85-bb34-fd8757352ec5 ABBYY 8040.56 8040.56
2d1c5869-1f87-4b47-bbf4-c2f31b122c0b OMNIPAGE 6 6
2d914f73-39d9-4108-8467-4d7a28933aa6 OMNIPAGE 0 0
bfa6c9f1-89c0-486b-a85c-a97a370a92c4 OMNIPAGE 35470 35470
213e1e1e-29df-44c2-acee-79f7fefa7ba9 OMNIPAGE Echeance Echéance
1ebecadc-056b-41c8-8426-446fff6bad71 OMNIPAGE Etoblissemenls Etablissements
35c1b736-f504-487d-a531-3b133045139e OMNIPAGE : :
009ee382-c1f2-4194-92e1-9b9fffd387d2 OMNIPAGE 1087 1087
35dd36a6-7c9c-4f81-b3f7-db6533bd159f OMNIPAGE 1 1
218f6860-b6aa-4bba-b64a-c2cc0b40812b OMNIPAGE HT HT
4a82a357-99e7-49e6-85b6-b2f6a27b8d5f OMNIPAGE Terms Terms
8b549fef-0cda-4af5-8239-35153c33ffbc OMNIPAGE price price
52ffe66a-b1ab-4b22-9b26-c298d53c951c OMNIPAGE Renseignements Renseignements
507a0d96-9481-4b3f-8c35-f16588bedc0b OMNIPAGE pour pour
52e171dc-8d22-4162-b748-692b2fc11659 OMNIPAGE Client Client
c40a7e9f-1ec4-4cac-87e8-02ed0f335fe9 OMNIPAGE 5 5
4a936ed7-c082-4f46-9fa1-761a1525e2df OMNIPAGE SAS SAS
4b78130e-b099-400c-b7bf-6470e0519783 OMNIPAGE des des
4d5c6297-1c79-42f9-b4ea-929a9abfb3f7 OMNIPAGE 431 431
829d8bf5-b251-4bb1-82d8-0e912ab64e8e OMNIPAGE 59 59
Here is my code:
import os
import glob
import pandas

def concatenate(indir,outputfile):
    os.chdir(indir)
    fileList=glob.glob("*.csv")
    dfList=[]
    colnames=["id","ocr","raw_value","manual_raw_value"]
    for filename in fileList:
        print(filename)
        df=pandas.read_csv(input,header=None)
        dfList.append(df)
    concatenateDF=pandas.concat(dfList,axis=0)
    concatenateDF.columns=colnames
    concatenateDF.to_save(outputfile,index=None)

if __name__ == "__main__":
    input="/files"
    output="files/concatenated.csv"
    concatenate(input,output)
I get the following output and error:
words2.csv
Traceback (most recent call last):
  File "/home/ahmed/crnn/concatenate_files.py", line 21, in <module>
    concatenate(input,output)
  File "/home/ahmed/crnn/concatenate_files.py", line 12, in concatenate
    df=pandas.read_csv(input,header=None)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/io/parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/io/parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/io/parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/io/parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:6171)
pandas.io.common.EmptyDataError: No columns to parse from file
What is wrong with my code?
How can I fix it?
Thanks
Answer 0 (score: 1)
Your CSVs are unevenly spaced. That is easy to handle: set delim_whitespace=True when reading the data. I printed the size of each dataframe as it was read. Next, concatenate them the way you did and assign the columns at the end:
In [1335]: list_ = []
      ...: for file in glob.glob('*.csv'):
      ...:     df = pd.read_csv(file, index_col=None, header=0, delim_whitespace=True)
      ...:     print('Size:', len(df))
      ...:     list_.append(df)
      ...:
Size: 7
Size: 7
Size: 10
Confirm that they were concatenated correctly:
In [1336]: df = pd.concat(list_, axis=0)
In [1337]: df.columns = ["id", "ocr", "raw_value", "manual_raw_value"]
In [1338]: df.head()
Out[1338]:
id ocr raw_value manual_raw_value
0 2a909d6e-5eb2-49a1-b6e8-171bf01dafdc ABBYY 7,20 7,20
1 0c93bc55-2c42-4afb-8428-c12736b0a86e ABBYY 44 44
2 12973bd0-72c7-4333-ac8d-ab0f18e33cb0 OMNIPAGE 4578 4578
3 2ccbeb11-ed0d-49a3-a321-a1c583764e3d ABBYY 1 1
4 f78636d7-c22f-4a85-bb34-fd8757352ec5 ABBYY 8040.56 8040.56
In [1339]: len(df)
Out[1339]: 24
That is all 24 rows (7 + 7 + 10), so you're all set.
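Folding this approach back into the original concatenate() function gives something like the sketch below. Two fixes come from reading the question's code rather than from either answer: the loop calls read_csv(input, ...), which reads the global input ("/files") instead of the loop's filename, and DataFrame has no to_save method (to_csv is the CSV writer). The rest are assumptions of this sketch, not the answerer's code: it uses sep=r"\s+" (the spelling pandas recommends since delim_whitespace was deprecated in pandas 2.2), sorts the file list, reads every value as a string so entries like "7,20" stay untouched, uses ignore_index=True, and assumes Python 3.

```python
import glob
import os

import pandas as pd

COLNAMES = ["id", "ocr", "raw_value", "manual_raw_value"]


def concatenate(indir, outputfile):
    """Stack every *.csv in indir (whitespace-separated) into one CSV file."""
    frames = []
    # sorted() makes the order deterministic; glob's order is arbitrary.
    for path in sorted(glob.glob(os.path.join(indir, "*.csv"))):
        # Read the file from the loop (not the global `input`), splitting on
        # any run of whitespace so uneven column spacing does not matter.
        # header=0: each file's first line is its own header row.
        # dtype=str keeps values such as "7,20" exactly as written.
        df = pd.read_csv(path, sep=r"\s+", header=0, dtype=str)
        print("Size:", path, len(df))
        frames.append(df)
    # ignore_index=True numbers the rows 0..n-1 instead of restarting per file.
    result = pd.concat(frames, axis=0, ignore_index=True)
    result.columns = COLNAMES
    # to_csv (not to_save) writes the combined frame as comma-separated text.
    result.to_csv(outputfile, index=False)
    return result
```

Keep outputfile outside indir (the question mixes "/files" and the relative "files/concatenated.csv"); otherwise a second run will pick up concatenated.csv as one of its inputs.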
Answer 1 (score: 0)
Your error is not caused by the concatenation; it is raised here:
File "/home/ahmed/crnn/concatenate_files.py", line 12, in concatenate
df=pandas.read_csv(input,header=None)
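The columns in your files are unevenly spaced, and the spacing between columns must be consistent. That issue is separate from the error you are getting. I suspect you copied the data from a printed dataframe in some text output.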
For example, in file1.csv the number of spaces between the raw_value and manual_raw_value columns is 9 in row 1:
2a909d6e-5eb2-49a1-b6e8-171bf01dafdc ABBYY 7,20 7,20
and 12 in row 2:
0c93bc55-2c42-4afb-8428-c12736b0a86e ABBYY 44 44
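Consistent spacing only matters if the file is split on a single fixed delimiter. As a quick check, the sketch below rebuilds the two rows above with the 9- and 12-space gaps described (only those gap widths come from the text; every other gap is assumed to be one space) and parses them with sep=r"\s+", which treats any run of whitespace as one separator. It assumes Python 3 and a recent pandas.

```python
import io

import pandas as pd

# The two file1.csv rows, with 9 and 12 spaces before manual_raw_value.
lines = [
    "id ocr raw_value manual_raw_value",
    "2a909d6e-5eb2-49a1-b6e8-171bf01dafdc ABBYY 7,20" + " " * 9 + "7,20",
    "0c93bc55-2c42-4afb-8428-c12736b0a86e ABBYY 44" + " " * 12 + "44",
]
text = "\n".join(lines) + "\n"

# Any run of spaces counts as a single separator, so both rows yield
# exactly four fields. dtype=str keeps "7,20" as text, not a number.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", dtype=str)
print(df)
```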