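I am trying to merge 7 .csv files into 1 file. I have worked through most of the errors, but now I get KeyError: 'record_id'. I am a beginner, and I am really only using Python for this part of my job.
I have successfully done this with a similar data sample in Excel, but there all of the sheets were in one file. Now I have the full dataset, and it was provided to me as 7 separate csv files.
I am showing a simplified version that does the initial merge of the first two files: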
import io as io
from io import StringIO
import pandas as pd
import numpy as np
import os, collections, csv
from os.path import basename
df = []
f0 = r'C:\PythonWorking\xxxData\SYLK_fix\FeMRExportDataFile0.csv'
f1 = r'C:\PythonWorking\xxxData\SYLK_fix\FeMRExportDataFile1.csv'
data_0 = pd.read_csv((io.StringIO(f0)))
data_1 = pd.read_csv((io.StringIO(f1)))
df = pd.merge(data_0, data_1, on='record_id', how='outer')
Final_csv = r'C:\PythonWorking\xxxData\SYLK_fix\FeMRExportDataFile0_6Merged.csv'
df.to_csv(Final_csv)
exit(0)
Error message:
PS C:\Users\xxxx> & C:/Users/xxxx/AppData/Local/Programs/Python/Python37-32/python.exe c:/PythonWorking/xxxData/SampleNIJJoin_V4_CSVs.py
Traceback (most recent call last):
File "c:/PythonWorking/xxxhData/SampleNIJJoin_V4_CSVs.py", line 32, in <module>
df = pd.merge(data_0, data_1, on='record_id', how='outer')
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 81, in merge
validate=validate,
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 626, in __init__
) = self._get_merge_keys()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 975, in _get_merge_keys
right_keys.append(right._get_label_or_level_values(rk))
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 1774, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'record_id'
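A likely reading of this traceback, assuming the export files really do contain a record_id column: `io.StringIO(f0)` wraps the path string itself, so `pd.read_csv` parses the text of the path as if it were the file's contents, and the only column it finds is named after the path. The sketch below reproduces that with a throwaway temporary file (`demo.csv` and its columns are invented for illustration), then reads the same file by passing the path directly.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for one of the export files; the columns are invented.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.csv")
with open(path, "w", newline="") as fh:
    fh.write("record_id,age\n1,34\n2,51\n")

# What the script above does: the path string is parsed as CSV text,
# so the "header" is the path and there are no data rows.
wrong = pd.read_csv(io.StringIO(path))
print(list(wrong.columns))
print(len(wrong))

# Passing the path lets pandas open the file and read its real header.
right = pd.read_csv(path)
print(list(right.columns))
```

If `right.columns` still lacks record_id on the real files, printing the column list is a quick way to spot stray whitespace or different capitalization in the header.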
I expect the output to be a csv that is a left join of all the variables, joined on record_id.
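One way the merge described here could be sketched, assuming every file has a record_id column and that the first file should keep all of its rows (a left join; the script above passes how='outer', which would instead keep ids that appear in any file). The helper name `merge_on_record_id` and the tiny generated files are hypothetical; the real file paths would take their place.

```python
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_on_record_id(paths, how="left"):
    """Read each CSV by path and join them one after another on record_id."""
    frames = [pd.read_csv(p) for p in paths]
    return reduce(
        lambda left, right: pd.merge(left, right, on="record_id", how=how),
        frames,
    )


# Three small made-up files stand in for the seven real export files.
tmp_dir = tempfile.mkdtemp()
samples = [
    "record_id,age\n1,34\n2,51\n3,29\n",
    "record_id,height\n1,170\n3,182\n",
    "record_id,weight\n2,80\n3,75\n4,66\n",
]
paths = []
for i, text in enumerate(samples):
    p = os.path.join(tmp_dir, f"sample{i}.csv")
    with open(p, "w", newline="") as fh:
        fh.write(text)
    paths.append(p)

merged = merge_on_record_id(paths)
# index=False keeps the row index out of the output file.
merged.to_csv(os.path.join(tmp_dir, "merged.csv"), index=False)
print(merged)
```

With a left join, ids missing from a later file get empty (NaN) values in that file's columns, and ids that appear only in later files are dropped. If two files share a non-key column name, pandas keeps both and appends the suffixes `_x` and `_y`.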
Small sample of the data: when finished, the columns (variables) will total more than 800. See the data sample image.