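I am trying to merge two files. I supply headers for them myself because they lose their headers when I merge them with concatenate, and when I try to drop a column I get an error: ValueError: labels [' lh.aparc.a2009s.meancurv'] not contained in axis. So I am trying the approach below.
The headers matter because I want to compute statistics such as the mean based on them.
But at the moment the result file looks like this: [image]
CSV 1 looks like this: [image]. CSV 2 looks the same, but with rh.
#!/bin/bash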
ls -d */ | sed -e "s/\///g" | grep -v "Results" | grep -v "Output">> subjects.txt;
module unload freesurfer
module load freesurfer/5.3.0
module load python
export SUBJECTS_DIR=/N/u/shrechak/Karst/GENFL_FREESURFER53_KARST_RES
source $FREESURFER_HOME/FreeSurferEnv.sh
aparcstats2table --hemi lh --subjectsfile=subjects.txt --parc aparc.a2009s --meas meancurv --tablefile lh.a2009s.meancurv.txt
aparcstats2table --hemi rh --subjectsfile=subjects.txt --parc aparc.a2009s --meas meancurv --tablefile rh.a2009s.meancurv.txt
for f in *.txt; do
mv "$f" "${f%.txt}.csv"
done
python <<END_OF_PYTHON
import csv
import pandas as pd
names= ["meancurv",
"lh_G_and_S_frontomargin_meancurv",
"lh_G_and_S_occipital_inf_meancurv",
"lh_G_and_S_paracentral_meancurv",
"lh_G_and_S_subcentral_meancurv",
"lh_G_and_S_transv_frontopol_meancurv",
"lh_G_and_S_cingul-Ant_meancurv",
"lh_G_and_S_cingul-Mid-Ant_meancurv",
"lh_G_and_S_cingul-Mid-Post_meancurv",
"lh_G_cingul-Post-dorsal_meancurv",
"lh_G_cingul-Post-ventral_meancurv",
"lh_G_cuneus_meancurv",
"lh_G_front_inf-Opercular_meancurv",
"lh_G_front_inf-Orbital_meancurv",
"lh_G_front_inf-Triangul_meancurv",
"lh_G_front_middle_meancurv",
"lh_G_front_sup_meancurv",
"lh_G_Ins_lg_and_S_cent_ins_meancurv",
"lh_G_insular_short_meancurv",
"lh_G_occipital_middle_meancurv",
"lh_G_occipital_sup_meancurv",
"lh_G_oc-temp_lat-fusifor_meancurv",
"lh_G_oc-temp_med-Lingual_meancurv",
"lh_G_oc-temp_med-Parahip_meancurv",
"lh_G_orbital_meancurv",
"lh_G_pariet_inf-Angular_meancurv",
"lh_G_pariet_inf-Supramar_meancurv",
"lh_G_parietal_sup_meancurv",
"lh_G_postcentral_meancurv",
"lh_G_precentral_meancurv",
"lh_G_precuneus_meancurv",
"lh_G_rectus_meancurv",
"lh_G_subcallosal_meancurv",
"lh_G_temp_sup-G_T_transv_meancurv",
"lh_G_temp_sup-Lateral_meancurv",
"lh_G_temp_sup-Plan_polar_meancurv",
"lh_G_temp_sup-Plan_tempo_meancurv",
"lh_G_temporal_inf_meancurv",
"lh_G_temporal_middle_meancurv",
"lh_Lat_Fis-ant-Horizont_meancurv",
"lh_Lat_Fis-ant-Vertical_meancurv",
"lh_Lat_Fis-post_meancurv",
"lh_Pole_occipital_meancurv",
"lh_Pole_temporal_meancurv",
"lh_S_calcarine_meancurv",
"lh_S_central_meancurv",
"lh_S_cingulMarginalis_meancurv",
"lh_S_circular_insula_ant_meancurv",
"lh_S_circular_insula_inf_meancurv",
"lh_S_circular_insula_sup_meancurv",
"lh_S_collat_transv_ant_meancurv",
"lh_S_collat_transv_post_meancurv",
"lh_S_front_inf_meancurv",
"lh_S_front_middle_meancurv",
"lh_S_front_sup_meancurv",
"lh_S_interm_prim-Jensen_meancurv",
"lh_S_intrapariet_and_P_trans_meancurv",
"lh_S_oc_middle_and_Lunatus_meancurv",
"lh_S_oc_sup_and_transversal_meancurv",
"lh_S_occipital_ant_meancurv",
"lh_S_oc-temp_lat_meancurv",
"lh_S_oc-temp_med_and_Lingual_meancurv",
"lh_S_orbital_lateral_meancurv",
"lh_S_orbital_med-olfact_meancurv",
"lh_S_orbital-H_Shaped_meancurv",
"lh_S_parieto_occipital_meancurv",
"lh_S_pericallosal_meancurv",
"lh_S_postcentral_meancurv",
"lh_S_precentral-inf-part_meancurv",
"lh_S_precentral-sup-part_meancurv",
"lh_S_suborbital_meancurv",
"lh_S_subparietal_meancurv",
"lh_S_temporal_inf_meancurv",
"lh_S_temporal_sup_meancurv",
"lh_S_temporal_transverse_meancurv"]
df1 = pd.read_csv('lh.a2009s.meancurv.csv', header = None, names = names)
names1 = ["meancurv",
"rh_G_and_S_frontomargin_meancurv",
"rh_G_and_S_occipital_inf_meancurv",
"rh_G_and_S_paracentral_meancurv",
"rh_G_and_S_subcentral_meancurv",
"rh_G_and_S_transv_frontopol_meancurv",
"rh_G_and_S_cingul-Ant_meancurv",
"rh_G_and_S_cingul-Mid-Ant_meancurv",
"rh_G_and_S_cingul-Mid-Post_meancurv",
"rh_G_cingul-Post-dorsal_meancurv",
"rh_G_cingul-Post-ventral_meancurv",
"rh_G_cuneus_meancurv",
"rh_G_front_inf-Opercular_meancurv",
"rh_G_front_inf-Orbital_meancurv",
"rh_G_front_inf-Triangul_meancurv",
"rh_G_front_middle_meancurv",
"rh_G_front_sup_meancurv",
"rh_G_Ins_lg_and_S_cent_ins_meancurv",
"rh_G_insular_short_meancurv",
"rh_G_occipital_middle_meancurv",
"rh_G_occipital_sup_meancurv",
"rh_G_oc-temp_lat-fusifor_meancurv",
"rh_G_oc-temp_med-Lingual_meancurv",
"rh_G_oc-temp_med-Parahip_meancurv",
"rh_G_orbital_meancurv",
"rh_G_pariet_inf-Angular_meancurv",
"rh_G_pariet_inf-Supramar_meancurv",
"rh_G_parietal_sup_meancurv",
"rh_G_postcentral_meancurv",
"rh_G_precentral_meancurv",
"rh_G_precuneus_meancurv",
"rh_G_rectus_meancurv",
"rh_G_subcallosal_meancurv",
"rh_G_temp_sup-G_T_transv_meancurv",
"rh_G_temp_sup-Lateral_meancurv",
"rh_G_temp_sup-Plan_polar_meancurv",
"rh_G_temp_sup-Plan_tempo_meancurv",
"rh_G_temporal_inf_meancurv",
"rh_G_temporal_middle_meancurv",
"rh_Lat_Fis-ant-Horizont_meancurv",
"rh_Lat_Fis-ant-Vertical_meancurv",
"rh_Lat_Fis-post_meancurv",
"rh_Pole_occipital_meancurv",
"rh_Pole_temporal_meancurv",
"rh_S_calcarine_meancurv",
"rh_S_central_meancurv",
"rh_S_cingulMarginalis_meancurv",
"rh_S_circular_insula_ant_meancurv",
"rh_S_circular_insula_inf_meancurv",
"rh_S_circular_insula_sup_meancurv",
"rh_S_collat_transv_ant_meancurv",
"rh_S_collat_transv_post_meancurv",
"rh_S_front_inf_meancurv",
"rh_S_front_middle_meancurv",
"rh_S_front_sup_meancurv",
"rh_S_interm_prim-Jensen_meancurv",
"rh_S_intrapariet_and_P_trans_meancurv",
"rh_S_oc_middle_and_Lunatus_meancurv",
"rh_S_oc_sup_and_transversal_meancurv",
"rh_S_occipital_ant_meancurv",
"rh_S_oc-temp_lat_meancurv",
"rh_S_oc-temp_med_and_Lingual_meancurv",
"rh_S_orbital_lateral_meancurv",
"rh_S_orbital_med-olfact_meancurv",
"rh_S_orbital-H_Shaped_meancurv",
"rh_S_parieto_occipital_meancurv",
"rh_S_pericallosal_meancurv",
"rh_S_postcentral_meancurv",
"rh_S_precentral-inf-part_meancurv",
"rh_S_precentral-sup-part_meancurv",
"rh_S_suborbital_meancurv",
"rh_S_subparietal_meancurv",
"rh_S_temporal_inf_meancurv",
"rh_S_temporal_sup_meancurv",
"rh_S_temporal_transverse_meancurv"
]
df2 = pd.read_csv('rh.a2009s.meancurv.csv', header = None, names = names1)
result = pd.merge(df1, df2, on='meancurv', how='outer')
result.to_csv('result.csv')
END_OF_PYTHON
echo "goodbye!";
Answer 0 (score: 0)
So you want to skip the first row and pull in only the data part.
Here is an MCVE.
Code:
import io
import pandas as pd
csv1 = io.StringIO(u'''
a,b,c
1,4,7
2,5,8
3,6,9
''')
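# names= supplies the new column labels; skiprows=[1] skips the original
# header line 'a,b,c' (line 0 is the blank line after the opening quotes)
df = pd.read_csv(csv1, names=['d', 'e', 'f'], skiprows=[1])
print(df)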
Output:
   d  e  f
0  1  4  7
1  2  5  8
2  3  6  9
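Applied to the question's files, the same idea could look like the sketch below. It assumes the table written by aparcstats2table is tab-delimited with a single header row (check your own file before relying on this); the subject IDs, values, and shortened column names are made up for illustration. Passing header=0 together with names tells pandas to discard the file's own header row and use the supplied labels instead.

```python
import io
import pandas as pd

# Stand-in for lh.a2009s.meancurv.csv; the real file has many more columns.
# Assumption: fields are tab-separated and the first row is a header.
lh_text = (
    "lh.aparc.a2009s.meancurv\tlh_G_cuneus_meancurv\tlh_S_central_meancurv\n"
    "subj01\t0.131\t0.102\n"
    "subj02\t0.127\t0.098\n"
)

names = ["subject", "lh_G_cuneus", "lh_S_central"]  # hypothetical short names

# header=0 marks the first row as the header so it is dropped from the data;
# names= then replaces it with our own column labels.
df = pd.read_csv(io.StringIO(lh_text), sep="\t", header=0, names=names)

print(df)
# Column-wise means over the numeric columns only.
print(df[["lh_G_cuneus", "lh_S_central"]].mean())
```

Because the old header row never enters the data, the numeric columns are parsed as floats and can be averaged directly.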
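Answer 1 (score: 0)
Here is one way to merge two files into a single file so that the header of one of them is kept after merging.
Suppose you keep the file names in a list called 'files':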
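import pandas as pd

files = ['file1.csv', 'file2.csv']  # keep files here
frames = []
for file in files:
    thisDF = pd.read_csv(file)  # the first row of each file is used as the header
    frames.append(thisDF)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
finalDF = pd.concat(frames, ignore_index=True)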
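Now, if you want, try the next two steps.
Say you want to check the header with a simple print of head():
print(finalDF.head())
If you want to write this merged dataframe to a CSV file: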
finalDF.to_csv('merged-file.csv', encoding="utf-8", index=False)
file1.csv:
,column1,column2,column3,column4,Date,Device,sample_site
2,14888,0.060011931,248084,13.40535464,3/15/2017,DESKTOP,http://www.example1.com
11,1358,0.033212679,40888,7.465099785,3/15/2017,MOBILE,http://www.example2.com
23,130,0.02998155,4336,8.337638376,3/15/2017,TABLET,http://www.example3.com
file2.csv:
,column1,column2,column3,column4,Date,Device,sample_site
35,2685,0.034564882,77680,10.97812822,3/15/2017,DESKTOP,https://www.example4.com
45,280,0.026197605,10688,7.801272455,3/15/2017,MOBILE,https://www.example5.com
54,24,0.022878932,1049,8.202097235,3/15/2017,TABLET,https://www.example6.com
merged-file.csv:
Unnamed: 0,column1,column2,column3,column4,Date,Device,sample_site
2,14888,0.060011931,248084,13.40535464,3/15/2017,DESKTOP,http://www.example1.com
11,1358,0.033212679,40888,7.465099785,3/15/2017,MOBILE,http://www.example2.com
23,130,0.02998155,4336,8.337638376,3/15/2017,TABLET,http://www.example3.com
35,2685,0.034564882,77680,10.97812822,3/15/2017,DESKTOP,https://www.example4.com
45,280,0.026197605,10688,7.801272455,3/15/2017,MOBILE,https://www.example5.com
54,24,0.022878932,1049,8.202097235,3/15/2017,TABLET,https://www.example6.com
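The Unnamed: 0 column in the merged output appears because the first header field of each input file is empty, so pandas invents a name for it. A minimal sketch of one way to avoid that, assuming the first column is just a row label: read it as the index with index_col=0. The in-memory files below are trimmed stand-ins for file1.csv and file2.csv.

```python
import io
import pandas as pd

# Two small stand-ins for file1.csv and file2.csv (columns trimmed for brevity).
file1 = io.StringIO(",column1,Device\n2,14888,DESKTOP\n11,1358,MOBILE\n")
file2 = io.StringIO(",column1,Device\n35,2685,DESKTOP\n")

# index_col=0 makes the unnamed first column the index instead of a data column.
frames = [pd.read_csv(f, index_col=0) for f in (file1, file2)]
merged = pd.concat(frames)  # the original row labels are kept as the index

print(merged)
```

Writing this frame with merged.to_csv('merged-file.csv') (leaving index at its default of True) puts the row labels back in the first column under an empty header field, matching the layout of the input files.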
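Reply:
Are you trying to merge the data based on a column? In that case you can either concatenate along an axis or merge with a join.
For example: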
pd.concat([df1, df2]) #add axis and join type if necessary
The following documentation can help you understand the difference: merging and concat in pandas
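The difference between the two options mentioned in this reply can be sketched as follows. The frames, the subject key column, and the values are hypothetical; the point is only that concat stacks rows while merge lines up rows that share a key.

```python
import pandas as pd

# Hypothetical per-hemisphere tables keyed by a shared subject column.
lh = pd.DataFrame({"subject": ["s1", "s2"], "lh_G_cuneus": [0.13, 0.12]})
rh = pd.DataFrame({"subject": ["s1", "s2"], "rh_G_cuneus": [0.14, 0.11]})

# concat with the default axis=0 stacks rows; columns missing from one
# frame are filled with NaN.
stacked = pd.concat([lh, rh], ignore_index=True)

# merge aligns rows that share the same key value, giving one row per subject.
side_by_side = pd.merge(lh, rh, on="subject", how="outer")

print(stacked)
print(side_by_side)
```

For left and right hemisphere tables that describe the same subjects, the merged form is usually the one you want before computing per-column means.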