Concat all columns from many files into one (in one line!)

Time: 2018-01-26 12:09:54

Tags: python windows pandas

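I want to open all the CSV files in a folder (those whose names end in a number),

then take the 'IMO' column from each selected file and concatenate them into a single DataFrame, 'df':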

import pandas as pd

df = pd.concat([pd.read_csv(path + '/' + f) for f in all_names if f.split('_')[3][:-4].isdigit()]['IMO'])

I want to do this in one line (purely as a challenge, nothing else)!

So far it raises this error:

IndexError: list index out of range

Here is the output of print(all_names):

['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv', 'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'AIS_SIGHTINGS_Q1_2010.csv', 'AIS_SIGHTINGS_Q1_2011.csv', 'AIS_SIGHTINGS_Q1_2012.csv', 'AIS_SIGHTINGS_Q1_2013.csv', 'AIS_SIGHTINGS_Q1_2014.csv', 'AIS_SIGHTINGS_Q2_2009.csv', 'AIS_SIGHTINGS_Q2_2010.csv', 'AIS_SIGHTINGS_Q2_2011.csv', 'AIS_SIGHTINGS_Q2_2012.csv', 'AIS_SIGHTINGS_Q2_2013.csv', 'AIS_SIGHTINGS_Q2_2014.csv', 'AIS_SIGHTINGS_Q3_2009.csv', 'AIS_SIGHTINGS_Q3_2010.csv', 'AIS_SIGHTINGS_Q3_2011.csv', 'AIS_SIGHTINGS_Q3_2012.csv', 'AIS_SIGHTINGS_Q3_2013.csv', 'AIS_SIGHTINGS_Q3_2014.csv', 'AIS_SIGHTINGS_Q4_2009.csv', 'AIS_SIGHTINGS_Q4_2010.csv', 'AIS_SIGHTINGS_Q4_2011.csv', 'AIS_SIGHTINGS_Q4_2012.csv', 'AIS_SIGHTINGS_Q4_2013.csv', 'AIS_SIGHTINGS_Q4_2014.csv', 'a_few_boats_AIS.csv', 'unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']

2 answers:

Answer 0: (score: 1)

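Use pandas to filter out the wrong file names, and the usecols parameter to read only the IMO column. In pandas, .str[3] does not fail: it returns NaN if the fourth element of the list does not exist.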

# one-line solution
df = pd.concat([pd.read_csv(path + '/' + f, usecols=['IMO']) for f in pd.Series(all_names)[pd.Series(all_names).str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]])

This is the same as:

s = pd.Series(all_names)
v = s[s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]
df = pd.concat([pd.read_csv(path + '/' + f, usecols=['IMO']) for f in v])
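
To illustrate why the list comprehension in the question raised IndexError while the pandas filter does not, here is a small sketch comparing plain list indexing with the .str[3] accessor. The three file names come from the question's list; the variable names (names, parts, mask, plain_error) and the extra .astype(bool) are only illustrative assumptions.

```python
import pandas as pd

names = [
    'AIS_SIGHTINGS_Q1_2009.csv',
    'AIS_SIGHTINGS_Q1_2009_corrected.csv',
    'unique_ports.csv',
]

# Plain Python: 'unique_ports.csv'.split('_') has only two parts,
# so indexing with [3] raises IndexError.
try:
    [f.split('_')[3] for f in names]
    plain_error = None
except IndexError as exc:
    plain_error = type(exc).__name__

# pandas: .str[3] gives NaN where the fourth element is missing
# instead of raising.
parts = pd.Series(names).str.split('_').str[3]

# Strip the '.csv' suffix, test for digits, and treat NaN as "no match".
mask = parts.str[:-4].str.isdigit().fillna(False).astype(bool)

print(plain_error)
print(parts.tolist())
print(mask.tolist())
```

Only the first name passes the mask: the '_corrected' file has '2009' as its fourth part, which becomes an empty string once the last four characters are cut off.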

Verification:

all_names = ['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv', 'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'AIS_SIGHTINGS_Q1_2010.csv', 'AIS_SIGHTINGS_Q1_2011.csv', 'AIS_SIGHTINGS_Q1_2012.csv', 'AIS_SIGHTINGS_Q1_2013.csv', 'AIS_SIGHTINGS_Q1_2014.csv', 'AIS_SIGHTINGS_Q2_2009.csv', 'AIS_SIGHTINGS_Q2_2010.csv', 'AIS_SIGHTINGS_Q2_2011.csv', 'AIS_SIGHTINGS_Q2_2012.csv', 'AIS_SIGHTINGS_Q2_2013.csv', 'AIS_SIGHTINGS_Q2_2014.csv', 'AIS_SIGHTINGS_Q3_2009.csv', 'AIS_SIGHTINGS_Q3_2010.csv', 'AIS_SIGHTINGS_Q3_2011.csv', 'AIS_SIGHTINGS_Q3_2012.csv', 'AIS_SIGHTINGS_Q3_2013.csv', 'AIS_SIGHTINGS_Q3_2014.csv', 'AIS_SIGHTINGS_Q4_2009.csv', 'AIS_SIGHTINGS_Q4_2010.csv', 'AIS_SIGHTINGS_Q4_2011.csv', 'AIS_SIGHTINGS_Q4_2012.csv', 'AIS_SIGHTINGS_Q4_2013.csv', 'AIS_SIGHTINGS_Q4_2014.csv', 'a_few_boats_AIS.csv', 'unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']

s = pd.Series(all_names)
v = s[s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]
print(v)

0     AIS_SIGHTINGS_Q1_2009.csv
3     AIS_SIGHTINGS_Q1_2010.csv
4     AIS_SIGHTINGS_Q1_2011.csv
5     AIS_SIGHTINGS_Q1_2012.csv
6     AIS_SIGHTINGS_Q1_2013.csv
7     AIS_SIGHTINGS_Q1_2014.csv
8     AIS_SIGHTINGS_Q2_2009.csv
9     AIS_SIGHTINGS_Q2_2010.csv
10    AIS_SIGHTINGS_Q2_2011.csv
11    AIS_SIGHTINGS_Q2_2012.csv
12    AIS_SIGHTINGS_Q2_2013.csv
13    AIS_SIGHTINGS_Q2_2014.csv
14    AIS_SIGHTINGS_Q3_2009.csv
15    AIS_SIGHTINGS_Q3_2010.csv
16    AIS_SIGHTINGS_Q3_2011.csv
17    AIS_SIGHTINGS_Q3_2012.csv
18    AIS_SIGHTINGS_Q3_2013.csv
19    AIS_SIGHTINGS_Q3_2014.csv
20    AIS_SIGHTINGS_Q4_2009.csv
21    AIS_SIGHTINGS_Q4_2010.csv
22    AIS_SIGHTINGS_Q4_2011.csv
23    AIS_SIGHTINGS_Q4_2012.csv
24    AIS_SIGHTINGS_Q4_2013.csv
25    AIS_SIGHTINGS_Q4_2014.csv
dtype: object

Answer 1: (score: 0)

The final (working) version of this code is:

df = pd.concat([pd.read_csv(path + '/' + f,usecols=['IMO']) for f in all_names if f.split('.')[0][-1].isdigit()])

I haven't tried your version yet, but it looks like it should work fine (if you fix the bracket problem ;)). Thanks for the answer.
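
For anyone who wants to try the final one-liner without the real AIS data, here is a minimal end-to-end sketch. The sample file contents, the temporary directory, and the use of os.path.join and ignore_index=True are assumptions added for illustration; only the filter expression and usecols=['IMO'] come from the code above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two quarterly files plus two that should be skipped.
samples = {
    'AIS_SIGHTINGS_Q1_2009.csv': 'IMO,NAME\n1,a\n2,b\n',
    'AIS_SIGHTINGS_Q2_2009.csv': 'IMO,NAME\n3,c\n',
    'AIS_SIGHTINGS_Q1_2009_corrected.csv': 'IMO,NAME\n99,x\n',
    'unique_ports.csv': 'PORT\nOslo\n',
}

with tempfile.TemporaryDirectory() as path:
    # Write the sample CSV files into the temporary folder.
    for name, text in samples.items():
        with open(os.path.join(path, name), 'w') as fh:
            fh.write(text)

    all_names = sorted(os.listdir(path))

    # Same filter as the final version: keep names whose stem ends in a digit.
    selected = [f for f in all_names if f.split('.')[0][-1].isdigit()]

    # Read only the IMO column from each selected file and stack the results.
    df = pd.concat(
        [pd.read_csv(os.path.join(path, f), usecols=['IMO']) for f in selected],
        ignore_index=True,
    )

print(selected)
print(df['IMO'].tolist())
```

Note that f.split('.')[0][-1] would itself raise IndexError for a name that starts with a dot (an empty stem), so this filter assumes the folder contains no such files.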