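Open all the CSV files in the folder whose names end with a number, then take the column 'IMO' from each selected file and concatenate them into a single DataFrame, df: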
import pandas as pd
df = pd.concat([pd.read_csv(path + '/' + f) for f in all_names if f.split('_')[3][:-4].isdigit()]['IMO'])
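Separately from the IndexError, the ['IMO'] in this line is applied to the list built by the comprehension, not to each DataFrame, and a Python list cannot be indexed with a string. A minimal sketch with two hypothetical in-memory frames (made-up IMO values) standing in for the pd.read_csv results shows the error and one way to select the column per file instead:

```python
import pandas as pd

# Hypothetical frames standing in for pd.read_csv(...) results
frames = [pd.DataFrame({'IMO': [9000001, 9000002], 'speed': [10.5, 11.0]}),
          pd.DataFrame({'IMO': [9000003], 'speed': [12.0]})]

# Indexing the list itself with 'IMO' raises TypeError
error_message = None
try:
    frames['IMO']
except TypeError as exc:
    error_message = str(exc)  # "list indices must be integers or slices, not str"

# Selecting the column on each DataFrame, inside the comprehension, works
imo = pd.concat([frame['IMO'] for frame in frames])
print(imo.tolist())  # [9000001, 9000002, 9000003]
```

Note that concatenating the selected columns gives a Series; reading with usecols=['IMO'] instead keeps a one-column DataFrame.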
But I want to do it in a single line (purely as a challenge, nothing more)!
So far it returns this error:
IndexError: list index out of range
Here is the output of print(all_names):
['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv', 'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'AIS_SIGHTINGS_Q1_2010.csv', 'AIS_SIGHTINGS_Q1_2011.csv', 'AIS_SIGHTINGS_Q1_2012.csv', 'AIS_SIGHTINGS_Q1_2013.csv', 'AIS_SIGHTINGS_Q1_2014.csv', 'AIS_SIGHTINGS_Q2_2009.csv', 'AIS_SIGHTINGS_Q2_2010.csv', 'AIS_SIGHTINGS_Q2_2011.csv', 'AIS_SIGHTINGS_Q2_2012.csv', 'AIS_SIGHTINGS_Q2_2013.csv', 'AIS_SIGHTINGS_Q2_2014.csv', 'AIS_SIGHTINGS_Q3_2009.csv', 'AIS_SIGHTINGS_Q3_2010.csv', 'AIS_SIGHTINGS_Q3_2011.csv', 'AIS_SIGHTINGS_Q3_2012.csv', 'AIS_SIGHTINGS_Q3_2013.csv', 'AIS_SIGHTINGS_Q3_2014.csv', 'AIS_SIGHTINGS_Q4_2009.csv', 'AIS_SIGHTINGS_Q4_2010.csv', 'AIS_SIGHTINGS_Q4_2011.csv', 'AIS_SIGHTINGS_Q4_2012.csv', 'AIS_SIGHTINGS_Q4_2013.csv', 'AIS_SIGHTINGS_Q4_2014.csv', 'a_few_boats_AIS.csv', 'unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']
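The IndexError comes from names with fewer than four underscore-separated parts, such as 'unique_ports.csv', where f.split('_')[3] does not exist. A small plain-Python sketch on a subset of the names above finds those names and guards the index with a length check (short-circuiting "and" skips the [3] lookup):

```python
# A subset of the names printed above, including the short ones
names = ['AIS_SIGHTINGS_Q1_2009.csv', 'a_few_boats_AIS.csv',
         'unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']

# f.split('_')[3] only exists when a name has at least four '_'-separated parts
too_short = [f for f in names if len(f.split('_')) < 4]
print(too_short)  # ['unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']

# Checking the length first avoids the IndexError
selected = [f for f in names
            if len(f.split('_')) > 3 and f.split('_')[3][:-4].isdigit()]
print(selected)  # ['AIS_SIGHTINGS_Q1_2009.csv']
```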
Answer 0 (score: 1)
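Use pandas to filter out the unwanted file names, and the usecols parameter to read only the IMO column. In pandas, .str[3] does not fail: it returns NaN when the split list has fewer than 4 elements, and .fillna(False) turns those NaN values into False so the result can be used as a boolean mask.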
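A minimal sketch of that behaviour on two names from the list, one long enough and one too short:

```python
import pandas as pd

s = pd.Series(['AIS_SIGHTINGS_Q1_2009.csv', 'unique_ports.csv'])

# .str[3] takes element 3 of each split list; a short list gives NaN instead of raising
fourth = s.str.split('_').str[3]
print(fourth.tolist())  # ['2009.csv', nan]

# NaN passes through .str[:-4] and .str.isdigit(); fillna(False) turns it into False
mask = fourth.str[:-4].str.isdigit().fillna(False)
print(mask.tolist())  # [True, False]
```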
#one line solution
df = pd.concat([pd.read_csv(path + '/' + f, usecols=['IMO']) for f in pd.Series(all_names)[pd.Series(all_names).str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]])
Same as:
s = pd.Series(all_names)
v = s[s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]
df = pd.concat([pd.read_csv(path + '/' + f, usecols=['IMO']) for f in v])
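To check the two-line version end to end, here is a minimal sketch that writes three hypothetical CSV files (made-up IMO values, plus one file without an IMO column) into a temporary directory and runs the same filter and pd.concat call on them:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample files: two match the naming pattern, one does not
samples = {
    'AIS_SIGHTINGS_Q1_2009.csv': 'IMO,speed\n9000001,10.5\n9000002,11.0\n',
    'AIS_SIGHTINGS_Q2_2009.csv': 'IMO,speed\n9000003,12.0\n',
    'unique_ports.csv': 'port\nRotterdam\n',
}

with tempfile.TemporaryDirectory() as path:
    for name, text in samples.items():
        with open(os.path.join(path, name), 'w') as fh:
            fh.write(text)
    all_names = sorted(os.listdir(path))

    s = pd.Series(all_names)
    v = s[s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]
    # usecols=['IMO'] reads only that column from each selected file
    df = pd.concat([pd.read_csv(path + '/' + f, usecols=['IMO']) for f in v])

print(df['IMO'].tolist())  # [9000001, 9000002, 9000003]
```

Because 'unique_ports.csv' is filtered out before reading, its missing IMO column never causes a usecols error.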
Verification:
all_names = ['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv', 'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'AIS_SIGHTINGS_Q1_2010.csv', 'AIS_SIGHTINGS_Q1_2011.csv', 'AIS_SIGHTINGS_Q1_2012.csv', 'AIS_SIGHTINGS_Q1_2013.csv', 'AIS_SIGHTINGS_Q1_2014.csv', 'AIS_SIGHTINGS_Q2_2009.csv', 'AIS_SIGHTINGS_Q2_2010.csv', 'AIS_SIGHTINGS_Q2_2011.csv', 'AIS_SIGHTINGS_Q2_2012.csv', 'AIS_SIGHTINGS_Q2_2013.csv', 'AIS_SIGHTINGS_Q2_2014.csv', 'AIS_SIGHTINGS_Q3_2009.csv', 'AIS_SIGHTINGS_Q3_2010.csv', 'AIS_SIGHTINGS_Q3_2011.csv', 'AIS_SIGHTINGS_Q3_2012.csv', 'AIS_SIGHTINGS_Q3_2013.csv', 'AIS_SIGHTINGS_Q3_2014.csv', 'AIS_SIGHTINGS_Q4_2009.csv', 'AIS_SIGHTINGS_Q4_2010.csv', 'AIS_SIGHTINGS_Q4_2011.csv', 'AIS_SIGHTINGS_Q4_2012.csv', 'AIS_SIGHTINGS_Q4_2013.csv', 'AIS_SIGHTINGS_Q4_2014.csv', 'a_few_boats_AIS.csv', 'unique_boat_names.csv', 'unique_ports.csv', 'unique_vessel.csv']
s = pd.Series(all_names)
v = s[s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False)]
print (v)
0 AIS_SIGHTINGS_Q1_2009.csv
3 AIS_SIGHTINGS_Q1_2010.csv
4 AIS_SIGHTINGS_Q1_2011.csv
5 AIS_SIGHTINGS_Q1_2012.csv
6 AIS_SIGHTINGS_Q1_2013.csv
7 AIS_SIGHTINGS_Q1_2014.csv
8 AIS_SIGHTINGS_Q2_2009.csv
9 AIS_SIGHTINGS_Q2_2010.csv
10 AIS_SIGHTINGS_Q2_2011.csv
11 AIS_SIGHTINGS_Q2_2012.csv
12 AIS_SIGHTINGS_Q2_2013.csv
13 AIS_SIGHTINGS_Q2_2014.csv
14 AIS_SIGHTINGS_Q3_2009.csv
15 AIS_SIGHTINGS_Q3_2010.csv
16 AIS_SIGHTINGS_Q3_2011.csv
17 AIS_SIGHTINGS_Q3_2012.csv
18 AIS_SIGHTINGS_Q3_2013.csv
19 AIS_SIGHTINGS_Q3_2014.csv
20 AIS_SIGHTINGS_Q4_2009.csv
21 AIS_SIGHTINGS_Q4_2010.csv
22 AIS_SIGHTINGS_Q4_2011.csv
23 AIS_SIGHTINGS_Q4_2012.csv
24 AIS_SIGHTINGS_Q4_2013.csv
25 AIS_SIGHTINGS_Q4_2014.csv
dtype: object
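As a complementary check, the sketch below applies the same mask to the full all_names list and prints the names it skips. The mask is cast with .astype(bool) before negating it, because ~ on an object-dtype Series of Python booleans does not produce a boolean negation.

```python
import pandas as pd

# all_names exactly as printed in the question (30 entries)
all_names = ['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv',
             'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'AIS_SIGHTINGS_Q1_2010.csv',
             'AIS_SIGHTINGS_Q1_2011.csv', 'AIS_SIGHTINGS_Q1_2012.csv',
             'AIS_SIGHTINGS_Q1_2013.csv', 'AIS_SIGHTINGS_Q1_2014.csv',
             'AIS_SIGHTINGS_Q2_2009.csv', 'AIS_SIGHTINGS_Q2_2010.csv',
             'AIS_SIGHTINGS_Q2_2011.csv', 'AIS_SIGHTINGS_Q2_2012.csv',
             'AIS_SIGHTINGS_Q2_2013.csv', 'AIS_SIGHTINGS_Q2_2014.csv',
             'AIS_SIGHTINGS_Q3_2009.csv', 'AIS_SIGHTINGS_Q3_2010.csv',
             'AIS_SIGHTINGS_Q3_2011.csv', 'AIS_SIGHTINGS_Q3_2012.csv',
             'AIS_SIGHTINGS_Q3_2013.csv', 'AIS_SIGHTINGS_Q3_2014.csv',
             'AIS_SIGHTINGS_Q4_2009.csv', 'AIS_SIGHTINGS_Q4_2010.csv',
             'AIS_SIGHTINGS_Q4_2011.csv', 'AIS_SIGHTINGS_Q4_2012.csv',
             'AIS_SIGHTINGS_Q4_2013.csv', 'AIS_SIGHTINGS_Q4_2014.csv',
             'a_few_boats_AIS.csv', 'unique_boat_names.csv',
             'unique_ports.csv', 'unique_vessel.csv']

s = pd.Series(all_names)
# Same mask as in the answer, cast to plain bool so that ~ negates it correctly
mask = s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False).astype(bool)

print(mask.sum())         # 24
print(s[~mask].tolist())  # the 6 names that are skipped
```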
Answer 1 (score: 0)
The final (working) version of the code is:
df = pd.concat([pd.read_csv(path + '/' + f,usecols=['IMO']) for f in all_names if f.split('.')[0][-1].isdigit()])
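This filter only checks whether the part of the name before the first '.' ends in a digit. A small sketch on a few representative names from all_names suggests it selects the same files as the pandas mask from Answer 0:

```python
import pandas as pd

# Representative names from all_names: one match and four non-matches
names = ['AIS_SIGHTINGS_Q1_2009.csv', 'AIS_SIGHTINGS_Q1_2009_corrected.csv',
         'AIS_SIGHTINGS_Q1_2009_corrected_short.csv', 'a_few_boats_AIS.csv',
         'unique_ports.csv']

# Filter from this answer: the stem before the first '.' must end in a digit
by_last_char = [f for f in names if f.split('.')[0][-1].isdigit()]

# Filter from Answer 0, for comparison
s = pd.Series(names)
mask = s.str.split('_').str[3].str[:-4].str.isdigit().fillna(False).astype(bool)
by_pandas = s[mask].tolist()

print(by_last_char)               # ['AIS_SIGHTINGS_Q1_2009.csv']
print(by_last_char == by_pandas)  # True
```

The two checks are not equivalent in general: a hypothetical name such as 'report_v2.csv' would pass the last-character test but not the fourth-part test.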
I haven't tried your version yet, but it looks like it should work fine (if you fix the bracket issue ;)). Thanks for your answer.