Get every nth element of multiple DataFrames in pandas

Asked: 2018-06-08 11:46:27

Tags: python pandas

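I have 10 DataFrames with an identical structure, each containing 10000 records. I want to create a matrix that contains every 1000th record from all of the DataFrames.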

So my dataset looks like this:

df = pd.read_csv('10000_0.csv')
df1 = pd.read_csv('10000_1.csv')
df2 = pd.read_csv('10000_2.csv')
df3 = pd.read_csv('10000_3.csv')
df4 = pd.read_csv('10000_4.csv')
df5 = pd.read_csv('10000_5.csv')
df6 = pd.read_csv('10000_6.csv')
df7 = pd.read_csv('10000_7.csv')
df8 = pd.read_csv('10000_8.csv')
df9 = pd.read_csv('10000_9.csv')

Now I want to create an array whose first element is the list [df['name'][1000], df1['name'][1000], ..., df9['name'][1000]]. Is it possible to construct this efficiently in pandas?

2 Answers:

Answer 0: (score: 1)

Use:

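import pandas as pd

files = ['10000_{}.csv'.format(x) for x in range(10)]

#list of all DataFrames
dfs = [pd.read_csv(f) for f in files]

#list of one-row DataFrames (double brackets keep the DataFrame shape)
L = [x.iloc[[1000]] for x in dfs]
#alternative: list of Series (build the result with pd.DataFrame(L) instead)
#L = [x.iloc[1000] for x in dfs]

#final DataFrame, one row per file
df1 = pd.concat(L, ignore_index=True)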
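
The question also asks for a matrix of every 1000th record, not only row 1000. A minimal sketch of one way to build it follows, assuming the column of interest is 'name'. The frames are generated in memory as hypothetical stand-ins for the 10 CSV files, and the names step, columns and matrix are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 10 CSV files: 10 frames, 10000 rows each.
dfs = [
    pd.DataFrame({"name": [f"f{i}_r{r}" for r in range(10000)]})
    for i in range(10)
]

step = 1000

# Take rows 0, 1000, 2000, ... of the 'name' column from each frame.
columns = [d["name"].iloc[::step].to_numpy() for d in dfs]

# Stack the per-frame arrays as columns, so that matrix[k] holds
# row k * step from every frame, in frame order.
matrix = np.column_stack(columns)

print(matrix.shape)
print(list(matrix[1]))
```

Here matrix[1] corresponds to [df['name'][1000], df1['name'][1000], ..., df9['name'][1000]] from the question. Slicing with iloc[::step] starts at position 0; use iloc[step::step] if row 0 should be left out.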

Another solution, if only one row is needed:

files = ['10000_{}.csv'.format(x) for x in range(10)]

#list of one-row DataFrames: skip file lines 1..1000 (data rows 0..999), then read a single row
dfs = [pd.read_csv(f, skiprows=range(1, 1001), nrows=1) for f in files]
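
When skiprows is given a list-like, read_csv treats the entries as 0-indexed file line numbers to skip, so a two-element tuple skips only those two lines, while a range skips everything in between. The sketch below illustrates this on a hypothetical in-memory CSV (csv_text is made up for the example) and also shows a callable that keeps every 1000th data row while reading.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for one of the files:
# a header line followed by 10000 data rows.
csv_text = "name\n" + "\n".join(f"r{r}" for r in range(10000)) + "\n"

# File line 0 is the header, so file lines 1..1000 hold data rows 0..999.
# Skipping them and reading one row yields data row 1000.
row = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 1001), nrows=1)
value = row["name"].iloc[0]

# A callable receives each file line number and returns True to skip it.
# This keeps the header plus data rows 0, 1000, 2000, ...
every = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and (i - 1) % 1000 != 0,
)

print(value)
print(len(every))
```

Filtering while reading avoids loading all 10000 rows per file, which may matter more as the files grow.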

Answer 1: (score: 0)

You can use the pandas tail method, which takes the last row of each file:

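import pandas as pd

arr = []

#remaining file names elided
fnames = ['10000_0.csv',...]

for fname in fnames:
    #tail(1) keeps only the last row; take its 'name' value
    arr.append(pd.read_csv(fname).tail(1)['name'].values[0])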
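
Note that tail(1) returns the last record of a frame, which matches the record at position 1000 only if the file ends there. A small sketch of the difference, using a hypothetical in-memory frame in place of a CSV file:

```python
import pandas as pd

# Hypothetical frame standing in for one CSV file with 10000 records.
frame = pd.DataFrame({"name": [f"r{r}" for r in range(10000)]})

# tail(1) selects the last record of the frame.
last_value = frame.tail(1)["name"].values[0]

# iloc selects by position, here the record at position 1000.
nth_value = frame["name"].iloc[1000]

print(last_value, nth_value)
```

For the 10000-record files in the question, positional indexing with iloc is probably the closer fit.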