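I have 10 DataFrames with an identical structure, each containing 10000 records. I want to create a matrix that contains every 1000th record from all of the different DataFrames.
My datasets are loaded as follows: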
import pandas as pd

df = pd.read_csv('10000_0.csv')
df1 = pd.read_csv('10000_1.csv')
df2 = pd.read_csv('10000_2.csv')
df3 = pd.read_csv('10000_3.csv')
df4 = pd.read_csv('10000_4.csv')
df5 = pd.read_csv('10000_5.csv')
df6 = pd.read_csv('10000_6.csv')
df7 = pd.read_csv('10000_7.csv')
df8 = pd.read_csv('10000_8.csv')
df9 = pd.read_csv('10000_9.csv')
Now I want to create an array whose first element is the list [df['name'][1000], df1['name'][1000], ..., df9['name'][1000]].
Is it possible to construct this efficiently in pandas?
Answer 0: (score: 1)
Use:
files = ['10000_{}.csv'.format(x) for x in range(10)]
#list of all DataFrames
dfs = [pd.read_csv(f) for f in files]
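#list of one-row DataFrames (the row at position 1000 of each file)
L = [x.iloc[[1000]] for x in dfs]
#alternative: list of Series; combine these with pd.DataFrame(L) instead of concat
#L = [x.iloc[1000] for x in dfs]
#final DataFrame with one row per file
df1 = pd.concat(L, ignore_index=True)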
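To get the plain list of 'name' values the question asks for, and a matrix of every 1000th record, something along these lines may work. This is only a sketch: the frames are synthetic stand-ins for the ten CSV files, and it assumes every file has a 'name' column and that "every 1000th record" means positions 0, 1000, ..., 9000.

```python
import pandas as pd

# Hypothetical stand-ins for the ten CSV files: each frame has 10000 rows
# and a 'name' column, as the question describes.
dfs = [
    pd.DataFrame({"name": [f"f{i}_r{j}" for j in range(10000)]})
    for i in range(10)
]

# 'name' value at position 1000 of every frame, as a plain Python list.
names = [d["name"].iloc[1000] for d in dfs]

# Every 1000th row (positions 0, 1000, ..., 9000) of each frame,
# arranged with one column per file.
matrix = pd.concat(
    [d["name"].iloc[::1000].reset_index(drop=True) for d in dfs],
    axis=1,
    keys=range(len(dfs)),
)
```

Here names has one entry per file, and matrix.to_numpy() gives a NumPy array if a DataFrame is not wanted.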
Another solution, if only a single row is needed from each file:
files = ['10000_{}.csv'.format(x) for x in range(10)]
#list of one-row DataFrames: keep the header (line 0), skip lines 1-1000, read one row
dfs = [pd.read_csv(f, skiprows=range(1, 1001), nrows=1) for f in files]
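One detail worth checking here: a list-like skiprows is interpreted as individual 0-based line numbers, so a tuple such as (1, 1000) skips only those two lines, while a range skips a whole block. The sketch below illustrates this on a hypothetical in-memory CSV (the data and the 'name' column are assumptions, not the real files).

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for one of the 10000_*.csv files.
csv_text = "name\n" + "\n".join(f"r{j}" for j in range(10000)) + "\n"

# Line 0 is the header and is kept; lines 1..1000 hold data rows 0..999
# and are skipped, so the single row read is the one at position 1000.
row = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 1001), nrows=1)
value = row["name"].iloc[0]
```

This avoids loading all 10000 rows into memory just to pick one of them.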
Answer 1: (score: 0)
You can use the pandas tail method:
arr = []
fnames = ['10000_0.csv',...]
for fname in fnames:
    #tail(1) is the last row of the file; take its 'name' value
    arr.append(pd.read_csv(fname).tail(1)['name'].values[0])
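Note that tail(1) returns the last row of a frame, which for a 10000-row file is the row at position 9999 rather than 1000. A small sketch on a hypothetical frame (synthetic data, assumed 'name' column) shows the difference from a positional lookup:

```python
import pandas as pd

# Hypothetical frame standing in for one loaded CSV file.
df = pd.DataFrame({"name": [f"r{j}" for j in range(10000)]})

# Last row of the frame, as in the loop above.
last_name = df.tail(1)["name"].values[0]

# Row at position 1000, as written in the question.
name_1000 = df["name"].iloc[1000]
```

Which of the two is wanted depends on how "the 1000th record" in the question is meant.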