<class 'pandas.core.frame.DataFrame'>
Int64Index: 19398698 entries, 0 to 429364
Data columns (total 5 columns):
0 object
1 float64
2 object
date object
name object
dtypes: float64(1), object(4)
memory usage: 888.0+ MB
len(df)= 19398698
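But the real length is 429364. I don't know where the length 19398698 comes from, why it is produced, or how to fix it (in case it causes problems later).
Edit: the data was created with a for loop and concat.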
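import os
import pandas as pd

# folder_path and all_files are defined earlier (not shown here)
for folder in os.listdir(folder_path):
    for file in os.listdir(f'{folder_path}/{folder}'):
        os.chdir(f'{folder_path}/{folder}')
        if file == 'AMAT.txt':
            df = pd.read_csv(f'{file}', header=None, sep=' ')
            # the 6th path component is used as the date
            df['date'] = os.getcwd().split('/')[5]
            df['name'] = f'{file}'
            # each df keeps its own 0..n-1 index when appended
            all_files = pd.concat([all_files, df])
            print(f'{folder}_{file}')
os.chdir("/content")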
Answer 0 (score: 2)
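You evidently have 19398698 entries, but the index labels are either repeated (perhaps only 429365 of them are unique) or not in order. The range that df.info() prints is just the first and the last index label, not a row count. See the following example: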
   x
0  1
2  2
1  3
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 1
Then do
df1 = df.sort_index()
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
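
To connect this to the concat loop in the question, here is a minimal sketch of the same effect and of two possible ways to renumber the rows. The frames part_a and part_b and the column name value are hypothetical stand-ins for the per-file data, not taken from the question.

```python
import pandas as pd

# Toy frame from the example above: three rows, index labels out of order.
df = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 2, 1])
# info() reports these two labels as "0 to 1", regardless of the row count.
first_label, last_label = df.index[0], df.index[-1]

# Two hypothetical per-file frames, each with its own default 0..n-1 index.
part_a = pd.DataFrame({"value": [1.0, 2.0, 3.0]})
part_b = pd.DataFrame({"value": [4.0, 5.0]})

# Plain concat keeps each part's labels, so 0 and 1 appear twice.
kept = pd.concat([part_a, part_b])

# ignore_index=True discards the old labels and numbers the rows 0..4.
renumbered = pd.concat([part_a, part_b], ignore_index=True)

# An already concatenated frame can be renumbered after the fact.
fixed = kept.reset_index(drop=True)

print(len(kept), kept.index.is_unique)              # 5 False
print(len(renumbered), renumbered.index.is_unique)  # 5 True
```

If this matches your situation, the row count from len(df) is correct and only the index labels repeat. In the loop from the question, one option would be to append each df to a list and call pd.concat(frames, ignore_index=True) once after the loop, which also avoids copying the growing frame on every iteration.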