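I have several CSV files stored in a folder. I want to read each of them and sum a specific column into a new data frame. They all have the same index range and the same column names. This is what I have so far: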
import pandas as pd
import glob
path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")
df2= pd.DataFrame(index=range(646))
for file in files:
    df = pd.read_csv(file, encoding="latin", sep=';')
    # split "code - city" into two columns
    new = df["Unnamed: 0"].str.split("-", n = 1, expand = True)
    # IBGE code (text before the hyphen)
    df["IBGE"]= new[0]
    # city name (text after the hyphen)
    df["Cidade"]= new[1]
    # drop the original combined column
    df.drop(columns =["Unnamed: 0"], inplace = True)
    df = df.set_index('Cidade')
    df2 = df['Total']
df2.head()
Out:
Cidade
Adamantina 0
Adolfo 0
Aguaí 0
Águas da Prata 0
Águas de Lindóia 0
Name: Total, dtype: int64
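What I expect is a new data frame containing the sum of the "Total" column from every file in the folder (I can't manage to write the code correctly).
Here is a sample of one of the .csv files: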
                   Unnamed: 0  Total  Cadastro  Sem Registro Civil
0        3500105 - Adamantina   17.0      17.0                   0
1            3500204 - Adolfo    3.0       3.0                   0
2             3500303 - Aguaí   14.0      14.0                   0
3    3500402 - Águas da Prata    2.0       2.0                   0
4  3500501 - Águas de Lindóia    0.0       0.0                   0
Answer 0 (score: 0)
Try concat and groupby. Does this work for you?
import pandas as pd
import glob
path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")
total_df = []
for file in files:
    df = pd.read_csv(file, encoding="latin", sep=';')
    # split "code - city" into two columns
    new = df["Unnamed: 0"].str.split("-", n = 1, expand = True)
    # IBGE code (text before the hyphen), without surrounding spaces
    df["IBGE"]= new[0].str.strip()
    # city name (text after the hyphen), without surrounding spaces
    df["Cidade"]= new[1].str.strip()
    # drop the original combined column
    df.drop(columns =["Unnamed: 0"], inplace = True)
    # keep 'Total' indexed by city so the rows can be grouped after concat
    total_df.append(df.set_index('Cidade')['Total'])
# stack the per-file series, then sum the rows that share a city name
df_final = pd.concat(total_df).groupby(level='Cidade').sum()
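If you want to check the concat-then-groupby idea without touching the real folder, here is a small sketch that runs on two in-memory stand-ins for the CSV files. The sample values and the helper name total_by_city are made up for illustration; the only assumption carried over from the question is the "code - city" layout of the "Unnamed: 0" column and the ';' separator.

```python
import io

import pandas as pd

# Two tiny in-memory stand-ins for the CSV files (values are made up).
csv_a = "Unnamed: 0;Total\n3500105 - Adamantina;17.0\n3500204 - Adolfo;3.0\n"
csv_b = "Unnamed: 0;Total\n3500105 - Adamantina;1.0\n3500204 - Adolfo;2.0\n"


def total_by_city(df):
    """Return the 'Total' column indexed by city name (hypothetical helper)."""
    # Split "code - city" once on the hyphen; column 1 holds the city name.
    parts = df["Unnamed: 0"].str.split("-", n=1, expand=True)
    # Strip the spaces left around the hyphen so equal cities compare equal.
    return df.assign(Cidade=parts[1].str.strip()).set_index("Cidade")["Total"]


# One Series per "file", each indexed by city.
totals = [
    total_by_city(pd.read_csv(io.StringIO(text), sep=";"))
    for text in (csv_a, csv_b)
]

# Stack all the Series, then sum the rows that share the same city.
df_final = pd.concat(totals).groupby(level="Cidade").sum()
print(df_final)
```

The key point is that each per-file Series has to carry the city as its index before concatenating; otherwise there is nothing named Cidade left to group on.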