Question

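I have some CSV files stored in a folder. I want to read each one and add up a specific column into a new data frame. They all have the same index range and the same column names. This is what I have so far: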

import pandas as pd
import glob
path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")
df2= pd.DataFrame(index=range(646))
for file in files:    
    df = pd.read_csv(file, encoding="latin", sep=';')


    # new data frame with split value columns 
    new = df["Unnamed: 0"].str.split("-", n = 1, expand = True)


    # making separate first name column from new data frame 
    df["IBGE"]= new[0] 

    # making separate last name column from new data frame 
    df["Cidade"]= new[1]

    # Dropping old Name columns 
    df.drop(columns =["Unnamed: 0"], inplace = True) 

    df = df.set_index('Cidade')

    df2 = df['Total']

df2.head()

Out:
Cidade
 Adamantina          0
 Adolfo              0
 Aguaí               0
 Águas da Prata      0
 Águas de Lindóia    0
Name: Total, dtype: int64

What I expect is a new data frame holding the sum of the column named "Total" across every file in the folder (I can't get the code right).

Here is a sample of one of the .csv files:

                  Unnamed: 0  Total  Cadastro  Sem Registro Civil
0        3500105 - Adamantina   17.0      17.0                   0
1            3500204 - Adolfo    3.0       3.0                   0
2             3500303 - Aguaí   14.0      14.0                   0
3    3500402 - Águas da Prata    2.0       2.0                   0
4  3500501 - Águas de Lindóia    0.0       0.0                   0

Answer 1

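Try concat and groupby: collect the Total column from each file in a list, concatenate the pieces, and sum per city. Does this work for you?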

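import pandas as pd
import glob

path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")

total_df = []  # one 'Total' Series per file, indexed by city
for file in files:
    df = pd.read_csv(file, encoding="latin", sep=';')

    # Split "3500105 - Adamantina" into two columns at the first "-"
    new = df["Unnamed: 0"].str.split("-", n=1, expand=True)

    # IBGE code (text before the "-"), surrounding spaces removed
    df["IBGE"] = new[0].str.strip()

    # City name (text after the "-"), surrounding spaces removed
    df["Cidade"] = new[1].str.strip()

    # Drop the original combined column
    df.drop(columns=["Unnamed: 0"], inplace=True)

    # Index by city so rows from different files can be matched up;
    # a bare df['Total'] would lose the 'Cidade' key needed by groupby
    total_df.append(df.set_index('Cidade')['Total'])

# Stack all files, then sum the rows that share a city name
df_final = pd.concat(total_df).groupby(level='Cidade').sum()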
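
To try the concat-and-groupby idea without touching the real folder, here is a minimal sketch that runs on two small in-memory CSVs. The helper name sum_totals and the sample numbers are made up for illustration, and the sketch assumes each file has the "Unnamed: 0" and "Total" columns shown in the question.

```python
import io
import pandas as pd


def sum_totals(sources, column="Total"):
    """Sum `column` per city across several ';'-separated CSV sources.

    `sum_totals` is a hypothetical helper, not part of pandas.
    """
    pieces = []
    for src in sources:
        df = pd.read_csv(src, sep=";")
        # "3500105 - Adamantina" -> ["3500105 ", " Adamantina"]
        parts = df["Unnamed: 0"].str.split("-", n=1, expand=True)
        df["Cidade"] = parts[1].str.strip()
        # Keep only the wanted column, keyed by city name
        pieces.append(df.set_index("Cidade")[column])
    # Stack the per-file Series, then add up rows that share a city name
    return pd.concat(pieces).groupby(level="Cidade").sum()


# Two tiny stand-ins for files on disk (illustrative numbers only)
csv_a = "Unnamed: 0;Total\n3500105 - Adamantina;17.0\n3500204 - Adolfo;3.0\n"
csv_b = "Unnamed: 0;Total\n3500105 - Adamantina;1.0\n3500204 - Adolfo;2.0\n"

totals = sum_totals([io.StringIO(csv_a), io.StringIO(csv_b)])
print(totals)
```

Because the grouping is done on the city name rather than on row position, this should still work if the files list the cities in a different order. With real files, the list passed to sum_totals could be the result of glob.glob, and encoding="latin" would need to be passed to read_csv as in the code above.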
