Question

I am trying to write a function that takes CSV files, builds a DataFrame from each one, and concatenates/sums them like this:

    id     number_of_visits
0   3902932804358904910  2
1   5972629290368575970  1
2   5345473950081783242  1
3   4289865755939302179  1
4   36619425050724793929 19

+ 

    id     number_of_visits
0   3902932804358904910  5
1   5972629290368575970  10
2   5345473950081783242  3
3   4289865755939302179  20
4   36619425050724793929 13

=

    id     number_of_visits
0   3902932804358904910  7
1   5972629290368575970  11
2   5345473950081783242  4
3   4289865755939302179  21
4   36619425050724793929 32

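My main problem is the for loop that runs after the DataFrames are created: I tried to combine them with df += new_df, but new_df was not added. So I tried the following implementation.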

def add_dfs(files):
    master = []
    big = pd.DataFrame({'id': 0, 'number_of_visits': 0}, index=[0]) # dummy df to initialize
    for k in range(len(files)):
        new_df = create_df(str(files[k])) # helper method to read, create and clean dfs
        master.append(new_df) #creates a list of dataframes with in master
    for k in range(len(master)):
        big = pd.concat([big, master[k]]).groupby(['id', 'number_of_visits']).sum().reset_index()
        # iterate through list of dfs and add them together
    return big

This gives me the following:

    id   number_of_visits
1   1000036822946495682 2
2   1000036822946495682 4
3   1000044447054156512 1
4   1000044447054156512 9
5   1000131582129684623 1

So the number_of_visits values for each user id are not actually added together; the rows are just sorted by number_of_visits.

Answer 1

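Pass your list of DataFrames directly to concat(), then group by id alone and sum. Grouping by both id and number_of_visits, as in your code, makes every distinct (id, number_of_visits) pair its own group, so there is nothing left to sum.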

>>> pd.concat(master).groupby('id').number_of_visits.sum().reset_index()
                     id  number_of_visits
0  36619425050724793929                32
1   3902932804358904910                 7
2   4289865755939302179                21
3   5345473950081783242                 4
4   5972629290368575970                11

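import pandas as pd

def add_dfs(files):
    # create_df is the helper from the question that reads and cleans one CSV
    master = [create_df(f) for f in files]
    # stack all DataFrames, then total number_of_visits per id
    big = pd.concat(master).groupby('id').number_of_visits.sum().reset_index()
    return big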
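
To check the approach above without any CSV files, here is a minimal sketch that rebuilds the two sample tables from the question inline. It assumes the ids are stored as strings (the last id does not fit in a 64-bit integer), and sum_visits is a hypothetical name standing in for add_dfs with the create_df step left out.

```python
import pandas as pd

# Sample tables from the question; ids are kept as strings (assumption).
ids = [
    "3902932804358904910",
    "5972629290368575970",
    "5345473950081783242",
    "4289865755939302179",
    "36619425050724793929",
]
df1 = pd.DataFrame({"id": ids, "number_of_visits": [2, 1, 1, 1, 19]})
df2 = pd.DataFrame({"id": ids, "number_of_visits": [5, 10, 3, 20, 13]})


def sum_visits(frames):
    """Stack the frames and total number_of_visits per id."""
    combined = pd.concat(frames, ignore_index=True)
    return combined.groupby("id", as_index=False)["number_of_visits"].sum()


total = sum_visits([df1, df2])
print(total)

# Grouping on both columns, as in the question, yields one group per
# distinct (id, number_of_visits) pair, so no rows are merged.
n_groups = pd.concat([df1, df2]).groupby(["id", "number_of_visits"]).ngroups
print(n_groups)
```

Concatenating once and grouping once also avoids re-grouping the growing result on every loop iteration.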

Answer 2

You can use in-place addition on the column:

df1['number_of_visits'] += df2['number_of_visits']

Note that this adds values by index label, so it only gives the per-id totals when both DataFrames list the same ids on the same index, as in the example tables in the question.
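
When the ids are not guaranteed to line up row for row, one option is to index both columns by id before adding. The sketch below is illustrative only: the short ids "a" through "d" and the visit counts are made-up data, and treating a missing id as zero visits is an assumption about the desired behavior.

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["a", "b", "c"], "number_of_visits": [2, 1, 19]})
# Same ids in a different row order, plus one id that df1 does not have.
df2 = pd.DataFrame({"id": ["c", "a", "d"], "number_of_visits": [13, 5, 4]})

# Index each column by id so the addition matches rows by id, not by position.
s1 = df1.set_index("id")["number_of_visits"]
s2 = df2.set_index("id")["number_of_visits"]

# fill_value=0 counts an id missing from one side as zero visits;
# the alignment produces floats, so cast back to int afterwards.
total = s1.add(s2, fill_value=0).astype(int).reset_index()
print(total)
```

For more than two DataFrames, the concat and groupby approach from Answer 1 scales more naturally than repeated pairwise addition.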

Concatenating and summing column values while iterating
