I'm trying to write a function that takes CSV files, creates a DataFrame from each one, and concatenates/sums them like this:
                     id  number_of_visits
0   3902932804358904910                 2
1   5972629290368575970                 1
2   5345473950081783242                 1
3   4289865755939302179                 1
4  36619425050724793929                19
+
                     id  number_of_visits
0   3902932804358904910                 5
1   5972629290368575970                10
2   5345473950081783242                 3
3   4289865755939302179                20
4  36619425050724793929                13
=
                     id  number_of_visits
0   3902932804358904910                 7
1   5972629290368575970                11
2   5345473950081783242                 4
3   4289865755939302179                21
4  36619425050724793929                32
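My main problem is in the for loop that runs after the DataFrames are created. I tried to combine them with df += new_df, but new_df was not added. So I tried the following implementation.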
def add_dfs(files):
    master = []
    big = pd.DataFrame({'id': 0, 'number_of_visits': 0}, index=[0])  # dummy df to initialize
    for k in range(len(files)):
        new_df = create_df(str(files[k]))  # helper method to read, create and clean dfs
        master.append(new_df)  # creates a list of dataframes within master
    for k in range(len(master)):
        big = pd.concat([big, master[k]]).groupby(['id', 'number_of_visits']).sum().reset_index()
        # iterate through list of dfs and add them together
    return big
This gives me the following:
                    id  number_of_visits
1  1000036822946495682                 2
2  1000036822946495682                 4
3  1000044447054156512                 1
4  1000044447054156512                 9
5  1000131582129684623                 1
So the number_of_visits values for each user_id are not actually being added together; the rows are just sorted by number_of_visits within each id.
Answer 0 (score: 1)
Pass your list of DataFrames directly to concat(), then group by id and sum.
>>> pd.concat(master).groupby('id').number_of_visits.sum().reset_index()
                     id  number_of_visits
0  36619425050724793929                32
1   3902932804358904910                 7
2   4289865755939302179                21
3   5345473950081783242                 4
4   5972629290368575970                11
def add_dfs(files):
    master = []
    for f in files:
        new_df = create_df(f)
        master.append(new_df)
    big = pd.concat(master).groupby('id').number_of_visits.sum().reset_index()
    return big
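The attempt in the question sums nothing because grouping by ['id', 'number_of_visits'] makes every distinct (id, count) pair its own group. A minimal sketch of the difference, using the two example frames from the question built in memory rather than read from CSV. Storing the ids as strings is an assumption here (the largest id does not fit in a 64-bit integer).

```python
import pandas as pd

# The two example frames from the question, built in memory.
# Assumption: ids are stored as strings (the largest one overflows int64).
ids = [
    "3902932804358904910",
    "5972629290368575970",
    "5345473950081783242",
    "4289865755939302179",
    "36619425050724793929",
]
df1 = pd.DataFrame({"id": ids, "number_of_visits": [2, 1, 1, 1, 19]})
df2 = pd.DataFrame({"id": ids, "number_of_visits": [5, 10, 3, 20, 13]})

stacked = pd.concat([df1, df2], ignore_index=True)

# Grouping by both columns: each distinct (id, count) pair is its own group,
# so every input row survives and no counts are combined.
by_both = stacked.groupby(["id", "number_of_visits"]).sum().reset_index()

# Grouping by id only: the counts belonging to each id are summed.
by_id = stacked.groupby("id", as_index=False)["number_of_visits"].sum()

print(len(stacked), len(by_both), len(by_id))
print(by_id)
```

With these inputs, by_both keeps as many rows as the stacked frame, while by_id has one row per id.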
Answer 1 (score: 0)
You can use
df1['number_of_visits'] += df2['number_of_visits']
This updates df1 in place, adding the counts from the rows of df2 that have the same index label.
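Because that addition matches rows by index label rather than by id, it only gives correct totals when both frames list the same ids in the same row positions. A sketch of an id-aligned variant, assuming ids are unique within each frame. The frames below are made-up toy data, not the asker's.

```python
import pandas as pd

# Hypothetical toy frames: different row order, and id "c" is missing from df2.
df1 = pd.DataFrame({"id": ["a", "b", "c"], "number_of_visits": [2, 1, 19]})
df2 = pd.DataFrame({"id": ["b", "a"], "number_of_visits": [10, 5]})

# Index both by id so the addition aligns on id, not on row position.
s1 = df1.set_index("id")["number_of_visits"]
s2 = df2.set_index("id")["number_of_visits"]

# fill_value=0 treats an id that is missing from one frame as zero visits.
total = s1.add(s2, fill_value=0).astype(int).reset_index()
print(total)
```

For more than two frames, or when an id can appear several times in one file, the concat-and-groupby approach from the other answer is likely the simpler choice.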