Python - For循环,它实现了一个sum并将答案存储在新列中

时间:2016-05-15 10:55:10

标签: python python-3.x csv for-loop

我想计算所有具有相同名字的人的总年龄:请参阅此处的示例表。

table with names

这是我到目前为止编写的代码..但它不完整而且它不起作用..

final_df = DataFrame()  

for i in [list of names]:
dummy = sort_df.loc[sort_df['name'] == i]    
total_age = 0   

for j in dummy.age:       
    age2 = dummy.age(j)

    total_age = total_age + age2

    final_df.append(total_age)


final_df['total_age'] = total_age  

如何解决此问题,我可以编写一个代码,该代码将遍历具有相同名称的年龄段的人并将它们相加并将它们存储在新列中?

最后它应该是这样的:

Result

3 个答案:

答案 0 :(得分:2)

查看您的代码我假设有一个名为input.csv的csv文件已经被sort_df读入,其中包含以下数据:

name,age,total age
Alfredo,13,
Alfredo,12,
Alfredo,15,
Jaap,12,
Jaap,14,
Koen,16,
Lian,76,
Lian,45,
Lian,34,
Lian,14,

在这种情况下,无需声明另一个dummy数据帧。使用此:

from pandas import DataFrame

sort_df = DataFrame.from_csv("inCSV.txt", index_col=False)
final_df = sort_df

# Use a dictionary to keep track instead
total_age = {}
for name in sort_df["name"]:
    if name not in total_age.keys():
        total_age[name] = 0

# Add up the ages
for index in xrange(len(sort_df)):
    person = sort_df.loc[index]
    name = person["name"]
    age = person["age"]
    total_age[name] += age

# Set the new ages into final_df
for index in xrange(len(final_df)):
    person = final_df.loc[index]
    name = person["name"]
    final_df.set_value(index, "total age", total_age[name])

print final_df

会给你(在final_df中):

      name  age  total age
0  Alfredo   13       40.0
1  Alfredo   12       40.0
2  Alfredo   15       40.0
3     Jaap   12       26.0
4     Jaap   14       26.0
5     Koen   16       16.0
6     Lian   76      169.0
7     Lian   45      169.0
8     Lian   34      169.0
9     Lian   14      169.0

答案 1 :(得分:1)

例如数据,

name,age
Alfredo,13,
Alfredo,12,
Alfredo,15,
Jaap,12,
Jaap,14,
Koen,16,
Lian,76,
Lian,45,
Lian,34,
Lian,14,

import csv
from collections import defaultdict
result = defaultdict(int)
reader = csv.DictReader(csv_file_handle)
for person in reader:
    name = person['name'].lower()
    age = int(person['age'])
    result[name] += age
>>> result
defaultdict(int, {'alfredo': 40, 'jaap': 26, 'koen': 16, 'lian': 169})

使用result

更新文件
# make a reader object    
# make a writer object with fieldnames ['name', 'age', 'total_age']
# write header
for person in reader:
    person.update({'total_age': result[person['name'].lower()]})
    writer.writerow(person)

答案 2 :(得分:0)

你在这里找到答案:

from collections import defaultdict
list_name = [
{'age': 25, 'name': 'alfredo'},
{'age': 44, 'name': 'alfredo'},
{'age': 23, 'name': 'jaap'},
{'age': 60, 'name': 'jaap'}
]

k={}
c = defaultdict(int)
for d in list_name:
     c[d['name']] += d['age']
     k[d['name']] = c[d['name']]
for d in list_name:
    d['total_age'] = k[d['name']] 
print list_name