Question

I have this data stored in CSV format:

first, middle, last, id, fte
Alexander,Frank,Johnson,460700,1 
Ashley,Jane,Smith,470000,.5 
Ashley,Jane,Smith,470000,.25 
Ashley,Jane,Smith,470000,.25 
Steve,Robert,Brown,460001,1

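I need to find the rows for people who share the same ID number and then combine the FTEs from those rows into a single row. I also need to add 0s for rows that have no duplicates. For example (using the data above):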

first, middle, last, id, fte1, fte2, fte3, fte4
Alexander,Frank,Johnson,460700,1,0,0,0
Ashley,Jane,Smith,470000,.5,.25,.25,0
Steve,Robert,Brown,460001,1,0,0,0

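Basically, we are looking at the jobs people hold. Some people work one 40-hour-per-week job (1.0 FTE), some work two 20-hour-per-week jobs (0.5 and 0.5 FTE), some may work four 10-hour-per-week jobs (.25, .25, .25, and .25 FTE), and some may have other combinations. Each employee gets only one row of data, so we need all of their FTEs on the same row.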

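Here is what we have so far. Right now, our code only works when a person has two FTEs. If they have 3 or 4, it just overwrites them with the last two (so if there are 3, it gives us 2 and 3; if there are 4, it gives us 3 and 4).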

import csv

f = open('data.csv')
csv_f = csv.reader(f)
dataset = []
for row in csv_f:
    dictionary = {}
    dictionary["first"] = row[2]
    dictionary["middle"] = row[3]
    dictionary["last"] = row[4]
    dictionary["id"] = row[10]
    dictionary["fte"] = row[12]
    dataset.append(dictionary)

def is_match(dict1, dict2):
    return (dict1["id"] == dict2["id"])

def find_match(dictionary, dict_list):
    for index in range(0, len(dict_list)):
        if is_match(dictionary, dict_list[index]):
            return index
    return -1

def process_data(dataset):
    result = []
    for index in range(1, len(dataset)):
        data_dict = dataset[index]
        match_index = find_match(data_dict, result)
        id = str(data_dict["id"])
        if match_index == -1:
            result.append(data_dict)
        else:
            (result[match_index])["fte2"] = data_dict["fte"]
    return result

f.close()

for row in process_data(dataset):
    print(row)

Any help would be greatly appreciated! Thanks!
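For reference, the overwriting comes from always writing to the single key "fte2". A minimal sketch of one way around it, using only the standard library, is to collect every FTE for an id in a list and number the columns afterwards. This assumes the file has the header row shown in the sample (the code above reads columns by position instead), and `merge_fte`, `slots`, and the inline `SAMPLE` are illustrative names only:

```python
import csv
import io

# Sample rows copied from the question; a real run would open the CSV file instead.
SAMPLE = """first, middle, last, id, fte
Alexander,Frank,Johnson,460700,1
Ashley,Jane,Smith,470000,.5
Ashley,Jane,Smith,470000,.25
Ashley,Jane,Smith,470000,.25
Steve,Robert,Brown,460001,1
"""

def merge_fte(rows, slots=4):
    """Collapse rows that share an id into one dict with fte1..fte<slots>."""
    merged = {}  # id -> person dict; dicts keep first-seen order
    for row in rows:
        person = merged.setdefault(row["id"], {
            "first": row["first"], "middle": row["middle"],
            "last": row["last"], "id": row["id"], "ftes": [],
        })
        # Append instead of assigning to one fixed key, so nothing is overwritten.
        person["ftes"].append(row["fte"])
    result = []
    for person in merged.values():
        ftes = person.pop("ftes")
        # Pad with "0" up to `slots`; anything beyond `slots` is dropped here.
        padded = (ftes + ["0"] * slots)[:slots]
        for n, value in enumerate(padded, start=1):
            person[f"fte{n}"] = value
        result.append(person)
    return result

# skipinitialspace=True strips the blank after each comma in the header row.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
merged_rows = merge_fte(reader)
for row in merged_rows:
    print(row)
```

The values stay as strings exactly as they appear in the file; converting them with float() would be a separate step if arithmetic is needed.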

Answer 1

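I would suggest using the pandas library to simplify this. You can use groupby together with aggregation. The example below follows the aggregation example given at https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm.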

import pandas as pd
import numpy as np

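# skipinitialspace=True strips the blank after each comma in the header row,
# so the column is named 'id' rather than ' id'
df = pd.read_csv('filename.csv', skipinitialspace=True)

grouped = df.groupby('id')
print(grouped['fte'].agg(np.sum))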
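
Note that summing gives one total per id rather than the separate fte1..fte4 columns shown in the question. If that wide layout is what you want, one possible sketch is to number each person's rows with cumcount and then pivot. The names `widen_fte`, `slots`, and the inline `SAMPLE` are illustrative, not part of pandas or the original code:

```python
import io
import pandas as pd

# Sample rows copied from the question; a real run would read the CSV file instead.
SAMPLE = """first, middle, last, id, fte
Alexander,Frank,Johnson,460700,1
Ashley,Jane,Smith,470000,.5
Ashley,Jane,Smith,470000,.25
Ashley,Jane,Smith,470000,.25
Steve,Robert,Brown,460001,1
"""

def widen_fte(df, slots=4):
    """Return one row per id with columns fte1..fte<slots>, padded with 0."""
    df = df.copy()
    # Number each person's rows 1, 2, 3, ... in file order.
    df["slot"] = df.groupby("id").cumcount() + 1
    wide = df.pivot(index="id", columns="slot", values="fte")
    # Make sure every slot column exists, then fill the gaps with 0.
    # Rows numbered above `slots` would be dropped by this reindex.
    wide = wide.reindex(columns=range(1, slots + 1)).fillna(0)
    wide.columns = [f"fte{n}" for n in wide.columns]
    # Re-attach the name columns, one row per id, in first-seen order.
    names = df.drop_duplicates("id")[["first", "middle", "last", "id"]]
    return names.merge(wide.reset_index(), on="id")

df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
result = widen_fte(df)
print(result.to_csv(index=False))
```

Because pandas parses the fte column as numbers, the values come out as floats (for example 1.0 instead of 1); format them on output if the original text form matters.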
