Python pandas: pivot table with a de-duplicated sum

Date: 2018-04-03 19:50:05

Tags: python pandas group-by pivot

Given the DataFrame df shown in the table below:

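I need to pivot this data by Category and sum Transaction_Row. However, the rows have to be grouped by Transaction_ID first, so that for Transaction_ID 123 in the table the -1 is counted only once.

Can I do this with a pandas pivot table, or with groupby alone?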

User_ID | Transaction_ID | Transaction_Row | Category
3824739 | 123            | -1              | A
3824739 | 123            | -1              | A
2398473 | 345            | 0               | A
1230984 | 567            | 1               | C

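Current code:

pd.pivot_table(df,index=["Category"],values=["Transaction_Row"],aggfunc=np.sum)

Current output (Transaction_ID 123 is counted twice):

Category | Sum of Transaction_Row
   A               -2
   C                1

Desired output:

Category | Sum of Transaction_Row
   A               -1
   C                1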

I don't know how to change the statement above to avoid the double counting.

Thanks!

1 Answer:

Answer 0 (score: 2):

I hope I understood your question correctly. First, drop duplicates based only on Transaction_ID and Transaction_Row. Then build the pivot table.

import pandas as pd

df_2 = df.drop_duplicates(subset=['Transaction_ID', 'Transaction_Row'])
pd.pivot_table(df_2, index=["Category"], values=["Transaction_Row"], aggfunc="sum")
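For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's table as a DataFrame, applies the de-duplication step, and then sums per Category both with pivot_table and with a plain groupby. The variable names deduped, pivoted and grouped are only illustrative, and the sample values are copied from the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "User_ID": [3824739, 3824739, 2398473, 1230984],
    "Transaction_ID": [123, 123, 345, 567],
    "Transaction_Row": [-1, -1, 0, 1],
    "Category": ["A", "A", "A", "C"],
})

# Keep one row per (Transaction_ID, Transaction_Row) pair,
# so the repeated row for Transaction_ID 123 is dropped.
deduped = df.drop_duplicates(subset=["Transaction_ID", "Transaction_Row"])

# Pivot-table route, as in the answer above.
pivoted = pd.pivot_table(
    deduped, index="Category", values="Transaction_Row", aggfunc="sum"
)

# groupby route, which should give the same per-category sums.
grouped = deduped.groupby("Category")["Transaction_Row"].sum()

print(pivoted)
print(grouped)
```

With this sample data both routes give -1 for category A and 1 for category C. Note that de-duplicating on Transaction_ID and Transaction_Row also drops rows within one transaction that legitimately share the same value, so whether this subset is the right key depends on how the real data is structured.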