Using the DataFrame df:
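User_ID | Transaction_ID | Transaction_Row | Category
3824739 | 123            | -1              | A
3824739 | 123            | -1              | A
2398473 | 345            | 0               | A
1230984 | 567            | 1               | C
I need to pivot the data above by Category and sum Transaction_Row. However, I need to group by Transaction_ID, so for Transaction_ID 123 above the -1 should be counted only once.
Can I do this with a pandas pivot table, or with groupby alone?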
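Current code:
pd.pivot_table(df,index=["Category"],values=["Transaction_Row"],aggfunc=np.sum)
Current output (Transaction_ID 123 is counted twice):
Category | Sum of Transaction_Row
A        | -2
C        | 1
Expected output (Transaction_ID 123 counted once):
Category | Sum of Transaction_Row
A        | -1
C        | 1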
I don't know how to change the statement above to fix the double counting.
Thanks!
Answer 0 (score: 2)
I hope I understood your question correctly. First, drop duplicates based only on Transaction_ID and Transaction_Row. Then do the pivot.
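# Keep one row per (Transaction_ID, Transaction_Row) pair so each transaction is counted once
df_2 = df.drop_duplicates(subset=['Transaction_ID', 'Transaction_Row'])
# Sum the de-duplicated rows per category (assumes numpy is imported as np)
pd.pivot_table(df_2, index=["Category"], values=["Transaction_Row"], aggfunc=np.sum)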
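For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the four sample rows in the question; the variable names `deduped`, `pivoted` and `grouped` are only illustrative. It also shows that a plain groupby on the de-duplicated frame should give the same totals as the pivot table, which covers the "or just groupby" part of the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "User_ID": [3824739, 3824739, 2398473, 1230984],
    "Transaction_ID": [123, 123, 345, 567],
    "Transaction_Row": [-1, -1, 0, 1],
    "Category": ["A", "A", "A", "C"],
})

# Keep one row per (Transaction_ID, Transaction_Row) pair,
# so the repeated row of transaction 123 is dropped
deduped = df.drop_duplicates(subset=["Transaction_ID", "Transaction_Row"])

# Option 1: pivot table on the de-duplicated rows
pivoted = pd.pivot_table(
    deduped, index="Category", values="Transaction_Row", aggfunc="sum"
)

# Option 2: the same aggregation with groupby
grouped = deduped.groupby("Category")["Transaction_Row"].sum()

print(pivoted)   # A -> -1, C -> 1
print(grouped)   # A -> -1, C -> 1
```

Note that de-duplicating on these two columns assumes repeated rows of one transaction always carry the same Transaction_Row value. If a single transaction can legitimately contain two different line items with equal values, a different key would be needed.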