R: Aggregating data by column group - mutate a column with the group value for each observation

Time: 2016-10-23 22:17:53

Tags: r aggregate

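I have a beginner's question about aggregating data by category: I want to create a new column that holds, for every observation, the sum of the values in that observation's category.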

I want the following data:

PIN Balance
221 5000
221 2000
221 1000
554 4000
554 4500
643 6000
643 4000

to look like:

PIN Balance Total
221 5000 8000
221 2000 8000
221 1000 8000
554 4000 8500
554 4500 8500
643 6000 10000
643 4000 10000

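I tried using aggregate: output <- aggregate(df$Balance ~ df$PIN, data = df, sum), but I have not been able to get the result back into my original dataset because the number of observations does not match.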
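For reference, the aggregate attempt above returns one row per PIN, which is why it cannot be assigned straight into the seven-row data frame. A minimal sketch of one way to bring the totals back, assuming the data frame is called df as in the question, is to join on PIN with merge(). The names totals and merged are made up for this illustration and are not part of the original question.

```r
# Sample data taken from the question.
df <- data.frame(
  PIN     = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# One row per PIN. Using bare column names in the formula (rather than
# df$Balance ~ df$PIN) keeps the result columns named "PIN" and "Balance".
totals <- aggregate(Balance ~ PIN, data = df, FUN = sum)
names(totals)[names(totals) == "Balance"] <- "Total"

# Join the per-PIN totals back onto every original row.
merged <- merge(df, totals, by = "PIN")
print(merged)
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the input if the data is not already sorted by PIN.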

2 answers:

Answer 0: (score: 2)

You can use dplyr to do what you want. We first group_by PIN and then use mutate to create a new column Total that is the sum of Balance within each group:

library(dplyr)
res <- df %>% group_by(PIN) %>% mutate(Total = sum(Balance))

Using your data as a data frame df:

df <- structure(list(PIN = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
                     Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)),
                .Names = c("PIN", "Balance"), class = "data.frame", row.names = c(NA, -7L))
##  PIN Balance
##1 221    5000
##2 221    2000
##3 221    1000
##4 554    4000
##5 554    4500
##6 643    6000
##7 643    4000

We get the expected result:

print(res)
##Source: local data frame [7 x 3]
##Groups: PIN [3]
##
##    PIN Balance Total
##  <int>   <int> <int>
##1   221    5000  8000
##2   221    2000  8000
##3   221    1000  8000
##4   554    4000  8500
##5   554    4500  8500
##6   643    6000 10000
##7   643    4000 10000

Or we can use data.table:

library(data.table)
setDT(df)[, Total := sum(Balance), by = PIN][]
##   PIN Balance Total
##1: 221    5000  8000
##2: 221    2000  8000
##3: 221    1000  8000
##4: 554    4000  8500
##5: 554    4500  8500
##6: 643    6000 10000
##7: 643    4000 10000

Answer 1: (score: 2)

Consider a base R solution that uses sapply() with a conditional sum:

df <- read.table(text="PIN Balance
                 221 5000
                 221 2000
                 221 1000
                 554 4000
                 554 4500
                 643 6000
                 643 4000", header=TRUE)    

df$Total <- sapply(seq(nrow(df)), function(i){
  sum(df[df$PIN == df$PIN[i], c("Balance")])
}) 

#   PIN Balance Total
# 1 221    5000  8000
# 2 221    2000  8000
# 3 221    1000  8000
# 4 554    4000  8500
# 5 554    4500  8500
# 6 643    6000 10000
# 7 643    4000 10000
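The sapply() approach rescans the whole data frame once per row. As an alternative that neither answer proposes, base R's ave() applies a function within groups and returns a vector of the same length as its input, in the original row order. The sketch below rebuilds the question's data so that it runs on its own.

```r
# Sample data taken from the question.
df <- data.frame(
  PIN     = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# ave() computes FUN within each PIN group and repeats the result for
# every row of that group. FUN must be passed by name, because unnamed
# arguments after the first are treated as grouping variables.
df$Total <- ave(df$Balance, df$PIN, FUN = sum)
print(df)
```

Because the result is aligned with the input rows, it can be assigned directly as a new column without a merge step.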