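Compare two data tables and correct the values in the first one according to the second one

Date: 2018-09-01 21:11:51

Tags: r dataframe data.table compare

We have the following two data tables. The first one was collected with errors; the second one contains the correct (x1, x2) pairs:

First table: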

+----------+-----+------+
|    x1    |  x2 |  x3  |
+----------+-----+------+
| march    |  3  |  198 |
| april    |  4  | 4984 |
| february |  2  |  498 |
| march    |  35 |  984 |
| aripl    |  4  |  498 |
+----------+-----+------+

Second table:

+----------+-----+
|    x1    |  x2 |
+----------+-----+
| january  |  1  |
| february |  2  |
| march    |  3  |
| april    |  4  |
| may      |  5  |
+----------+-----+

I want to find the incorrect rows in the first table and correct them according to the second table. So the output should look like this:

+----------+-----+------+
|    x1    |  x2 |  x3  |
+----------+-----+------+
| march    |  3  |  198 |
| april    |  4  | 4984 |
| february |  2  |  498 |
| march    |  3  |  984 |
| april    |  4  |  498 |
+----------+-----+------+

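The check should be: if x1 is a correct name, fill in the correct number for it; otherwise, if x2 is a correct number, fill in the correct name for it.

For the first part, I guess the answer is 2, although I would need to adapt it to my case somehow (so any help with that would also be much appreciated). For the second part, all I know is to use "for" and "if", which is unacceptable (or even infeasible) for performance reasons.

1 Answer:

Answer 0: (score: 0)

You can merge by name to find the correct number, and then merge by number to find the correct name.

Sample data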

library(data.table)

dt1 <- data.table(
    x1 = c("march", "april", "february", "march", "aripl"),
    x2 = c(3,4,2,35,4),
    x3 = c(198,4984,498,984,498)
)

dt2 <- data.table(
    x1 = c("january", "february", "march", "april", "may"),
    x2 = 1:5
)

Solution:

# Step 1: fix the number by merging on the name
result <- merge(dt1, dt2, by = "x1", all.x = TRUE)
# keep the original number when the name has no match in dt2
result[ , corr_num := ifelse(is.na(x2.y), x2.x, x2.y)]
result[ , c("x2.x", "x2.y") := NULL]

# Step 2: fix the name by merging on the corrected number
result <- merge(result, dt2, by.x = "corr_num", by.y = "x2", all.x = TRUE)
result[ , corr_name := x1.y]
result[ , c("x1.x", "x1.y") := NULL]

Equivalent solution:

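# merge on the name, then on the original number, and pick the valid values afterwards
result <- merge(merge(dt1, dt2, by = "x1", all.x = TRUE), dt2, by.x = "x2.x", by.y = "x2", all.x = TRUE)
result[ , corr_num  := ifelse(is.na(x2.y), x2.x, x2.y)]
result[ , corr_name := ifelse(is.na(x1.y), x1.x, x1.y)]
# drop the suffixed helper columns (x1.x, x1.y, x2.x, x2.y)
result[ , grep("\\.", names(result)) := NULL]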

Result

> result[ , .(corr_name, corr_num, x3)]
   corr_name corr_num   x3
1:  february        2  498
2:     march        3  198
3:     april        4 4984
4:     april        4  498
5:     march        3  984
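
As a possible alternative (a sketch, not part of the original answer), the same two steps can be written with data.table update joins, which modify a copy of the first table in place and keep its original row order. It also lists the incorrect rows first with an anti-join. The names `fixed` and `bad` are made up for illustration, and the conversion of x2 to integer assumes that all numbers are whole.

```r
library(data.table)

dt1 <- data.table(
    x1 = c("march", "april", "february", "march", "aripl"),
    x2 = c(3, 4, 2, 35, 4),
    x3 = c(198, 4984, 498, 984, 498)
)
dt2 <- data.table(
    x1 = c("january", "february", "march", "april", "may"),
    x2 = 1:5
)

# work on a copy so dt1 itself stays untouched
fixed <- copy(dt1)
# assumption: x2 only holds whole numbers, so it can share dt2's integer type
fixed[ , x2 := as.integer(x2)]

# anti-join: rows whose (x1, x2) pair does not appear in dt2
bad <- fixed[!dt2, on = c("x1", "x2")]

# Step 1: where the name matches dt2, overwrite the number
fixed[dt2, on = "x1", x2 := i.x2]

# Step 2: where the (possibly corrected) number matches dt2, overwrite the name
fixed[dt2, on = "x2", x1 := i.x1]

print(bad)
print(fixed)
```

Because step 1 runs first, a row whose name and number are both valid but inconsistent is resolved in favour of the name, and a row where neither field matches dt2 is left unchanged.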