Speeding up R

Time: 2018-12-11 04:17:58

Tags: r performance apply text-mining

I have the following script inside a loop:

number_of_rows_similar_addresses <- as.data.table(cbind(
    distinct_similar_addresses,
    sapply(distinct_similar_addresses, function(x) {
        length(similar_addresses[Original_Address == x]$people_names) / length(unique(similar_addresses[Original_Address == x]$people_names))
    })
))

The problem is that it slows the loop down considerably.

The data looks like this:

distinct_similar_addresses: the distinct values of Original_Address that the script iterates over.

similar_addresses:

"U 2 5 TIMPERLEY ST NICHOLLS VIC"       
"U 1 3 TIMPERLEY ST NICHOLLS VIC"                            
"U 1 11 TIMPERLEY ST NICHOLLS VIC"                            
"U 1 33 TIMPERLEY ST NICHOLLS VIC"                           
"U 1 2 TIMPERLEY ST NICHOLLS VIC"                            
"U 1 3 TIMPERLEY ST NICHOLLS VIC"                            
"U 1 5 TIMPERLEY ST NICHOLLS VIC" 

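The script evaluates whether an address refers to a unit or a house. Is there a way to perform this task faster?

I am adding the result set and an explanation to make this easier to understand.

Result set: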

people_names,Original_Address,Numbers,street_Name,street_type,post_code,suburb,PO,UID
Giuseppe Conte,U 1 3 TIMPERLEY ST NICHOLLS VIC,1,TIMPERLEY,ST,5469,NICHOLLS,,
Giuseppe Conte,U 1 3 TIMPERLEY ST NICHOLLS VIC,TIMPERLEY,ST,5469,NICHOLLS,,
Mario Pertini,U 2 5 TIMPERLEY ST NICHOLLS VIC,TIMPERLEY,ST,5469,NICHOLLS,,
Mario Pertini,U 2 5 TIMPERLEY ST NICHOLLS VIC,5,TIMPERLEY,ST,5469,NICHOLLS,,

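The code simply counts the number of names associated with a single address. More precisely, for each address it divides the number of rows by the number of distinct people_names. If an address is repeated, it refers to a unit; otherwise it is a house.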
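To make the ratio concrete, here is a minimal base R sketch. It uses the four rows from the result set above plus one hypothetical row (Anna Rossi at 7 EXAMPLE RD) that is not in the real data; the names toy, n_rows, n_names and ratio are illustrative only.

```r
# Rows 1-4 mirror the result set above; row 5 is a hypothetical extra address.
toy <- data.frame(
  people_names = c("Giuseppe Conte", "Giuseppe Conte",
                   "Mario Pertini", "Mario Pertini", "Anna Rossi"),
  Original_Address = c("U 1 3 TIMPERLEY ST NICHOLLS VIC",
                       "U 1 3 TIMPERLEY ST NICHOLLS VIC",
                       "U 2 5 TIMPERLEY ST NICHOLLS VIC",
                       "U 2 5 TIMPERLEY ST NICHOLLS VIC",
                       "7 EXAMPLE RD"),
  stringsAsFactors = FALSE
)

# Number of rows per address.
n_rows <- tapply(toy$people_names, toy$Original_Address, length)
# Number of distinct names per address.
n_names <- tapply(toy$people_names, toy$Original_Address,
                  function(p) length(unique(p)))

# The quantity the script computes for each address.
ratio <- n_rows / n_names
print(ratio)
```

Each address that appears twice with the same name gets a ratio of 2, while the hypothetical single-row address gets 1.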

2 answers:

Answer 0: (score: 0)

Apologies to anyone inconvenienced by the data, and thanks to Roland for the help.

Here is the solution:

  library(data.table)
  library(dplyr)  # provides %>% and select()
  x <- similar_addresses[, .N, by = Original_Address] %>% select('N')
  y <- similar_addresses[, length(unique(people_names)) , by = Original_Address] %>% select('V1')
  number_of_rows_similar_addresses <- cbind(unique(similar_addresses$Original_Address), x/y)

Answer 1: (score: 0)

Thanks to Gregor, this is probably better:

 x <- similar_addresses[, .N, by = Original_Address]$N
 y <- similar_addresses[, length(unique(people_names)) , by = Original_Address]$V1
 number_of_rows_similar_addresses <- cbind(unique(similar_addresses$Original_Address), x/y)
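
As a possible further simplification (a sketch, not something either answer proposed), the two grouped queries could be folded into one data.table expression using .N and uniqueN(). The table below is a toy stand-in built from the rows in the question, and the column name ratio is an assumption chosen for illustration.

```r
library(data.table)

# Toy stand-in for similar_addresses, using the rows shown in the question.
similar_addresses <- data.table(
  people_names = c("Giuseppe Conte", "Giuseppe Conte",
                   "Mario Pertini", "Mario Pertini"),
  Original_Address = c("U 1 3 TIMPERLEY ST NICHOLLS VIC",
                       "U 1 3 TIMPERLEY ST NICHOLLS VIC",
                       "U 2 5 TIMPERLEY ST NICHOLLS VIC",
                       "U 2 5 TIMPERLEY ST NICHOLLS VIC")
)

# One grouped pass: rows per address divided by distinct names per address.
# "ratio" is an illustrative column name, not one from the original post.
number_of_rows_similar_addresses <- similar_addresses[
  , .(ratio = .N / uniqueN(people_names)), by = Original_Address
]
print(number_of_rows_similar_addresses)
```

This keeps the address and its ratio together in one data.table with a numeric ratio column, whereas cbind() on a character vector and a numeric vector produces a character matrix.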