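Count the number of times a vector/row matches in a data frame

Time: 2016-10-11 14:22:38

Tags: r dataframe

I have a large data frame in which every data point is either "positive" (1) or "negative" (0).

Example data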

my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0), 
  marker_b = c(0,1,1,1), marker_c = c(0,1,1,0), marker_d = c(0,1,0,1))


  cell marker_a marker_b marker_c marker_d
1    1        1        0        0        0
2    2        0        1        1        1
3    3        0        1        1        0
4    4        0        1        0        1
...

I have a different data.frame that contains every possible combination of positive and negative markers that a my_data$cell can have

combinations_df <- expand.grid(
    marker_a = c(0, 1), 
    marker_b = c(0, 1), 
    marker_c = c(0, 1), 
    marker_d = c(0, 1)
)

   marker_a marker_b marker_c marker_d
1         0        0        0        0
2         1        0        0        0
3         0        1        0        0
4         1        1        0        0
5         0        0        1        0
6         1        0        1        0
7         0        1        1        0
8         1        1        1        0
9         0        0        0        1
10        1        0        0        1
11        0        1        0        1
12        1        1        0        1
13        0        0        1        1
14        1        0        1        1
15        0        1        1        1
16        1        1        1        1

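How can I match each row/combination of this data.frame against every row of my_data and return the final count for each combination?

Example of the expected output: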

      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
1 14969 15223 15300 14779 14844 16049 15374 15648 15045 15517 15116 15405 14990 15347 14432 15569

3 Answers:

Answer 0: (score: 1)

I would guess the data.table way is pretty efficient:

library(data.table)
setDT(my_data)

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ]


    marker_a marker_b marker_c marker_d N
 1:        0        0        0        0 0
 2:        1        0        0        0 1
 3:        0        1        0        0 0
 4:        1        1        0        0 0
 5:        0        0        1        0 0
 6:        1        0        1        0 0
 7:        0        1        1        0 1
 8:        1        1        1        0 0
 9:        0        0        0        1 0
10:        1        0        0        1 0
11:        0        1        0        1 1
12:        1        1        0        1 0
13:        0        0        1        1 0
14:        1        0        1        1 0
15:        0        1        1        1 1
16:        1        1        1        1 0

If you only care about the combinations that actually appear in the data, "chain" a filter command:

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ][ N > 0 ]


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        0 1
3:        0        1        0        1 1
4:        0        1        1        1 1

Or, in that case, you do not even need combinations_df ...

my_data[, .N, by = marker_a:marker_d ]


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        1 1
3:        0        1        1        0 1
4:        0        1        0        1 1

Answer 1: (score: 1)

You are writing your combinations in "binary", so there is no need for any join, just a little math: treat the markers of each row as the bits of a number and count how often each number occurs. Because expand.grid() varies its first argument fastest, marker_a is the lowest bit, and a row of my_data corresponds to row 1 + marker_a + 2*marker_b + 4*marker_c + 8*marker_d of combinations_df.
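A minimal sketch of the binary-encoding idea in R, assuming the my_data from the question and a combinations_df built by expand.grid() in the column order shown there. The names markers, weights, and combo_id are illustrative and do not come from the answer.

```r
# Example data from the question
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
  marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))

# Keep only the marker columns, in the same order as in expand.grid()
markers <- as.matrix(my_data[, c("marker_a", "marker_b", "marker_c", "marker_d")])

# expand.grid() varies its first argument fastest, so marker_a is the lowest bit:
# row index in combinations_df = 1 + a + 2*b + 4*c + 8*d
weights <- 2^(seq_len(ncol(markers)) - 1)   # 1, 2, 4, 8
combo_id <- as.vector(markers %*% weights) + 1

# tabulate() counts each integer from 1 to nbins, keeping zero counts
n_combos <- 2^ncol(markers)
counts <- setNames(tabulate(combo_id, nbins = n_combos), seq_len(n_combos))
counts
```

For the four example rows this puts one count each on combinations 2, 7, 11 and 15, which agrees with the data.table join in Answer 0. Since it is a single matrix product followed by tabulate(), it should scale well to a large my_data, though that has not been benchmarked here.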

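Answer 2: (score: 0)

Maybe this is what you need:

# Paste each row into a key such as "0110", then count the my_data keys
# that equal each combination key; my_data[-1] drops the cell column
setNames(sapply(do.call(paste0, combinations_df),
         function(x) sum(do.call(paste0, my_data[-1]) == x)), 1:nrow(combinations_df))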
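The sapply() call above rebuilds the pasted keys of my_data once for every combination. A possible variant, sketched under the assumption that my_data and combinations_df are the plain data.frames from the question, builds each key vector once and lets table() do the counting. The names combo_keys, data_keys, and result are illustrative.

```r
# Example data from the question
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
  marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
  marker_c = c(0, 1), marker_d = c(0, 1))

# One string key per row, e.g. "1000"; select columns by name so the
# column order matches combinations_df
combo_keys <- do.call(paste0, combinations_df)
data_keys <- do.call(paste0, my_data[names(combinations_df)])

# Using the combination keys as factor levels keeps combinations that
# never occur in my_data, with a count of zero
key_counts <- table(factor(data_keys, levels = combo_keys))
result <- setNames(as.vector(key_counts), seq_len(nrow(combinations_df)))
result
```

Selecting the marker columns by name rather than with my_data[-1] is a deliberate choice here: it does not depend on cell being the first column.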