I have a large data frame whose data points are either "positive" (1) or "negative" (0).
Example data:
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
marker_b = c(0,1,1,1), marker_c = c(0,1,1,0), marker_d = c(0,1,0,1))
  cell marker_a marker_b marker_c marker_d
1    1        1        0        0        0
2    2        0        1        1        1
3    3        0        1        1        0
4    4        0        1        0        1
...
I have a second data.frame that contains every possible combination of positive and negative markers that a my_data$cell can have:
combinations_df <- expand.grid(
marker_a = c(0, 1),
marker_b = c(0, 1),
marker_c = c(0, 1),
marker_d = c(0, 1)
)
   marker_a marker_b marker_c marker_d
1         0        0        0        0
2         1        0        0        0
3         0        1        0        0
4         1        1        0        0
5         0        0        1        0
6         1        0        1        0
7         0        1        1        0
8         1        1        1        0
9         0        0        0        1
10        1        0        0        1
11        0        1        0        1
12        1        1        0        1
13        0        0        1        1
14        1        0        1        1
15        0        1        1        1
16        1        1        1        1
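How can I get a data.frame in which each row/combination is matched against every row of my_data, returning the final count for each combination?
Example of the expected output: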
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 14969 15223 15300 14779 14844 16049 15374 15648 15045 15517 15116 15405 14990 15347 14432 15569
Answer 0 (score: 1)
I would guess the data.table way is very efficient:
library(data.table)
setDT(my_data)
my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ]
marker_a marker_b marker_c marker_d N
1: 0 0 0 0 0
2: 1 0 0 0 1
3: 0 1 0 0 0
4: 1 1 0 0 0
5: 0 0 1 0 0
6: 1 0 1 0 0
7: 0 1 1 0 1
8: 1 1 1 0 0
9: 0 0 0 1 0
10: 1 0 0 1 0
11: 0 1 0 1 1
12: 1 1 0 1 0
13: 0 0 1 1 0
14: 1 0 1 1 0
15: 0 1 1 1 1
16: 1 1 1 1 0
If you only care about the combinations that actually appear in the data, "chain" a filter onto the command:
my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ][ N > 0 ]
marker_a marker_b marker_c marker_d N
1: 1 0 0 0 1
2: 0 1 1 0 1
3: 0 1 0 1 1
4: 0 1 1 1 1
Or, in that case, you do not even need combinations_df at all:
my_data[, .N, by = marker_a:marker_d ]
marker_a marker_b marker_c marker_d N
1: 1 0 0 0 1
2: 0 1 1 1 1
3: 0 1 1 0 1
4: 0 1 0 1 1
Answer 1 (score: 1)
Your combinations are effectively written in "binary", so no join is needed, just a little arithmetic. Because expand.grid varies its first argument fastest, marker_a acts as the lowest bit, and the row of combinations_df that matches a given cell is 1 + marker_a + 2*marker_b + 4*marker_c + 8*marker_d. Counting how often each index occurs gives the count per combination.
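The binary idea in this answer could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the answerer's original code: the names markers, weights and combo_index are made up here, and the sketch assumes the marker columns appear in the same order as in combinations_df and contain only 0 and 1.

```r
# Example data from the question
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
                      marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0),
                      marker_d = c(0, 1, 0, 1))

# Marker columns as a numeric matrix (hypothetical helper names)
markers <- as.matrix(my_data[, c("marker_a", "marker_b", "marker_c", "marker_d")])

# expand.grid varies the first column fastest, so marker_a is the lowest bit
weights <- 2^(seq_len(ncol(markers)) - 1)   # 1, 2, 4, 8

# Row index in combinations_df that each cell corresponds to
combo_index <- as.vector(markers %*% weights) + 1

# Count how many cells fall on each of the 16 combinations
counts <- tabulate(combo_index, nbins = 2^ncol(markers))
names(counts) <- seq_along(counts)
counts
```

For the four example cells this puts a count of 1 on combinations 2, 7, 11 and 15 and 0 everywhere else, which agrees with the data.table output in the first answer.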
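Answer 2 (score: 0)
Perhaps you need something like this (assuming my_data is still a plain data.frame, so that my_data[-1] drops the cell column):
# Paste each row's markers into one key (e.g. "0110"), then count the matching keys per combination
setNames(sapply(do.call(paste0, combinations_df),
                function(x) sum(do.call(paste0, my_data[-1]) == x)),
         seq_len(nrow(combinations_df)))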
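The string-key approach above rebuilds the keys of my_data once per combination. A possible variant, sketched here under the assumption that my_data is a plain data.frame whose marker columns are in the same order as in combinations_df, builds each set of keys once and lets match() and tabulate() do the counting. The names data_keys and combo_keys are made up for this example.

```r
# Example data and combinations from the question
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
                      marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0),
                      marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
                               marker_c = c(0, 1), marker_d = c(0, 1))

# One string key per row, e.g. "0110" (hypothetical helper names)
data_keys  <- do.call(paste0, my_data[-1])
combo_keys <- do.call(paste0, combinations_df)

# match() gives the combination row for each cell; tabulate() counts them
counts <- tabulate(match(data_keys, combo_keys), nbins = length(combo_keys))
names(counts) <- seq_along(combo_keys)
counts
```

On a large data frame this should avoid repeating the paste0 work for each of the 16 combinations, although the actual speed-up has not been measured here.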