My data looks like this:
      source  target
time
0.5    96253   94861
1.0    96652   95091
1.5    94861   95091
2.5    95091   95409
3.5    95409   97221
4.5    97221   96781
5.5    96781   97707
6.5    97707   98191
7.5    98191   99096
8.5    99096  100016
8.5    99096  100013
9.5   100013   98663
9.5   100016   98658
10.5   98658   99573
10.5   98663   99589
11.5   99589  100506
11.5   99573  100490
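Each integer in the source and target columns refers to a point, and the time index is the time point at which the link between the two points was found. A smart algorithm would find all the possible trajectories contained in the dataset.
For example, in the dataset above there are 4 trajectories:
[96253, 94861, 95091, 95409, 97221, 97221, 96781, 97707, 98191, 99096, 100016, 98658, 99573, 100490]
[96652, 95091, 95409, 97221, 97221, 96781, 97707, 98191, 99096, 100016, 98658, 99573, 100490]
[96253, 94861, 95091, 95409, 97221, 97221, 96781, 97707, 98191, 99096, 100013, 98663, 99589, 100506]
[96652, 95091, 95409, 97221, 97221, 96781, 97707, 98191, 99096, 100013, 98663, 99589, 100506]
This problem can be restated as a graph theory problem.
[Figure: the graph corresponding to the data shown at the beginning]
The idea is to find all possible paths in this graph, subject to one constraint on time ordering: a trajectory is an ordered list of points (nodes) that can only move forward in time, from one time point t to a later one (it cannot go back into the past).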
The algorithm will be implemented in Python, so any Python trick is allowed :-)
Answer 0 (score: 0)
Applying a graph theory algorithm seems to be a sensible way to solve this problem. I used the networkx Python library.
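The answer's code assumes that `spot_ids` already exists as a pandas DataFrame indexed by time. The original post does not show how it was created, so the following is only a minimal sketch of one way to rebuild it from the table in the question; the `rows` list is a hypothetical helper name introduced here for illustration.

```python
import pandas as pd

# (time, source, target) rows copied from the table in the question.
# `rows` is a hypothetical name; the original post does not show this step.
rows = [
    (0.5, 96253, 94861), (1.0, 96652, 95091), (1.5, 94861, 95091),
    (2.5, 95091, 95409), (3.5, 95409, 97221), (4.5, 97221, 96781),
    (5.5, 96781, 97707), (6.5, 97707, 98191), (7.5, 98191, 99096),
    (8.5, 99096, 100016), (8.5, 99096, 100013), (9.5, 100013, 98663),
    (9.5, 100016, 98658), (10.5, 98658, 99573), (10.5, 98663, 99589),
    (11.5, 99589, 100506), (11.5, 99573, 100490),
]

# Use the time column as the index, matching the printed layout
spot_ids = pd.DataFrame(rows, columns=["time", "source", "target"]).set_index("time")
```

Duplicate index values (8.5, 9.5, and so on) are fine here, because the code below only iterates over the rows.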
print(spot_ids)
Output:
      source  target
time
0.5    96253   94861
1.0    96652   95091
1.5    94861   95091
2.5    95091   95409
3.5    95409   97221
4.5    97221   96781
5.5    96781   97707
6.5    97707   98191
7.5    98191   99096
8.5    99096  100016
8.5    99096  100013
9.5   100013   98663
9.5   100016   98658
10.5   98658   99573
10.5   98663   99589
11.5   99589  100506
11.5   99573  100490
Algorithm:
import itertools
import networkx as nx
import numpy as np

# Build an undirected graph; each edge stores the time of its link
graph = nx.Graph()
for t, spot in spot_ids.iterrows():
    graph.add_edge(int(spot['source']), int(spot['target']), t=t)

# Find track extremities: nodes with exactly one neighbor
tracks_extremities = [node for node in graph.nodes() if graph.degree(node) == 1]

paths = []
# Find all possible paths between extremities
for source, target in itertools.combinations(tracks_extremities, 2):
    # Find all paths between the two nodes
    for path in nx.all_simple_paths(graph, source=source, target=target):
        # Check whether this path respects the time constraint:
        # edges may only be followed in one direction of time.
        # Build the vector of edge times along the path
        t = []
        for i, node_srce in enumerate(path[:-1]):
            node_trgt = path[i + 1]
            t.append(graph[node_srce][node_trgt]['t'])
        # Exactly one distinct sign means the times are strictly monotonic
        if len(np.unique(np.sign(np.diff(t)))) == 1:
            paths.append(path)

for path in paths:
    print(path)
Output:
[100490, 99573, 98658, 100016, 99096, 98191, 97707, 96781, 97221, 95409, 95091, 96652]
[100490, 99573, 98658, 100016, 99096, 98191, 97707, 96781, 97221, 95409, 95091, 94861, 96253]
[96652, 95091, 95409, 97221, 96781, 97707, 98191, 99096, 100013, 98663, 99589, 100506]
[100506, 99589, 98663, 100013, 99096, 98191, 97707, 96781, 97221, 95409, 95091, 94861, 96253]
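The paths above come out in either time direction because the graph is undirected. As an alternative that is not part of the original answer, here is a sketch using a directed graph, assuming every link really points from its source to its target forward in time. The names `links` and `find_tracks` are hypothetical, introduced only for this illustration.

```python
import networkx as nx

# (time, source, target) rows copied from the table in the question
links = [
    (0.5, 96253, 94861), (1.0, 96652, 95091), (1.5, 94861, 95091),
    (2.5, 95091, 95409), (3.5, 95409, 97221), (4.5, 97221, 96781),
    (5.5, 96781, 97707), (6.5, 97707, 98191), (7.5, 98191, 99096),
    (8.5, 99096, 100016), (8.5, 99096, 100013), (9.5, 100013, 98663),
    (9.5, 100016, 98658), (10.5, 98658, 99573), (10.5, 98663, 99589),
    (11.5, 99589, 100506), (11.5, 99573, 100490),
]


def find_tracks(links):
    """Return every start-to-end path whose link times strictly increase."""
    graph = nx.DiGraph()
    for t, source, target in links:
        graph.add_edge(source, target, t=t)

    # Starts have no incoming link, ends have no outgoing link
    starts = [n for n in graph.nodes() if graph.in_degree(n) == 0]
    ends = [n for n in graph.nodes() if graph.out_degree(n) == 0]

    tracks = []
    for start in starts:
        for end in ends:
            for path in nx.all_simple_paths(graph, start, end):
                # Times of the links followed along this path
                times = [graph[u][v]["t"] for u, v in zip(path, path[1:])]
                # Keep the path only if it never goes back in time
                if all(a < b for a, b in zip(times, times[1:])):
                    tracks.append(path)
    return tracks


tracks = find_tracks(links)
for track in tracks:
    print(track)
```

With this approach each trajectory is listed once, ordered from its earliest point to its latest, so no reversed duplicates need to be filtered out. The explicit time check is kept as a safeguard in case the input contains links that are not consistently ordered.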