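I am trying to use Slack's API, which sends user names as strings such as: <@UCH65RHRC>
So within the text of the API's JSON body, a single line may contain several of these patterns, for example:
"Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"
How can I use Python regular expressions to find all matching strings with this predefined pattern, i.e. <@#########>, where each # (9 in total) can be 0-9 or A-Z?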
Answer 0 (score: 1)
This is a fairly simple task. The following regular expression should fit your requirements; the parentheses capture just the 9-character ID, without the surrounding <@ and >:
<@([0-9A-Z]{9})>
For example, the following code prints each matched ID on its own line (UCH65RHRC, then UCH65RHRF):
import re

body = "Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"
# findall returns only the captured group, i.e. the 9-character ID
id_search = re.findall(r"<@([0-9A-Z]{9})>", body)
for user_id in id_search:
    print(user_id)
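
Since the question mentions that the text arrives inside a JSON body, here is a minimal sketch of applying the same pattern to such a payload. The payload shape (a JSON object with a "text" key) and the helper name extract_user_ids are assumptions made for illustration, not something confirmed by the question or by Slack's documentation.

```python
import json
import re

# Compiled once; the capture group holds the 9-character ID.
MENTION_RE = re.compile(r"<@([0-9A-Z]{9})>")


def extract_user_ids(raw_body):
    """Return all mentioned IDs found in the "text" field of a JSON body.

    Assumption: the body is a JSON object with a "text" key (hypothetical shape).
    A missing "text" key is treated as empty text.
    """
    payload = json.loads(raw_body)
    return MENTION_RE.findall(payload.get("text", ""))


# Build a sample body from the sentence used in the question.
raw = json.dumps({"text": "Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"})
ids = extract_user_ids(raw)
print(ids)
```

Compiling the pattern once avoids re-parsing it on every call. The pattern only accepts exactly nine uppercase letters or digits, so lowercase or longer IDs are skipped rather than partially matched.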