Question

I want to extract the timestamp from col2 whenever col1 contains one of the keywords.

keywords=["i can help you with that", "i can surely help you with that", "i can check and help you with that", "i will be more than happy to help you", "let me assist you on this", "to assist you better"]

Given the following Excel data:

    col1                                                                                                                            
1.agent enters(as arrin)
2.
3.I'll be happy to assist you. Give me a moment to review your request.
4.I see that the light in your Modem is Blinking Red. Am I right ?
5.Thank you for the detailed information.
6.Please do not worry.
7.Don't worry johny. I can help you with that.
8.Let me connect this chat to the concern team to help you out with this, 
  Please stay connected.

   col2
1. 2018-10-14 21:16:58
2. 2018-10-14 21:17:00
3. 2018-10-14 21:17:40
4. 2018-10-14 21:18:25
5. 2018-10-14 21:19:39
6. 2018-10-14 21:19:43
7. 2018-10-14 21:21:04
8. 2018-10-14 21:22:00

For example, row 7 contains a keyword, so the corresponding timestamp in col2 should be extracted.

The output should look like this:

[out]: 2018-10-14 21:21:04

Thanks.

Answer 1

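This should work.

Convert everything to either upper or lower case first, because the match is case-sensitive. Be aware that you may also need to handle punctuation.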

import pandas as pd

keywords=["i can help you with that", "i can surely help you with that", "i can check and help you with that", "i will be more than happy to help you", "let me assist you on this", "to assist you better"]

############## Sample data standing in for the Excel file ##########################
col1 = ["agent enters(as arrin)",
"",
"I'll be happy to assist you. Give me a moment to review your request.",
"I see that the light in your Modem is Blinking Red. Am I right ?",
"Thank you for the detailed information.",
"Please do not worry.",
"Don't worry johny. I can help you with that.",
"Let me connect this chat to the concern team to help you out with this, Please stay connected."]

col2 = ['2018-10-14 21:16:58',
'2018-10-14 21:17:00',
'2018-10-14 21:17:40',
'2018-10-14 21:18:25',
'2018-10-14 21:19:39',
'2018-10-14 21:19:43',
'2018-10-14 21:21:04',
'2018-10-14 21:22:00']

df = pd.DataFrame()
df['col1'] = col1
df['col2'] = col2

#####################################################

# lower case keywords and col1 strings
lower_keywords = [x.lower() for x in keywords]
df['low_col1'] = df['col1'].str.lower()

df_filter = df[df['low_col1'].str.contains('|'.join(lower_keywords))]

print (df_filter['col2'])

Output:

In  [38]: print (df_filter['col2'])
Out [38]: 6    2018-10-14 21:21:04
          Name: col2, dtype: object
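The punctuation caveat above can be sketched with the standard library alone. This is a minimal illustration, not part of the original answer: `normalize` and `has_keyword` are hypothetical helper names, the keyword list is shortened, and the second sample row is a made-up variant of row 7 with extra punctuation added to show the effect.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

# Shortened keyword list for the illustration.
keywords = ["i can help you with that", "to assist you better"]
normalized_keywords = [normalize(k) for k in keywords]

def has_keyword(text):
    """True if any normalized keyword is a substring of the normalized text."""
    cleaned = normalize(text)
    return any(k in cleaned for k in normalized_keywords)

# (text, timestamp) pairs; the second text is a hypothetical variant of row 7.
rows = [
    ("Please do not worry.", "2018-10-14 21:19:43"),
    ("Don't worry johny. I can, help you with that!", "2018-10-14 21:21:04"),
]

matches = [ts for text, ts in rows if has_keyword(text)]
print(matches)
```

Normalizing both the keywords and the text the same way keeps the comparison symmetric; the trade-off is that apostrophes are also turned into spaces, so "Don't" becomes "don t".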

Answer 2

Given

keywords = [
    "i can help you with that",
    "i can surely help you with that",
    "i can check and help you with that",
    "i will be more than happy to help you",
    "let me assist you on this",
    "to assist you better"
]
col1 = [
    "agent enters(as arrin)",
    "",
    "I'll be happy to assist you. Give me a moment to review your request.",
    "I see that the light in your Modem is Blinking Red. Am I right ?",
    "Thank you for the detailed information.",
    "Please do not worry.",
    "Don't worry johny. I can help you with that.",
    "Let me connect this chat to the concern team to help you out with this, Please stay connected."
]
col2 = [
    '2018-10-14 21:16:58',
    '2018-10-14 21:17:00',
    '2018-10-14 21:17:40',
    '2018-10-14 21:18:25',
    '2018-10-14 21:19:39',
    '2018-10-14 21:19:43',
    '2018-10-14 21:21:04',
    '2018-10-14 21:22:00'
]

you can run:

for i, col in enumerate(col1):
    if any([keyword in col for keyword in keywords]):
        print(col2[i])

which prints the col2 timestamp if the corresponding row contains a keyword.

any lets you concisely test whether any of the keywords appears in the string.

Depending on your needs, you may want to convert the strings to lowercase before searching, or apply a similar normalization inside the if condition. As written, the test is case-sensitive, so "I can help you with that." in row 7 does not match the lowercase keyword until the text is lowercased.
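The lowercase conversion mentioned in this answer might look like the following. It is a sketch under stated assumptions rather than the answerer's own code: `timestamps_for_keywords` is a hypothetical helper name, and `zip` is swapped in for the index-based loop.

```python
keywords = [
    "i can help you with that",
    "i can surely help you with that",
    "i can check and help you with that",
    "i will be more than happy to help you",
    "let me assist you on this",
    "to assist you better",
]
col1 = [
    "agent enters(as arrin)",
    "",
    "I'll be happy to assist you. Give me a moment to review your request.",
    "I see that the light in your Modem is Blinking Red. Am I right ?",
    "Thank you for the detailed information.",
    "Please do not worry.",
    "Don't worry johny. I can help you with that.",
    "Let me connect this chat to the concern team to help you out with this, Please stay connected.",
]
col2 = [
    "2018-10-14 21:16:58",
    "2018-10-14 21:17:00",
    "2018-10-14 21:17:40",
    "2018-10-14 21:18:25",
    "2018-10-14 21:19:39",
    "2018-10-14 21:19:43",
    "2018-10-14 21:21:04",
    "2018-10-14 21:22:00",
]

def timestamps_for_keywords(texts, timestamps, keywords):
    """Return the timestamps whose paired text contains any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [
        ts
        for text, ts in zip(texts, timestamps)
        if any(k in text.lower() for k in lowered)
    ]

result = timestamps_for_keywords(col1, col2, keywords)
print(result)  # ['2018-10-14 21:21:04']
```

Pairing the two lists with `zip` avoids indexing `col2` by hand, and lowering the text once per row keeps the comparison case-insensitive without changing the stored data.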
