Pandas search for a string based on a condition

Time: 2019-08-26 09:04:05

Tags: python pandas

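The question is a bit complicated, so please look at the screenshots for a better view/understanding. I have a dataframe with 2 columns, "Col-A" and "Col-B" [https://i.stack.imgur.com/bw1hx.jpg][1]. I also have a CSV file that contains multiple columns of data. [https://i.stack.imgur.com/v72mM.jpg][1]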

The "Col-B" values of my dataframe match the headers of the CSV file. For example, the first-row item of "Col-B" is "Password", so I will use the column named "Password" in the CSV file. [https://i.stack.imgur.com/hTCZa.jpg][1]

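What my code should do is this: if "Col-B" of my dataframe is "Password", it should search "Col-A" for the values in the "Password" column of my CSV file, and the first string found is my output. Below is the code I tried.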

import pandas as pd
import numpy as np

data = pd.read_excel("C:/Users/606736.CTS/Desktop/Keyword.xlsx", 
sheet_name='Sheet2')
CSV_file = pd.read_excel("C:/Users/606736.CTS/Desktop/Keyword.xlsx",
sheet_name='Sub-Cat') 

data['Col-C']= np.nan # for adding a new column

# Below code works perfectly fine for searching any one of the column 
# in the CSV-file, in the below code I am searching on "Password" Col, 
# but I want the code to take the column dynamically based on the 'Col-B' 
# of my dataframe.
# if col-B of my dataframe is "CPU", then 'CPU' column of the CSV-file 
# should be searched.
for i in data['Col-B']:
    for Key1 in CSV_file[i]:
        data.loc[(data['Col-A'].apply(lambda x: Key1 in x.split(' ')) & 
        (data['Col-C'].isna()), 'Col-C')] = Key1
data.head(3)
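
To make the difficulty concrete, here is a small sketch of the same loop on made-up data. The two frames and their values are invented stand-ins for the Excel sheets, and `keywords` is a hypothetical name for what the question calls `CSV_file`. It suggests one likely reason the loop does not behave per row: the mask never checks "Col-B", so a keyword from one column can fill rows that belong to another.

```python
import pandas as pd

# Invented sample data standing in for the two sheets in the question.
data = pd.DataFrame({
    "Col-A": ["reset my login password", "reset the cpu fan"],
    "Col-B": ["Password", "CPU"],
})
keywords = pd.DataFrame({
    "Password": ["reset", "login"],
    "CPU": ["fan", "cpu"],
})

# Empty result column (object dtype so that strings can be assigned later).
data["Col-C"] = pd.Series([None] * len(data), dtype="object")

# Same structure as the loop in the question: no condition on 'Col-B'.
for col in data["Col-B"]:
    for key in keywords[col]:
        contains_key = data["Col-A"].apply(lambda x: key in x.split(" "))
        mask = contains_key & data["Col-C"].isna()
        data.loc[mask, "Col-C"] = key

# Row 1 has Col-B == "CPU", yet it receives "reset" from the "Password" column.
print(data)
```

With this sample, the second row would ideally get "fan", but it is filled while the "Password" column is being processed.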

2 Answers:

Answer 0: (score: 0)

If your dataframe is large, this will take a long time to run.

patterns.txt

Answer 1: (score: 0)

This works fine for me.

for i in data['Col-B']:
    for Key1 in CSV_file[i]:
        # Rows where 'Col-A' contains Key1 as a word and 'Col-B' names this column
        mask = data['Col-A'].apply(lambda x: Key1 in x.split(' ')) & (data['Col-B'] == i)
        data.loc[mask, 'Col-C'] = Key1
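
One thing to note about the loop above: every later keyword that matches overwrites "Col-C", so the last match in the column survives rather than the first. A possible row-wise variant that keeps the first match is sketched below. It is only a sketch under assumptions: the sample frames are invented, and `keywords`, `keyword_lists`, and `pick_keyword` are hypothetical names that do not appear in the original post.

```python
import pandas as pd

# Invented sample data standing in for the two sheets in the question.
data = pd.DataFrame({
    "Col-A": [
        "please reset my login password",
        "cpu usage is high on server",
        "disk is full",
    ],
    "Col-B": ["Password", "CPU", "Password"],
})
keywords = pd.DataFrame({
    "Password": ["reset", "login"],
    "CPU": ["usage", "high"],
})

# Build each column's keyword list once, dropping empty cells.
keyword_lists = {col: keywords[col].dropna().tolist() for col in keywords.columns}

def pick_keyword(row):
    """Return the first keyword (in column order) found as a word in 'Col-A'."""
    words = set(row["Col-A"].split(" "))
    # The row's 'Col-B' value selects which keyword column to search.
    for key in keyword_lists.get(row["Col-B"], []):
        if key in words:
            return key
    return None  # no keyword from that column occurs in the text

data["Col-C"] = data.apply(pick_keyword, axis=1)
print(data)
```

Each row is visited once and only the keyword column named in its own "Col-B" is searched, which should also help on larger dataframes compared with rescanning the whole frame for every keyword.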