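Filter data across multiple columns and print the matching rows?

Time: 2018-04-16 02:54:16

Tags: python pandas

This is a follow-up to my previous question. I have data like this in a .csv file: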

 id,first_name,last_name,email,gender,ip_address,birthday
 1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
 2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989  
 3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
 4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
 5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995

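What I want is this: when the Python program runs, it asks the user for the keywords to search for. The user enters all the keywords (perhaps they get stored in a list), and the program prints every row that contains all of the keywords, regardless of which column each keyword is in.

I have been playing with csv and pandas and have been googling for hours, but I can't get it to work the way I want. I am still new to Python 3. Please help.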

Edit to show what I have so far:

import csv

# Asks for search criteria from user
search_parts = input("Enter search criteria:\n").split(",")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if all([x in row for x in search_parts]):
        print(row)

This works fine when I search for a single keyword. But I would like the option of filtering on one or more keywords.

2 answers:

Answer 0: (score: 0)

Here, use try and except, because an error is raised if a column's data type does not match your keyword.

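import pandas as pd
def fun(data,keyword):
    # collects every row in which some column equals the keyword
    ans = pd.DataFrame()
    for i in data.columns:
        try:
            # rows where column i equals the keyword, stacked onto earlier matches
            ans = pd.concat((data[data[i]==keyword],ans))
        except Exception:
            # skip columns whose dtype cannot be compared with the keyword
            pass
    # a row that matched in several columns appears more than once
    ans.drop_duplicates(inplace=True)
    return ans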
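
The function above takes one keyword at a time. A minimal sketch of the multi-keyword case from the question follows, assuming every column is read as text and a keyword must equal a whole cell. The helper name `rows_with_all_keywords` and the hard-coded `raw` string (standing in for the `input(...)` call) are illustrative only, and the sample rows are copied from the question.

```python
import io

import pandas as pd

# Sample rows copied from the question; in practice this would be
# pd.read_csv("MOCK_DATA.csv", dtype=str).
CSV_TEXT = """id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
"""


def rows_with_all_keywords(data, keywords):
    """Hypothetical helper: rows where every keyword equals some cell."""
    # start with every row selected, then narrow down keyword by keyword
    mask = pd.Series(True, index=data.index)
    for keyword in keywords:
        # True for rows where any column holds exactly this keyword
        mask &= data.eq(keyword).any(axis=1)
    return data[mask]


# dtype=str keeps every column as text, so user input compares cleanly
data = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# Stand-in for input("Enter search criteria:\n"); spaces around commas are stripped
raw = "Male, Ced"
keywords = [part.strip() for part in raw.split(",") if part.strip()]

result = rows_with_all_keywords(data, keywords)
print(result)
```

Building one boolean mask avoids the concat and drop_duplicates steps, and it keeps the rows in their original order.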

Answer 1: (score: 0)

For an AND search over the keywords, use the following code:

import numpy as np

def AND_serach(df,list_of_keywords):
    # None means no keyword has been processed yet; an empty array means no row matches
    index_arr = None
    for keyword in list_of_keywords:
        # keep only cells equal to the keyword, drop rows that are entirely NaN,
        # and collect the index labels of the remaining rows
        index = df[df==keyword].dropna(how='all').index.values
        # first keyword: take its index; afterwards: intersect with the running result
        index_arr = index if index_arr is None else np.intersect1d(index_arr,index)
    # no keywords given: return an empty frame
    if index_arr is None:
        return df.iloc[0:0]
    # select the matching rows by index label
    return df.loc[index_arr]

For an OR search over the keywords, use the following code:

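def OR_serach(df,list_of_keywords):
    # start with an empty array of index labels
    index_arr = np.array([]) 
    for keyword in list_of_keywords:
        # index labels of the rows that contain this keyword in any column
        index = df[df==keyword].dropna(how='all').index.values
        # union with the labels found so far, keeping each label once
        index_arr = np.unique(np.concatenate((index_arr,index),0))
    # the empty start array is float, so cast back to int (assumes an integer index)
    return df.loc[index_arr.astype(int)]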

Output:

d = {'A': [1,2,3], 'B': [10,1,5]}
df = pd.DataFrame(data=d)
print(df)
   A   B
0  1  10
1  2   1
2  3   5

keywords = [1,5]
AND_serach(df,keywords) # return nothing
Out[]:
    A   B

OR_serach(df,keywords)
Out[]: 
    A   B
0   1   10
1   2   1
2   3   5
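
The same AND and OR semantics can also be sketched with boolean masks instead of index arrays, which works for non-integer indexes as well. This is only a sketch under the same exact-match assumption; `and_mask_search` and `or_mask_search` are hypothetical names and not part of the functions above.

```python
import pandas as pd


def and_mask_search(df, list_of_keywords):
    """Hypothetical mask-based variant: rows containing every keyword."""
    if not list_of_keywords:
        # mirror the index-based version: no keywords, no rows
        return df.iloc[0:0]
    # one boolean column per keyword: does the row contain it anywhere?
    hits = pd.concat(
        [df.eq(keyword).any(axis=1) for keyword in list_of_keywords],
        axis=1,
    )
    # keep rows where every keyword was found
    return df[hits.all(axis=1)]


def or_mask_search(df, list_of_keywords):
    """Hypothetical mask-based variant: rows containing at least one keyword."""
    # isin marks each cell that equals any of the keywords
    return df[df.isin(list_of_keywords).any(axis=1)]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 1, 5]})
keywords = [1, 5]

print(and_mask_search(df, keywords))  # no row holds both 1 and 5
print(or_mask_search(df, keywords))   # every row holds 1 or 5
```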