Following up on my previous question. I have data like this in a .csv file:

```
id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
```
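What I want: when the Python program runs, it asks the user for the keywords to search for. The user enters all the keywords (perhaps stored in a list), and the program then prints every row that contains all of the keywords, regardless of which column each keyword is in.
I've been experimenting with csv and pandas and have been googling for hours, but I can't get it to work the way I want. I'm still new to Python 3. Please help.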
**Edit: here is what I have so far:**

```python
import csv

# Ask the user for search criteria (comma-separated)
search_parts = input("Enter search criteria:\n").split(",")
# Open the CSV data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains every search term
for row in file:
    if all([x in row for x in search_parts]):
        print(row)
```

This works fine when I search for a single keyword, but I want to be able to filter on one or more keywords.
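One likely reason the multi-keyword case fails: `split(",")` keeps any space typed after a comma, so entering `Male, Ced` produces `["Male", " Ced"]`, and `" Ced"` never equals a cell. Below is a minimal sketch that strips each term, assuming keywords must match whole cells exactly. The helper name `rows_matching_all` and the `SAMPLE` string (rows copied from the question) are illustrative, and `io.StringIO` stands in for the real file.

```python
import csv
import io

def rows_matching_all(lines, raw_query):
    """Yield data rows that contain every comma-separated keyword as a whole cell."""
    # Strip surrounding spaces so "Male, Ced" gives ["Male", "Ced"]; ignore empty terms
    keywords = [k.strip() for k in raw_query.split(",") if k.strip()]
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (id,first_name,...)
    for row in reader:
        if all(k in row for k in keywords):
            yield row

# Rows from the question, standing in for the contents of MOCK_DATA.csv
SAMPLE = """id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
"""

for row in rows_matching_all(io.StringIO(SAMPLE), "Male, Ced"):
    print(row)
```

In the real script, the file object from `open("MOCK_DATA.csv", newline="")` and the string returned by `input("Enter search criteria:\n")` would be passed in place of the `StringIO` object and the literal query.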
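Answer 0 (score: 0)
This uses try/except because comparing a column with a keyword of a different data type can raise an error (depending on the pandas version).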
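```python
import pandas as pd

def fun(data, keyword):
    """Return the rows of `data` in which any column equals `keyword`."""
    ans = pd.DataFrame()
    for i in data.columns:
        try:
            # Collect the rows whose value in column i equals the keyword
            ans = pd.concat((data[data[i] == keyword], ans))
        except TypeError:
            # The column's dtype cannot be compared with the keyword; skip it
            pass
    # A row that matches in several columns was added several times; keep it once
    ans = ans[~ans.index.duplicated()]
    return ans
```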
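`fun` matches a single keyword. To require every keyword to appear somewhere in the row (the AND behaviour the question asks for), one option is to build a boolean mask per keyword and combine them with `&`. This is a sketch, assuming the CSV is loaded with `dtype=str` so every cell is a string like the keywords returned by `input()`; that sidesteps the type mismatch the `try`/`except` works around. The name `search_all` and the inline `SAMPLE` data are illustrative.

```python
import io
import pandas as pd

def search_all(df, keywords):
    """Return the rows of df in which every keyword equals some cell."""
    # Start with every row selected and narrow it down one keyword at a time
    mask = pd.Series(True, index=df.index)
    for kw in keywords:
        # df.eq(kw) compares every cell; any(axis=1) asks "is kw in any column of this row?"
        mask &= df.eq(kw).any(axis=1)
    return df[mask]

# Rows from the question, standing in for MOCK_DATA.csv
SAMPLE = """id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
"""

# dtype=str keeps ids, IPs and dates as text, so a typed "1" can match the id column
df = pd.read_csv(io.StringIO(SAMPLE), dtype=str)
print(search_all(df, ["Female", "Laverna"]))
```

With the real file, the inputs would be something like `pd.read_csv("MOCK_DATA.csv", dtype=str)` and `[k.strip() for k in input("Enter search criteria:\n").split(",")]`.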
Answer 1 (score: 0)
To search with AND (every keyword must appear in the row), use the following code:
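```python
import numpy as np
import pandas as pd

def AND_serach(df, list_of_keywords):
    # Index labels of rows that matched every keyword so far (None until the first keyword)
    index_arr = None
    for keyword in list_of_keywords:
        # Cells that differ from the keyword become NaN; drop rows that are entirely NaN
        # and keep the index labels of the rows that remain
        index = df[df == keyword].dropna(how='all').index.values
        # First keyword: take its rows; afterwards keep only rows found for every keyword,
        # so an empty intersection stays empty
        index_arr = index if index_arr is None else np.intersect1d(index_arr, index)
    if index_arr is None:
        # No keywords given: nothing to filter on, return every row
        return df
    # Get back the matching rows by index label
    return df.loc[index_arr]
```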
To search with OR (at least one keyword must appear in the row), use the following code:
```python
def OR_serach(df, list_of_keywords):
    # Start from an empty array with the same dtype as the index
    index_arr = df.index.values[:0]
    for keyword in list_of_keywords:
        index = df[df == keyword].dropna(how='all').index.values
        # Union of the labels collected so far and this keyword's labels (unique, sorted)
        index_arr = np.unique(np.concatenate((index_arr, index)))
    return df.loc[index_arr]
```
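For the OR case, pandas can also do this without collecting index arrays: `DataFrame.isin` accepts a list of values and marks every cell equal to any of them, and `.any(axis=1)` keeps the rows with at least one hit. A minimal sketch; `or_search` is a hypothetical name, and exact cell equality is assumed, as in the functions above.

```python
import pandas as pd

def or_search(df, keywords):
    """Return rows where at least one cell equals any of the keywords."""
    # isin marks each cell equal to one of the keywords; any(axis=1) keeps rows with a hit
    return df[df.isin(keywords).any(axis=1)]

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 1, 5]})
print(or_search(df, [1, 5]))  # every row contains a 1 or a 5
print(or_search(df, [10]))    # only the first row
```

Because it filters with a boolean mask, the result keeps the original row order and index labels, with no float-to-int conversion of index arrays.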
**Output**

```python
d = {'A': [1, 2, 3], 'B': [10, 1, 5]}
df = pd.DataFrame(data=d)
print(df)
#    A   B
# 0  1  10
# 1  2   1
# 2  3   5

keywords = [1, 5]
AND_serach(df, keywords)  # returns no rows: no single row contains both 1 and 5
# Out[]:
#    A  B

OR_serach(df, keywords)
# Out[]:
#    A   B
# 0  1  10
# 1  2   1
# 2  3   5
```