I have a .csv file of about 1,000 rows that looks like this:
id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
So far, my code asks for input, then goes over every row and prints the rows that contain the input. It looks like this:
import csv
# Asks for search criteria from user
search = input("Enter search criteria:\n")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if search in row:
        print(row)
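The end result I want, and what I'm stuck on, is being able to enter more than one search criterion, separated by ",", and have it search for and print the matching rows. A bit like filtering a list.
For example, if the file contains several "David"s who are "Male", I could enter: David,Male
It would then print all the matching rows, but ignore rows where a "David" is "Female".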
Answer 0 (score: 1)
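You can split the input on commas, then use all() with a list comprehension to check that every field from the input is in a given row. Because each row is a list of strings, x in row is an exact match against a whole field, not a substring test.
This example uses a simple split of the input and doesn't care which field each input matches. If you only want to match specific columns, use csv.DictReader instead of csv.reader.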
import csv
# Asks for search criteria from user
search_parts = input("Enter search criteria:\n").split(",")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if all([x in row for x in search_parts]):
        print(row)
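For the column-specific case, here is a minimal sketch of the csv.DictReader variant. The helper name filter_rows, the criteria dictionary, and the in-memory SAMPLE text (standing in for the opened MOCK_DATA.csv) are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io

# Stand-in for open("MOCK_DATA.csv", newline=""); rows are taken from the question.
SAMPLE = """id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
"""


def filter_rows(lines, criteria):
    """Hypothetical helper: keep rows whose named columns equal every given value."""
    reader = csv.DictReader(lines)  # uses the first line as the column names
    return [
        row
        for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Each search value is tied to one column, so "Male" can only match the gender field.
matches = filter_rows(io.StringIO(SAMPLE), {"first_name": "Gawen", "gender": "Male"})
for row in matches:
    print(row["id"], row["first_name"], row["gender"])
```

With a real file you would pass the open file object in place of io.StringIO(SAMPLE). Tying each value to a column avoids accidental matches in unrelated fields, at the cost of needing to know which column each input belongs to.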
Answer 1 (score: 0)
If you're happy to use a third-party library, you can use pandas.
I modified your data slightly to demonstrate a simple query.
import pandas as pd
from io import StringIO
mystr = StringIO("""id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,David,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,David,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995""")
# replace mystr with 'file.csv'
df = pd.read_csv(mystr)
# retrieve user inputs
first_name = input('Input a first name\n:')
gender = input('Input a gender, Male or Female\n:')
# calculate Boolean mask
mask = (df['first_name'] == first_name) & (df['gender'] == gender)
# apply mask to result
res = df[mask]
print(res)
#    id first_name   last_name                     email gender  \
# 3   4      David  Gillfillan       ggillfillan3@hp.com   Male
# 4   5      David   Gilfether  sgilfether4@china.com.cn   Male
#
#         ip_address    birthday
# 3   83.249.190.232  31/10/1984
# 4  180.153.199.106  11/07/1995
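If the number of criteria is not fixed at two, the Boolean mask can be built up in a loop. This is only a sketch under assumptions: the helper name filter_frame and the criteria dictionary are hypothetical, and the data is a shortened copy of the modified sample above.

```python
import pandas as pd
from io import StringIO

# Shortened copy of the modified sample data; replace with a file path for real use.
csv_text = StringIO("""id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
4,David,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,David,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995""")
df = pd.read_csv(csv_text)


def filter_frame(frame, criteria):
    """Hypothetical helper: AND together one equality test per column in criteria."""
    mask = pd.Series(True, index=frame.index)  # start with every row selected
    for column, value in criteria.items():
        mask &= frame[column] == value  # narrow the selection column by column
    return frame[mask]


res = filter_frame(df, {"first_name": "David", "gender": "Male"})
print(res[["id", "first_name", "gender"]])
```

An empty criteria dictionary leaves the mask all True, so the whole frame is returned unchanged.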
Answer 2 (score: 0)
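While you can check whether the strings "David" and "Male" exist in a row, that is not very precise if you need to check column values. Instead, read the data with csv and create a list of namedtuple objects, each storing a search value and the index of the column it should be compared against: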
from collections import namedtuple
import csv
data = list(csv.reader(open('filename.csv')))
search = namedtuple('search', 'value,header')
searches = [search(i, data[0].index(b)) for i, b in zip(input().split(', '), ['first_name', 'gender'])]
final_results = [i for i in data if all(c.value == i[c.header] for c in searches)]