Filtering data with keywords from a csv file

Time: 2018-04-14 21:49:03

Tags: python list csv dictionary

I am trying to filter the data in a csv file, and I want to organize the output so that it looks like this:

0 AIG,10,,,,Yes,,,Jr,,,MS,,
1 Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent
2 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
3 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
4 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
5 Flow-Cal Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All
6 Global Shop Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
7 Harris County CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
8 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
9 Hitachi Consulting,13,Yes,,,,,,,Sr,,MS,,
10 HP Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
11 INT Inc.,20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
12 JPMorgan Chase & Co,3,Yes,,,Yes,,,Jr,Sr,,,,
13 Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
14 McKesson,26,Yes,,,,,,,Sr,,,,
15 MRE Consulting Ltd.,2,Yes,,,,,,,Sr,PB,MS,,All
16 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
17 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
18 San Jacinto College ,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
19 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
20 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
21 Sogeti USA,15,Yes,,,,,,,Sr,PB,MS,,
22 Southwest Research Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
23 The Reynolds and Reynolds Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
24 UH Enterprise Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
25 U.S. Marine Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
26 ValuD Consuting LLC,5,Yes,,,,,,,Sr,PB,,,All
27 Wipro,24,Yes,,,,,,,Sr,PB,,,

However, my code is currently giving me this:

0 AIG,10,,,,Yes,,,Jr,,,MS,,
1 Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
2 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
3 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
4 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
5 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
6 Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
7 McKesson,26,Yes,,,,,,,Sr,,,,
8 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
9 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
10 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
11 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
12 Wipro,24,Yes,,,,,,,Sr,PB,,,
13 SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
14 NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
15 Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
16 AIG,10,,,,Yes,,,Jr,,,MS,,
17 ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
18 CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
19 Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
20 PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
21 Wipro,24,Yes,,,,,,,Sr,PB,,,
22 McKesson,26,Yes,,,,,,,Sr,,,,
23 Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
24 HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
25 Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,

As you can see, it seems to repeat rows for some of the keywords I used. I will post my code below.

# I made a dictionary of the column names from the problem statement
company_dict = {0:"Company", 1:"Booth",
                2:"Full-Time", 3:"Full-Time Visa Sponsor",
                4:"Part-Time", 5:"Internship",
                6:"Freshman", 7:"Sophomore",
                8:"Junior", 9:"Senior",
                10:"Post-Bacs", 11:"MS",
                12:"PhD", 13:"Alumni"}

# Loop to print each entry of company_dict
for lines in company_dict:
    print(repr(lines),company_dict[lines])

keywords = ("AIG","Baylor","CGG","Citi","ExxonMobil","Flow-Cal Inc.",
           "Global SHop Solutions","Harris Count CTS","HCSS",
           "Hitachi Consulting", "HP Inc.","INT Inc.","JPMorgan Chase & Co",
           "Leidos","McKesson","MRE Consulting Ltd.","NetIQ","PROS",
           "San Jacinto College","SAS","Smartbridge","Sogeti USA",
           "Southwest Research Institute","The Reynolds and Reynolds Company",
           "UH Enterprise Systems","U.S. Marine Corps","ValuD Consuting LLC","Wipro")

with f as filterf:
    output_line_counter = 0
    for line in filterf:
        if any(keyword in line for keyword in keywords):
            print(output_line_counter, line.strip())
            output_line_counter += 1

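All of this comes from a csv file included with the assignment. I think I am on the right track, but I don't understand why my code gives me duplicates and also misses some of the keywords I asked it to search for.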

I will include the csv data below.

ALPHABETICAL ORDER,,,,,,,,,,,,,
,,Positions,,,,Classifications,,,,,,,
Company,Booth,Full-Time,"Full-Time Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
AIG,10,,,,Yes,,,Jr,,,MS,,
Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
,...
Flow-Cal�Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All
Global�Shop�Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Harris�County�CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Hitachi�Consulting,13,Yes,,,,,,,Sr,,MS,,
HP�Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
INT�Inc.,20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
JPMorgan�Chase�&�Co,3,Yes,,,Yes,,,Jr,Sr,,,,
Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
McKesson,26,Yes,,,,,,,Sr,,,,
,,,,,,,,,,,,,
MRE�Consulting�Ltd.,2,Yes,,,,,,,Sr,PB,MS,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
San�Jacinto�College��,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
Sogeti�USA,15,Yes,,,,,,,Sr,PB,MS,,
Southwest�Research�Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The�Reynolds�and�Reynolds�Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
UH�Enterprise�Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
U.S.�Marine�Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
ValuD�Consuting�LLC,5,Yes,,,,,,,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
BOOTH ORDER,,,,,,,,,,,,,
,Booth,Positions,,,,Classifications,,,,,,,
Company,#,Full-Time,"Full-Time
Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
HP�Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
"MRE�Consulting,�Ltd.",2,Yes,,,,,,,Sr,PB,MS,,All
JPMorgan�Chase�&�Co,3,Yes,,,Yes,,,Jr,Sr,,,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
ValuD�Consuting�LLC,5,Yes,,,,,,,Sr,PB,,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
UH�Enterprise�Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
AIG,10,,,,Yes,,,Jr,,,MS,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
Southwest�Research�Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
Hitachi�Consulting,13,Yes,,,,,,,Sr,,MS,,
San�Jacinto�College��,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
Sogeti�USA,15,Yes,,,,,,,Sr,PB,MS,,
"Flow-Cal,�Inc.",16,Yes,,,Yes,,,Jr,Sr,,,,All
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Global�Shop�Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Baylor�College�of�Medicine,19,Yes,Yes,,,,,,,,,,Recent
"INT,�Inc.",20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
Harris�County�CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The�Reynolds�and�Reynolds�Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
U.S.�Marine�Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
McKesson,26,Yes,,,,,,,Sr,,,,
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,

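I think it has something to do with the question marks in the csv file, but I'm not sure. I want to search the csv file for the keywords I gave it and print each matching line. Thank you very much for any comments or suggestions :)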

1 answer:

Answer 0: (score: 0)

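The answer was that the csv file itself simply needed to be changed (hopefully that is acceptable for the project; the file had strange UTF errors).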
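If editing the file by hand is not an option, the odd characters could also be normalized in code before matching. The following is only a minimal sketch: it assumes the characters shown as question marks are non-breaking spaces (U+00A0) or U+FFFD replacement characters, which the question does not confirm. The helper name `normalize_spaces` and the two sample lines are made up for illustration.

```python
def normalize_spaces(line):
    """Return the line with the odd space-like characters turned into plain spaces.

    Assumption: the odd characters are U+00A0 (non-breaking space) or
    U+FFFD (replacement character); adjust if the file contains something else.
    """
    return line.replace("\u00a0", " ").replace("\ufffd", " ")


# Hypothetical keywords and raw lines modelled on the data in the question.
keywords = ("Baylor College of Medicine", "Hitachi Consulting")
raw_lines = [
    "Baylor\u00a0College\u00a0of\u00a0Medicine,19,Yes,Yes,,,,,,,,,,Recent\n",
    "Hitachi\ufffdConsulting,13,Yes,,,,,,,Sr,,MS,,\n",
    "McKesson,26,Yes,,,,,,,Sr,,,,\n",
]

matched = []
for raw in raw_lines:
    clean = normalize_spaces(raw)
    # Match against the cleaned text so multi-word keywords can be found.
    if any(keyword in clean for keyword in keywords):
        matched.append(clean.strip())

for number, line in enumerate(matched):
    print(number, line)
```

Without the normalization step, a multi-word keyword such as "Hitachi Consulting" cannot match a line in which the words are separated by a different character. That would explain why only single-word names (and "Baylor") were found.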

I also added the following code:

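# `f` and `keywords` come from the code in the question; `f` is assumed to be
# the already-opened csv file object.
DataList = []
with f as filterf:
    for line in filterf:
        # Keep every line that contains at least one keyword
        if any(keyword in line for keyword in keywords):
            DataList.append(line)

# set() drops lines that are exact duplicates (the file lists the companies
# twice); sorted() puts the remaining lines in alphabetical order
CleanerData = sorted(set(DataList))
for line_counter, i in enumerate(CleanerData, start=1):
    print(line_counter, i, end='')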
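Note that `sorted(set(...))` only removes lines that are identical character for character. In the data above, Leidos is listed with booth 390 in one section and 30 in the other, and a few names are quoted in only one section, so those rows would still be printed twice. One possible alternative, sketched here with the standard `csv` module, is to compare the first column against the wanted names and keep only the first row per company. The names `filter_companies` and `SAMPLE` are illustrative, and the sample assumes the odd characters have already been replaced with plain spaces.

```python
import csv
import io

# Short excerpt modelled on the data in the question (plain spaces assumed).
SAMPLE = '''ALPHABETICAL ORDER,,,,,,,,,,,,,
AIG,10,,,,Yes,,,Jr,,,MS,,
Global Shop Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
BOOTH ORDER,,,,,,,,,,,,,
Company,#,Full-Time,"Full-Time
Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
AIG,10,,,,Yes,,,Jr,,,MS,,
Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
'''


def filter_companies(stream, wanted):
    """Return (rows, missing): the first row per wanted company, in file
    order, and the sorted list of wanted names that never appeared."""
    seen = set()
    rows = []
    # csv.reader handles quoted fields, including ones that span two lines.
    for row in csv.reader(stream):
        if not row:
            continue
        name = row[0].strip()
        if name in wanted and name not in seen:
            seen.add(name)
            rows.append(row)
    missing = sorted(set(wanted) - seen)
    return rows, missing


# "Global SHop Solutions" reproduces a spelling from the question's keywords.
wanted = {"AIG", "Global SHop Solutions", "Leidos"}
rows, missing = filter_companies(io.StringIO(SAMPLE), wanted)

for number, row in enumerate(rows):
    print(number, ",".join(row))
print("never matched:", missing)
```

Unlike the substring test in the question, this sketch compares whole company names, so a shortened keyword such as "Baylor" would need to be written out in full. Listing the names that never matched also makes misspelled keywords easy to spot.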