How can I remove all lines from one list that don't start with a line from another list, without using regex?

Asked: 2015-06-10 07:58:13

Tags: regex

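Sorry if the title is a bit convoluted; I found this particular problem hard to put into words. Basically, I have two lists, and I'm trying to use Notepad++ to modify the first one based on the information in the second.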

LIST 1 (not the whole list):

2000,4031161,1,1,1008,1000000
2000,4031162,1,1,1008,1000000
100100,4000019,1,1,0,600000
100100,2000000,1,1,0,20000
100100,2040002,1,1,0,300
100100,2041001,1,1,0,300
100100,2060000,1,1,0,30000
100100,4010000,1,1,0,9000
100100,4020000,1,1,0,9000
100100,2061000,1,1,0,30000
100100,1002067,1,1,0,1500
100100,2010009,1,1,0,20000
100100,2380000,1,1,0,1000
100100,0,4,6,0,400000
100101,4000000,1,1,0,600000
100101,2041006,1,1,0,300
100101,2000000,1,1,0,20000
100101,4020001,1,1,0,9000
100101,2060000,1,1,0,30000
100101,4010001,1,1,0,9000
100101,2061000,1,1,0,30000
100101,1040013,1,1,0,800
100101,1041012,1,1,0,800
100101,1060004,1,1,0,800
100101,1040017,1,1,0,800
100101,1060013,1,1,0,800
100101,2010009,1,1,0,20000
100101,2380001,1,1,0,1000
100101,0,8,12,0,400000
100120,0,1,5,0,400000
100121,0,10,14,0,400000
100121,4000483,1,1,0,400000
100130,4000493,1,1,0,600000
100130,2010000,1,1,0,20000
100130,2010009,1,1,0,20000
100130,4010005,1,1,0,9000
100130,4020005,1,1,0,9000
100130,2040003,1,1,0,300
100130,1002008,1,1,0,1500
100130,1040010,1,1,0,800
100130,1041004,1,1,0,800
100130,1060007,1,1,0,800
100130,2380015,1,1,0,1000
100131,4000494,1,1,0,600000
100131,2000000,1,1,0,20000
100131,2010009,1,1,0,20000
100131,4010006,1,1,0,9000
100131,4020006,1,1,0,9000
100131,2040400,1,1,0,300
100131,2040618,1,1,0,300
100131,1002019,1,1,0,1500
100131,1002002,1,1,0,1500
100131,1040013,1,1,0,800
100131,1041012,1,1,0,800
100131,1060004,1,1,0,800
100131,1072005,1,1,0,800
100131,2380016,1,1,0,1000
100132,4000495,1,1,0,600000
100132,2000000,1,1,0,20000
100132,2010009,1,1,0,20000
100132,4010000,1,1,0,9000
100132,4020007,1,1,0,9000
100132,2040823,1,1,0,300
100132,2041018,1,1,0,300
100132,1002001,1,1,0,1500
100132,1002003,1,1,0,1500
100132,1040014,1,1,0,800
100132,1040015,1,1,0,800
100132,1060008,1,1,0,800
100132,1041014,1,1,0,800
100132,1061014,1,1,0,800
100132,1072004,1,1,0,800
100132,1082003,1,1,0,1000
100132,1442000,1,1,0,700
100132,2380017,1,1,0,1000
100133,4000496,1,1,0,600000
100133,2000000,1,1,0,20000
100133,2010009,1,1,0,20000
100133,4010001,1,1,0,9000
100133,4020003,1,1,0,9000
100133,2048000,1,1,0,300
100133,2041004,1,1,0,300
100133,1002041,1,1,0,1500
100133,1002007,1,1,0,1500
100133,1032001,1,1,0,1000
100133,1040038,1,1,0,800
100133,1060028,1,1,0,800
100133,1041064,1,1,0,800

LIST 2 (not the whole list):

2000,4031161
2000,4031162
100130,2040003
100131,2040400
100133,2048000
100134,2040500
100134,2044400
130101,4031846
210100,4031273
851000,2290132
1110100,4020002
1110100,4031146
1110130,4000012
1110130,2043102
1110130,1092008
1110130,2048000
1110130,1002033
1110130,1302007
1110130,1032001
1110130,1412012
1110130,4032316
1130100,4031147
1140130,2048001
1140130,1412012
1140130,2044802
1210100,4031846
1210100,4032340
1210102,4032314
2100100,4020006
2100100,4010001
2100100,4010007
2100100,2040420
2100100,2049000
2100101,4010006
2100101,4020001
2100101,4010007
2100101,2044210
2100102,2043212
2100103,2044314
2100104,1452022
2100105,2040316
2100105,2040319
2100105,2044412
2100106,2040926
2100107,1382009
2100108,4010002
2100108,4010001
2100108,4010007
2100108,2044014
2100108,2044214
2110200,2043214
2110200,1452016
2110200,4032390
2110300,2043214
2110301,4010002
2110301,2043114
2130100,2044012
2130100,2044210
2130103,2040617
2220000,4010000
2220000,4020000
2220100,4020006
2230100,4020007
2230100,2040823
2230100,2044010
2230101,4010003
2230102,4031155
2230102,4007001
2230102,1462014
2230103,2040319
2230103,2044114
2230103,1382009
2230104,2040929
2230104,2043112
2230104,1452016
2230105,2040617
2230105,2043015
2230105,4031259
2230106,2040417
2230106,4031268
2230106,4031260
2230106,4031269
2230107,1092030
2230108,2040623
2230108,4031261
2230109,4010004
2230109,4031264
2230110,2044312
2230110,2044805
2230110,1472030
2230111,2049000
2230131,4000008
2230131,1050031
2230200,4031262
2300100,2043112
3000000,2040316
3000000,2040620
3000001,4000068
3000001,2000001
3000001,2000003
3000001,4020004
3000001,4010002
3000001,2050000
3000001,2050001
3000001,2050002
3000001,2050003
3000001,2050004
3100101,4010005
3100101,4020000
3100101,4010007
3100101,4130005
3100101,4130009
3100101,1332025
3110100,4130002
3110100,4130008
3110100,4130010
3110100,4007005
3110101,2044012
3110101,4130002
3110102,4131002
3110102,2044210
3110102,4130003
3110102,4130004
3110102,4130011
3110102,4031129
3110102,4007007
3110102,4007001
3110102,1302030
3110300,2040530
3110300,2044410
3110300,4130002
3110300,4130009
3110300,4130013
3110300,4007000
3110300,4007004
3110301,4010005
3110301,4020000
3110301,4010007
3110301,2040420
3110301,4130001
3110301,4130006
3110302,2040324
3110302,2044210
3110302,4130010
3110302,4130015
3110302,4031694
3110302,4007003
3110303,2040417
3110303,2044112
3110303,2044310
3110303,2044809
3110303,4130001
3110303,4130002
3110303,4130016
3110303,4031694
3110303,1472030
3210100,4010002
3210100,4130011
3210100,4130016
3210100,4130017
3210100,4007003
3210100,4007001
3210100,1382009
3210200,4130007
3210200,4130016
3210200,4007000
3210200,4007006
3210200,4007001
3210201,2043114
3210201,4130003
3210201,4130004
3210201,4130012
3210202,2043110
3210202,2044807
3210202,4130006
3210202,4130012
3210203,2040923
3210203,2043212
3210204,2040617
3210204,4130015
3210204,4130017
3210205,4130001
3210205,4130004
3210205,4130014
3210205,4031093
3210205,4007007
3210205,4007005
3210205,1412011
3210206,4130015
3210206,4130016
3210207,2049000
3210207,4130007
3210207,4130008
3210208,4130006
3210208,4130008
3210208,2382028
3210208,4031279
3210208,4007002
3210208,4007004
3210208,1452022
3210450,4130000
3210450,4130014
3210450,4130017
3210450,4007004
3210800,4020004
3210800,2044414
3210800,4130008
3210800,4130010
3210800,1452022
3220000,1322027
3220000,2044112
3220000,2044412
3230100,4130006
3230100,4130012
3230100,4130017
3230100,4031239
3230101,4130007
3230101,4130014
3230101,4007000
3230101,4007003
3230102,2040024
3230102,2040423
3230102,4130011
3230102,4130015
3230103,2044112
3230103,4130001
3230103,4130011
3230104,2044212
3230104,4130000
3230104,4130003
3230104,4130005
3230104,4031263
3230200,2044807
3230200,4130009
3230200,4130014
3230200,4031309
3230200,4007000
3230200,1432012
3230300,4000067
3230300,2000002
3230300,2000003
3230300,4020000
3230300,4010001
3230300,4004000
3230300,4004001
3230300,4004002
3230300,4004003
3230302,4130005
3230302,4130012
3230302,4130013
3230302,4031089
3230302,1422014
3230303,2044312
3230303,4130009
3230303,4130010
3230303,4130012
3230304,4010001
3230304,2040316
3230304,2049000
3230304,4130002
3230304,4130017
3230305,2040926
3230305,4130003
3230305,4130004
3230305,4130014
3230306,4130000
3230306,4130010
3230306,4007000
3230306,4007005
3230306,1472032
3230307,4010001
3230307,2040929
3230307,2044110
3230307,4130010
3230307,4130013
3230308,2043210
3230308,4130004
3230308,4130006
3230308,4130015
3230400,4130001
3230400,4130008
3230400,4031140
3230400,4031135
3230400,4007002
3230400,4007004
3230405,4131005
3230405,2044410
3230405,4130009
3230405,4130013
3300001,4130005
3300001,4130009
3300005,2043801
3300005,2044801
3300006,2040602
3300006,1041076
3300006,1072126
3300007,2040001
3300007,2040301
3300007,2043701
3300007,2043801
3300007,2040601
3300007,1041033
3300007,2040302
3300007,2044801
3300008,2040301
3300008,2043801
3300008,2044802
4110300,4130002
4110300,4130013
4110300,2382057
4110300,4007004
4110301,4130007
4110301,4130012
4110301,2382072
4110301,4007000
4110301,4007005
4110301,4007006
4110302,2000002
4110302,2000003
4110302,4020000
4110302,4020006
4110302,4130012
4110302,2044102
4110302,1372007
4110302,4006001
4110302,1040089
4110302,1050045
4110302,4004002
4110302,2040001
4110302,4000359
4110302,1082198
4110302,4007006
4110302,4007001
4130100,2040025
4130100,2040621
4130100,2044014

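So, basically, I need to delete every line in LIST 1 that does not start with one of the lines in LIST 2. For example, the first line of LIST 2 is "2000,4031161", so I do not want to delete "2000,4031161,1,1,1008,1000000". One of the lines in LIST 1 is "100100,4000019,1,1,0,600000", and since LIST 2 has no line that says "100100,4000019", I want that line deleted. The real lists are tens of thousands of lines long. I built a really, really long regex that was supposed to sort this out for me with search and replace, but then I found out there is a 2048-character limit, so I'm interested in finding a better way to do this.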

1 answer:

Answer 0: (score: 0)

I don't know how to do this with Notepad++, but with gawk for Windows you could do this:

gawk -F"," 'NR==FNR{l[$0]++; next} {if ($1","$2 in l) print $0 }' file2 file1

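The first block builds the array l from the entries in file2, the first file given on the command line, and then skips to the next record. Its condition, NR==FNR (Number of Records == File Number of Records), is true only while that first file is being read, because NR keeps counting across files while FNR restarts at 1 for each file.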
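To make the NR==FNR test more concrete, here is a small Python sketch that imitates how awk numbers records across two inputs. The function name number_records and the in-memory lists standing in for file2 and file1 are assumptions made for illustration only; they are not part of the gawk command.

```python
def number_records(files):
    """Yield (nr, fnr, line) the way awk numbers records.

    `files` is a list of lists of lines, standing in for the files
    named on the awk command line (a hypothetical stand-in).
    """
    nr = 0                      # like NR: never reset
    for lines in files:
        fnr = 0                 # like FNR: reset for every file
        for line in lines:
            nr += 1
            fnr += 1
            yield nr, fnr, line


# Two short samples taken from LIST 2 and LIST 1 in the question.
file2 = ["2000,4031161", "2000,4031162"]
file1 = ["2000,4031161,1,1,1008,1000000", "100100,4000019,1,1,0,600000"]

for nr, fnr, line in number_records([file2, file1]):
    # nr == fnr holds only while the first input is being read
    print(nr, fnr, nr == fnr, line)
```

One consequence worth knowing: if the first file is empty, NR and FNR are also equal while the second file is read, so the idiom assumes file2 contains at least one line.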

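Once file2 has been processed, the second block is executed for every line of file1. Because we use , as the field separator, we join the first two fields with a comma ($1","$2), look that up as a key in l, and print the line only if it is present in the list.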
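For readers who do not have gawk at hand, the same two-step logic can be sketched in Python. This is only an illustration under a few assumptions: the function name keep_matching is made up, the lists are held in memory, and surrounding whitespace is stripped from each line, which the awk one-liner does not do.

```python
def keep_matching(list1_lines, list2_lines):
    """Return the lines of list1 whose first two comma-separated
    fields, joined by a comma, appear as a line in list2."""
    # Step 1: collect every non-empty line of list2 as a lookup key
    # (assumption: whitespace around a line is not significant).
    keys = {line.strip() for line in list2_lines if line.strip()}

    kept = []
    for line in list1_lines:
        fields = line.strip().split(",")
        if len(fields) < 2:
            continue            # too short to form a key; drop it
        key = fields[0] + "," + fields[1]
        # Step 2: keep the line only if its key was seen in list2
        if key in keys:
            kept.append(line)
    return kept


list2 = ["2000,4031161", "2000,4031162", "100130,2040003"]
list1 = [
    "2000,4031161,1,1,1008,1000000",
    "100100,4000019,1,1,0,600000",
    "100130,2040003,1,1,0,300",
]
print(keep_matching(list1, list2))
```

With these sample lines, the "100100,4000019" row is dropped and the other two are kept, which matches the behaviour asked for in the question.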

Arrays in awk are hash tables under the hood (implemented in C), so RAM should not be a problem even with a large number of lines in file2.
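The memory point can be illustrated with a file-based Python sketch: only the keys from the second list are held in a hash-based set, while the first list is streamed one line at a time. The function name filter_file and its path parameters are hypothetical, and UTF-8 text files are assumed.

```python
def filter_file(list2_path, list1_path, out_path):
    """Write to out_path the lines of list1_path whose first two
    fields appear as a line in list2_path. Returns the number kept."""
    # Only the keys from list2 stay in memory (a hash-based set).
    with open(list2_path, encoding="utf-8") as f2:
        keys = {line.strip() for line in f2 if line.strip()}

    kept = 0
    # list1 is read line by line, so its size does not affect memory use.
    with open(list1_path, encoding="utf-8") as f1, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in f1:
            fields = line.strip().split(",")
            if len(fields) >= 2 and ",".join(fields[:2]) in keys:
                out.write(line)
                kept += 1
    return kept
```

A call such as filter_file("file2", "file1", "filtered.txt") would mirror the argument order of the gawk command; the output file name here is just an example.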