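Remove commas from a column in a CSV file

Time: 2015-10-27 04:10:07

Tags: bash perl sed

I need to remove all of the commas within the third column, i.e. anything before the (P). So I need to change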

4,653920, Disciplined Growth Investors, (P), MN Continental Europe , Netherlands ,0.0,6.0,3.247039123535156

into

4,653920, Disciplined Growth Investors (P), MN Continental Europe , Netherlands ,0.0,6.0,3.247039123535156

3,670862, Barrow, Hanley, Mewhinney & Strauss (P)  , Continental Europe , Germany ,0.0,117228.58280736001,81988.77229259514

needs to become

3,670862, Barrow Hanley Mewhinney & Strauss (P)  , Continental Europe , Germany ,0.0,117228.58280736001,81988.77229259514

Sample data:

4,646689, Turner Investment Partners Inc (P), USA , Continental Europe Sweden ,0.0,32.31363986312867,10.986624382831804
4,653920, Disciplined Growth Investors, (P), MN Continental Europe , Netherlands ,0.0,6.0,3.247039123535156
3,26372, Delaware Investment Advisors (P), USA , South East Asia India ,0.0,0.0,0.0
3,640531, J. Goldman & Co L.P. (P), New York , Emerging Markets Cyprus ,0.0,133.0,109.06
4,978983, Mirae Asset Mgmt (RP), London , United Kingdom United Kingdom ,0.0,0.0,0.0
3,11689, Panagora Asset Management (P), USA , Emerging Markets Greece ,0.0,104.41579594,76.1271739939902
4,49077, Hellman, Jordan Management Company Inc (P), Boston South East Asia , Asia - Multi Mkt ,0.0,0.0,0.0
4,9133838, AmericaFirst Capital Management LLC (P), USA , United States of America United States of America ,0.0,14999.789999999999,12030.62399999999
4,654134, Bessemer Trust Company (RP), New Jersey , South East Asia India ,0.0,0.6000000000000001,0.5733759994506836
3,674681, Amici Capital LLC (P), USA , South East Asia Asia - Multi Mkt ,0.0,0.0,0.0
4,49077, Hellman, Jordan Management Company Inc (P), Boston Australia & NZ , Australia ,0.0,0.0,0.0
4,45722, Par Capital Management (P), USA , Japan Japan. ,0.0,0.0,0.0
3,926297, AGF Management Ltd (RP), Canada , North America Canada. ,0.0,0.0,0.0
3,49077, Hellman, Jordan Management Company Inc (P), Boston South East Asia , Singapore ,0.0,1.26,0.8043503979492187
3,926297, AGF Management Ltd (RP), Canada , Continental Europe Norway ,0.0,0.0,0.0
3,9057635, Pine River Capital (P), Minneapolis , Continental Europe Europe - Multi Mkt ,0.0,0.0,0.0
4,2000015, Alpine Woods Capital Investors, LLC (P), USA United Kingdom , United Kingdom ,0.0,987.7121818200001,935.0341807272648
4,1132877, Echinus Partners LP (P), United States , United States of America United States of America ,0.0,0.0,0.0
4,669920, Schafer Capital Mgmt Inc (P), New York , South East Asia Indonesia ,0.0,9.600000000000001,9.36
3,40238, Davis Selected Advisers, LP (P), Santa Fe South East Asia , China ,0.0,1263.15,1221.045
4,1067377, Columbia Wanger Asset Management (P), USA , United States of America United States of America ,0.0,5403.889999999999,5205.6125
3,823184, Delta Partners (P), USA , Latin America Latin America - Multi Mkt ,0.0,0.0,0.0
3,508152, Federated Investors Inc (P), USA , Other South Africa ,0.0,0.0,0.0
3,670862, Barrow, Hanley, Mewhinney & Strauss (P)  , Continental Europe , Germany ,0.0,117228.58280736001,81988.77229259514
4,1116378, Salzman Co Inc (P), USA , United Kingdom United Kingdom ,0.0,92.9,79.26387728576661
4,647619, Segall Bryant & Hamill (P), Minneapolis , Latin America Colombia ,0.0,1.67,1.5699999999999998
3,653920, Disciplined Growth Investors, (P), MN United Kingdom , United Kingdom ,0.0,4.0,3.82
4,989767, Coatue Management LLC (P), USA , Continental Europe Austria ,0.0,1326.0216336784424,1255.3005343314537
3,34455, Gabelli Asset Management Inc (P), USA , Continental Europe France ,0.0,885.54552259,814.7023230917504
3,832792, Clovis Capital Management, LP (P), New York United States of America , United States of America ,0.0,96850.0,96077.5
4,669920, Schafer Capital Mgmt Inc (P), New York , Continental Europe France ,0.0,198.05,192.48499999999999

2 answers:

Answer 0: (score: 0)

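You should use sed -i, which edits the file in place. Note that this expression only removes a comma that sits directly before (P) (optionally followed by spaces); commas earlier in the name, as in the "Barrow, Hanley" row, are left alone: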

sed -i -e 's/^\(.*\)\(,\)\(\ *(P)\)\(.*\)$/\1\3\4/g' CSVFILE
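Because -i overwrites CSVFILE, it may be worth keeping a backup while trying the command out. A minimal sketch, assuming a sed that accepts a suffix attached to -i (GNU and BSD sed both do); the function name and the .bak suffix are illustrative and not part of the answer, and the substitution is the same one written with one fewer group:

```shell
# Hypothetical helper: run the substitution in place on the file named by $1,
# keeping the untouched original as "$1.bak" so the change can be undone.
strip_comma_before_p() {
  sed -i.bak -e 's/^\(.*\),\( *(P)\)\(.*\)$/\1\2\3/' "$1"
}

# Try it on a throwaway copy of one row from the question.
tmp=$(mktemp)
printf '%s\n' '4,653920, Disciplined Growth Investors, (P), MN Continental Europe , Netherlands ,0.0,6.0,3.247039123535156' > "$tmp"
strip_comma_before_p "$tmp"
cat "$tmp"
rm -f "$tmp" "$tmp.bak"
```

Comparing the file with its .bak copy (for example with diff) shows exactly which rows were changed before the backup is deleted.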

Answer 1: (score: 0)

This might work for you (GNU sed):

sed -r ':a;s/^(([^,]*,){2}[^,]*),(.*\(P\))/\1\3/;ta' file

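The loop removes one comma at a time from the third column, i.e. every comma between the separator that ends the second field and the (P) marker, and repeats until none is left.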
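To see the loop at work, the command can be wrapped in a small filter and fed two rows from the question. This is only a sketch: the function name is made up for illustration, and -E with separate -e expressions is used in place of -r and semicolons, which should behave the same on GNU sed.

```shell
# Hypothetical wrapper around the looping substitution: reads CSV rows on
# stdin and writes the cleaned rows to stdout.
# Each pass deletes the first comma found after the second field, as long as
# a (P) marker still follows it; "ta" jumps back to ":a" after every
# successful substitution.
strip_name_commas() {
  sed -E \
    -e ':a' \
    -e 's/^(([^,]*,){2}[^,]*),(.*\(P\))/\1\3/' \
    -e 'ta'
}

printf '%s\n' \
  '3,670862, Barrow, Hanley, Mewhinney & Strauss (P)  , Continental Europe , Germany ,0.0,117228.58280736001,81988.77229259514' \
  '4,978983, Mirae Asset Mgmt (RP), London , United Kingdom United Kingdom ,0.0,0.0,0.0' |
  strip_name_commas
```

The first row needs two passes and comes out with the name as "Barrow Hanley Mewhinney & Strauss (P)"; the (RP) row contains no (P) marker, so it passes through unchanged.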

An alternative solution using a divide-and-conquer approach:

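sed -r 's/^(([^,]*,){2})(.*\(P\))/\1\n\3\n/;T;h;s/,//g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/' file

This splits the line into three parts separated by newlines, strips the commas from a working copy, and then rebuilds the line from the comma-free middle part and the untouched copy kept in the hold space. The T command (a GNU extension) ends processing early for rows with no (P) marker, such as the (RP) rows, which would otherwise be printed twice: once without any commas and once unchanged.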
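The same split, process, rebuild idea may be easier to follow when the three parts have names. The sketch below swaps sed's hold space for awk string functions; it is not the answer's method, the function and variable names are invented for illustration, and it assumes at most one (P) marker per row, as in the sample data.

```shell
# Hypothetical filter: reads CSV rows on stdin, writes cleaned rows to stdout.
strip_name_commas_split() {
  awk '{
    marker = index($0, "(P)")            # start of the first (P); 0 if absent
    if (marker == 0 || !match($0, /^[^,]*,[^,]*,/) || RLENGTH >= marker) {
      print                              # nothing to clean: pass the row through
      next
    }
    head = substr($0, 1, RLENGTH)        # first two fields, commas kept
    name = substr($0, RLENGTH + 1, marker + 2 - RLENGTH)  # third column through (P)
    tail = substr($0, marker + 3)        # everything after (P)
    gsub(/,/, "", name)                  # process only the middle part
    print head name tail                 # rebuild the row
  }'
}

printf '%s\n' \
  '4,49077, Hellman, Jordan Management Company Inc (P), Boston South East Asia , Asia - Multi Mkt ,0.0,0.0,0.0' |
  strip_name_commas_split
```

Only the middle part ever loses commas, so the numeric fields and the separators around them cannot be affected.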