Grep names from a file that has only one line

Date: 2015-02-28 08:54:53

Tags: arrays bash grep

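I have a file that consists of one long line. I only need to save the @names, but I cannot get them with grep. For example, in the text below, I need to save the names mrvortex and MurphTWN in an array: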

\"\/\u003e\n    @//    \u003cstrong class=\"fullname js-action-profile-name\"\u003eMartin Belanger\u003c\/strong\u003e\n   @234/       \u003cspan class=\"username js-action-profile-name\"\u003e@mrvortex\u003c\/span\u003e\n          \n      \u003c\/a\u003e\n    \u003c\/div\u003e\n      \u003cp class=\"bio \"\u003e\n          Meteorologist and Sr. Manager, TV & Cross-Platform Technologies at Pelmorex Media\n      \u003c\/p\u003e\n\n    \n\n\n  \u003c\/div\u003e\n\u003c\/div\u003e\n\n\n\u003c\/li\u003e\n\n  \u003cli class=\"js-stream-item stream-item stream-item\n\" data-item-id=\"151623861\" id=\"stream-item-user-151623861\" data-item-type=\"user\"\u003e\n    \n\u003cdiv class=\"account  js-actionable-user js-profile-popup-actionable \" data-screen-name=\"MurphTWN\" data-user-id=\"151623861\" data-feedback-token=\"\" data-impression-id=\"\" \u003e\n    \n\n     \u003cdiv class=\"user-actions btn-group not-following  \" data-user-id=\"151623861\"\n    data-screen-name=\"MurphTWN\" data-name=\"Chris Murphy TWN\" data-protected=\"false\"\u003e\n\n\n\n    \n\n\n  \u003cbutton class=\"user-actions-follow-button js-follow-btn follow-button btn\" type=\"button\"\u003e\n  \u003cspan class=\"button-text follow-text\"\u003e\n     \u003cspan class=\"Icon Icon--follow\"\u003e\u003c\/span\u003e Seguir \n    \n  \u003c\/span\u003e\n  \u003cspan class=\"button-text following-text\"\u003e\n     Siguiendo\n    \n  \u003c\/span\u003e\n  \u003cspan class=\"button-text unfollow-text\"\u003e\n     Dejar de seguir\n    \n  \u003c\/span\u003e\n  \u003cspan class=\"button-text blocked-text\"\u003eBloqueado\u003c\/span\u003e\n  \u003cspan class=\"button-text unblock-text\"\u003eDesbloquear\u003c\/span\u003e\n  \u003cspan class=\"button-text pending-text\"\u003ePendiente\u003c\/span\u003e\n  \u003cspan class=\"button-text cancel-text\"\u003eCancelar\u003c\/span\u003e\n\u003c\/button\u003e\n\n\n\n\u003c\/div\u003e\n\n\n\n  \u003cdiv class=\"content\"\u003e\n    \u003cdiv 
class=\"stream-item-header\"\u003e\n      \u003ca class=\"account-group js-user-profile-link\" href=\"\/MurphTWN\" \u003e\n        \u003cimg class=\"avatar js-action-profile-avatar \" src=\"https:\/\/pbs.twimg.com\/profile_images\/512972504411828224\/sM3noxz7_normal.jpeg\" alt=\"\" data-user-id=\"151623861\"\/\u003e\n        \u003cstrong class=\"fullname js-action-profile-name\"\u003eChris Murphy TWN\u003c\/strong\u003e\u003cspan class=\"Icon Icon--verified Icon--small\"\u003e\u003cspan class=\"u-hiddenVisually\"\u003eCuenta verificada\u003c\/span\u003e\u003c\/span\u003e\n\n          \u003cspan class=\"username js-action-profile-name\"\u003e@MurphTWN\u003c\/span\u003e\n          \n      \u003c\/a\u003e\n    \u003c\/div\u003e\n      \u003cp class=\"bio \"\u003e\n          

2 answers:

Answer 0 (score: 1)

This might do it:

awk -v RS="@" 'NR>1{$1=$1;n=split($1,a,"[^a-zA-Z]");if (a[1]) print a[1]}' file
mrvortex
MurphTWN

Or with GNU awk (note RS="u003e@"; GNU awk supports multiple characters in RS):

awk -v RS="u003e@" 'NR>1{$1=$1;split($1,a,"[^a-zA-Z]");print a[1]}' file
mrvortex
MurphTWN
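
The commands above only print the names, while the question asks for them in an array. A minimal sketch of one way to capture the output, assuming Bash 4 or later (for mapfile); the helper name extract_names and the short sample line are hypothetical stand-ins, not part of the original answer:

```bash
#!/bin/bash

# Hypothetical helper: print every run of letters that directly follows an "@"
# in the file given as $1 (same idea as the RS="@" command above).
extract_names() {
    awk -v RS="@" 'NR>1 { split($1, a, "[^a-zA-Z]"); if (a[1] != "") print a[1] }' "$1"
}

# Assumed stand-in for the long line; the real input is the file from the question.
tmp=$(mktemp)
printf '%s' '@// x @234/ y \u003e@mrvortex\u003c\/span\u003e z \u003e@MurphTWN\u003c' > "$tmp"

# mapfile stores one output line per array element.
mapfile -t names < <(extract_names "$tmp")
rm -f "$tmp"

printf 'names: %s\n' "${names[@]}"
```

On Bash versions without mapfile, a while read loop that appends to the array should serve the same purpose.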

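Answer 1 (score: 0)

It is not 100% clear what you need, but I believe you want to parse that long line and capture the names that follow the @ sign (actual names, not 234 and the like). You can do this with grep -o together with pattern matching in Bash. Note that this is not the most elegant solution, and that the script reads the long line from a file. (You could modify it slightly to cat the line instead.) Let us know if you have questions: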

#!/bin/bash

## get the long line from a file
ifn=${1:-dat/longline.txt}
[[ -f $ifn && -r $ifn ]] || { echo "Error: file not found '$ifn'"; exit 1; }

declare -a names

## grep the long line, remove leading '@', store in array
for i in $(grep -o '@[[:alpha:]]\+' -- "$ifn"); do 
    names+=( "${i#@}" )
done

## print array contents
for i in "${names[@]}"; do 
    echo "names: $i"
done

Output:

$ bash atnames.sh
names: mrvortex
names: MurphTWN
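
The pattern '@[[:alpha:]]\+' stops at the first non-letter, so a name containing digits or underscores would be cut short. A possible variant, sketched under the assumption that names start with a letter or underscore (which still excludes @234) and that Bash 4 or later is available; the sample line and the name user_42 are made up for illustration:

```bash
#!/bin/bash

# Assumed stand-in for the long line; user_42 is a hypothetical name.
line='x @// y @234/ z \u003e@mrvortex\u003c\/span\u003e w \u003e@user_42\u003c'

# One match per output line; mapfile turns those lines into array elements.
mapfile -t names < <(grep -o '@[[:alpha:]_][[:alnum:]_]*' <<< "$line")

# Strip the leading '@' from every element in one expansion, without a loop.
names=( "${names[@]#@}" )

printf 'names: %s\n' "${names[@]}"
```

To read from the file as in the script above, the here-string can be replaced with -- "$ifn" as the grep argument.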