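I decided to take a script I wrote and one I found here and merge them, so that a friend would not have to run two scripts. He is new to bash and finds the file types and formats confusing, so I am trying to make his life easier.
Basically, when we scrape an account we get a list of IDs with a few lines of code at the top and bottom of the list, so I have this script to remove that code: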
#!/bin/bash
FILE=$1
cat $FILE | sed '1,7d' | sed -n -e :a -e '1,9!{P;N;D;};N;ba' > edited/$FILE
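The second script I use is one I ended up editing so that each line becomes a link (he wanted links to the accounts rather than to the people they follow):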
#!/bin/bash
FILE=$1
endname=$2
while read line; do
    echo "$line | User Follows/followed $handle"
done < $FILE > Appended/$endname.txt
I edited this into a single file that loops, so that he can process several files in one session:
#!/bin/bash
while true; do
    echo what file do you wish to remove info lines from
    read -e FILE
    echo where do you want the new file to be located
    read -e endlocation
    echo what would you like the new file to be named
    read filename
    cat $FILE | sed '1,7d' | sed -n -e :a -e '1,9!{P;N;D;};N;ba'
    while read line; do
        echo "https://twitter.com/intent/user?user_id=$line"
    done < $FILE | uniq > $endlocation/$filename
    rm $FILE
done
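The problem is that the two scripts work perfectly well separately, but as soon as I combine them, one or the other stops working. Right now the sed does not work, so I had the idea of writing the sed output to a file and then reassigning the FILE variable to that file, but even then sed did not do its job. Is there something obvious I am completely missing? My friend and I have stared at this script for hours, each trying to figure out what is happening, or rather what is not happening. As always, thanks for your help!
Edit: Here is a sample input file (you can scrape your own Twitter followers at www.dd-css.com):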
{
  "username": "testy",
  "created_time": {
    "$date": 1461085587225
  },
  "data": {
    "followers": [
      721008887751688192,
      281667578,
      702184946996224000,
      3217284865,
      722068840314634240,
      2885989935,
      718119030083698690,
      4848801485,
      714443675665887232,
      4880594986,
      4166478021,
      722420986369466368,
      3232181141,
      722079476553752576,
      722417819405553666,
      3363234395,
      722111118781468673,
      3150091062,
      719798662625419264,
      388415906,
      722038039023849473,
      720509286149971968,
      720535522347773953,
      709060581224009728,
      722133050629480448,
      721984368072388608,
      720066765829644288,
      722377228382773248,
      4874218565,
      4900522317,
      721954174116708352,
      712480939427946496,
      388526427,
      712931529924677632,
      721964884267651073
    ]
  },
  "qname": "Twitter Friends & Followers",
  "parameters": {
    "friends_limit": 0,
    "screen_name": "Aminov274",
    "followers_limit": 1000
  }
}
This is what the output file needs to look like for my friend to be able to use it:
https://twitter.com/intent/user?user_id=721008887751688192
https://twitter.com/intent/user?user_id=281667578
https://twitter.com/intent/user?user_id=702184946996224000
https://twitter.com/intent/user?user_id=3217284865
https://twitter.com/intent/user?user_id=722068840314634240
https://twitter.com/intent/user?user_id=2885989935
https://twitter.com/intent/user?user_id=718119030083698690
https://twitter.com/intent/user?user_id=4848801485
https://twitter.com/intent/user?user_id=714443675665887232
https://twitter.com/intent/user?user_id=4880594986
https://twitter.com/intent/user?user_id=4166478021
https://twitter.com/intent/user?user_id=722420986369466368
https://twitter.com/intent/user?user_id=3232181141
https://twitter.com/intent/user?user_id=722079476553752576
https://twitter.com/intent/user?user_id=722417819405553666
https://twitter.com/intent/user?user_id=3363234395
https://twitter.com/intent/user?user_id=722111118781468673
https://twitter.com/intent/user?user_id=3150091062
https://twitter.com/intent/user?user_id=719798662625419264
https://twitter.com/intent/user?user_id=388415906
https://twitter.com/intent/user?user_id=722038039023849473
https://twitter.com/intent/user?user_id=720509286149971968
https://twitter.com/intent/user?user_id=720535522347773953
https://twitter.com/intent/user?user_id=709060581224009728
https://twitter.com/intent/user?user_id=722133050629480448
https://twitter.com/intent/user?user_id=721984368072388608
https://twitter.com/intent/user?user_id=720066765829644288
https://twitter.com/intent/user?user_id=722377228382773248
https://twitter.com/intent/user?user_id=4874218565
https://twitter.com/intent/user?user_id=4900522317
https://twitter.com/intent/user?user_id=721954174116708352
https://twitter.com/intent/user?user_id=712480939427946496
https://twitter.com/intent/user?user_id=388526427
https://twitter.com/intent/user?user_id=712931529924677632
https://twitter.com/intent/user?user_id=721964884267651073
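Answer 0 (score: 1)
I know it is not what you want to hear, but what you have right now is very inefficient and fragile and should be replaced: while it could be made robust, doing so would make the code quite complicated, and it would still be very slow. For some background, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice, though that only covers part of the story. There are additional ways your script can fail, depending on various combinations of the input file contents, the environment settings, and even the contents of the directory you execute it from.
You need to replace this: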
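As a small illustration of that fragility, the sketch below shows how a plain `read line` can silently alter a line of input. The sample string is made up for the demonstration and is not taken from the question's data.

```shell
# Hypothetical input line with leading spaces and a backslash.
line_in='  a\tb'

# Plain `read` strips leading whitespace and treats the backslash as an
# escape character, so the backslash is dropped.
plain=$(printf '%s\n' "$line_in" | { read line; printf '%s' "$line"; })

# `IFS= read -r` keeps the line exactly as it was written.
raw=$(printf '%s\n' "$line_in" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain: [%s]\n' "$plain"
printf 'raw:   [%s]\n' "$raw"
```

The plain version comes back as `atb`, while the `IFS= read -r` version preserves the original text. Numeric IDs are unlikely to trigger this, but it is one of several reasons a shell loop is a risky tool for general text processing.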
cat $FILE | sed '1,7d' | sed -n -e :a -e '1,9!{P;N;D;};N;ba'
while read line; do
    echo "https://twitter.com/intent/user?user_id=$line"
done < $FILE | uniq > $endlocation/$filename
rm $FILE
with this:
awk 'script' "$FILE" > "$endlocation/$filename" &&
rm "$FILE"
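where script would be a concise, robust, efficient (orders of magnitude faster) script that does whatever your seds + loop currently do. If you edit your question to include concise, testable sample input and expected output, we can help you write script.
Given the sample input/output you have now posted, it looks like this is what you need: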
awk '
    BEGIN { FS="[,[:space:]]+" }
    /\]/ { inBlock=0 }
    inBlock { print "https://twitter.com/intent/user?user_id=" $2 }
    /"followers"/ { inBlock=1 }
' file
https://twitter.com/intent/user?user_id=721008887751688192
https://twitter.com/intent/user?user_id=281667578
https://twitter.com/intent/user?user_id=702184946996224000
https://twitter.com/intent/user?user_id=3217284865
https://twitter.com/intent/user?user_id=722068840314634240
https://twitter.com/intent/user?user_id=2885989935
https://twitter.com/intent/user?user_id=718119030083698690
https://twitter.com/intent/user?user_id=4848801485
https://twitter.com/intent/user?user_id=714443675665887232
https://twitter.com/intent/user?user_id=4880594986
https://twitter.com/intent/user?user_id=4166478021
https://twitter.com/intent/user?user_id=722420986369466368
https://twitter.com/intent/user?user_id=3232181141
https://twitter.com/intent/user?user_id=722079476553752576
https://twitter.com/intent/user?user_id=722417819405553666
https://twitter.com/intent/user?user_id=3363234395
https://twitter.com/intent/user?user_id=722111118781468673
https://twitter.com/intent/user?user_id=3150091062
https://twitter.com/intent/user?user_id=719798662625419264
https://twitter.com/intent/user?user_id=388415906
https://twitter.com/intent/user?user_id=722038039023849473
https://twitter.com/intent/user?user_id=720509286149971968
https://twitter.com/intent/user?user_id=720535522347773953
https://twitter.com/intent/user?user_id=709060581224009728
https://twitter.com/intent/user?user_id=722133050629480448
https://twitter.com/intent/user?user_id=721984368072388608
https://twitter.com/intent/user?user_id=720066765829644288
https://twitter.com/intent/user?user_id=722377228382773248
https://twitter.com/intent/user?user_id=4874218565
https://twitter.com/intent/user?user_id=4900522317
https://twitter.com/intent/user?user_id=721954174116708352
https://twitter.com/intent/user?user_id=712480939427946496
https://twitter.com/intent/user?user_id=388526427
https://twitter.com/intent/user?user_id=712931529924677632
https://twitter.com/intent/user?user_id=721964884267651073
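The reason the script prints `$2` rather than `$1` is worth spelling out. With a regular-expression field separator, leading whitespace is not discarded the way it is with awk's default separator, so on an indented line the first field is empty and the ID becomes the second field. The sketch below assumes the ID lines are indented, as in pretty-printed JSON; `second_field` is a hypothetical helper name used only for this demonstration.

```shell
# second_field is a hypothetical helper, not part of the answer's script.
second_field() {
    awk 'BEGIN { FS="[,[:space:]]+" } { print $2 }'
}

# Indented line: the leading spaces match FS, so $1 is empty and $2 is the ID.
printf '      281667578,\n' | second_field

# Unindented line: the ID lands in $1 and $2 is empty, so an empty line is printed.
printf '281667578,\n' | second_field
```

If the input were ever delivered without indentation, the script would print URLs with no ID, so it may be worth checking how the scraper formats its files.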
Here is what your shell script should look like:
#!/bin/bash
while true; do
    echo what file do you wish to remove info lines from
    read -e FILE
    echo where do you want the new file to be located
    read -e endlocation
    echo what would you like the new file to be named
    read filename
    awk '
        BEGIN { FS="[,[:space:]]+" }
        /\]/ { inBlock=0 }
        inBlock && !seen[$2]++ { print "https://twitter.com/intent/user?user_id=" $2 }
        /"followers"/ { inBlock=1 }
    ' "$FILE" > "$endlocation/$filename" &&
    rm "$FILE"
done
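I added !seen[$2]++ because I just noticed the uniq in your original pipeline (which, by the way, would not have worked: uniq only removes adjacent duplicates and your input is not sorted, whereas the check in the awk script works regardless of order).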
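The difference between the two de-duplication approaches can be seen with a minimal sketch. The function names `dedupe_adjacent` and `dedupe_all` are made up for this illustration, and the three-line input simply repeats one ID from the sample with another ID in between.

```shell
# uniq only collapses duplicates that sit on consecutive lines.
dedupe_adjacent() { uniq; }

# awk prints a line only the first time it is seen, in any order.
dedupe_all() { awk '!seen[$0]++'; }

# The repeated ID is not adjacent, so uniq keeps all three lines.
printf '%s\n' 281667578 3217284865 281667578 | dedupe_adjacent

# The awk version keeps only the first occurrence: two lines remain.
printf '%s\n' 281667578 3217284865 281667578 | dedupe_all
```

Sorting before `uniq` would also remove the repeats, but it would change the order of the IDs, which the awk idiom preserves.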
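Answer 1 (score: 0)
You have bypassed the cat and the two sed commands by having the while read loop read from the same file again (the sed output is never piped into the loop or written to an intermediate file). Do this instead: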
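# Quote the variables so paths containing spaces still work.
cat "$FILE" |
    sed '1,7d' |                                  # drop the first 7 lines
    sed -n -e :a -e '1,9!{P;N;D;};N;ba' |         # drop the last 9 lines
    while read -r line; do
        echo "https://twitter.com/intent/user?user_id=$line"
    done |
    uniq > "$endlocation/$filename"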
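To try the piped version without touching real data, the sketch below wraps it in a function and runs it on a throwaway file. The function name `build_links` and the placeholder header and trailer lines are assumptions made for this demonstration. The sample uses bare IDs, one per line; with the JSON from the question, the trailing commas would still be present on each line, since neither sed nor read removes them.

```shell
# build_links: hypothetical helper that drops the first 7 and last 9 lines
# of the file given as $1, then prints one URL per remaining line.
build_links() {
    sed '1,7d' "$1" |
    sed -n -e :a -e '1,9!{P;N;D;};N;ba' |
    while read -r line; do
        echo "https://twitter.com/intent/user?user_id=$line"
    done
}

# Throwaway input: 7 header lines, 3 bare IDs, 9 trailer lines.
sample=$(mktemp)
{
    printf 'header %s\n' 1 2 3 4 5 6 7
    printf '%s\n' 721008887751688192 281667578 702184946996224000
    printf 'trailer %s\n' 1 2 3 4 5 6 7 8 9
} > "$sample"

build_links "$sample"
rm -f "$sample"
```

Because every stage is connected by a pipe, the loop receives the trimmed lines rather than the original file, which is the step the combined script in the question was missing.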