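I have a specific set of downloaded files (all ending in .bam) in the directory /home/cmccabe/Desktop/NGS/API/2-15-2016. What I want to do is rename the downloaded files using the matching value in $2 of name. To make things a bit more involved, the folder's date is unique, the same date appears as a header in name, and the entries to match for that folder are listed under that header. I am not sure how to do this, or whether it is possible. Thank you :).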
Contents of the folder /home/cmccabe/Desktop/NGS/API/2-15-2016
IonXpress_001.bam
IonXpress_002.bam
IonXpress_003.bam
IonXpress_007.bam
file1.gz
file2.gz
name (names.txt)
2-15-2016
IonXpress_001.bam testname1_12345
IonXpress_002.bam testname2_45678
IonXpress_003.bam testname3_9012
IonXpress_007.bam testname1_12345-
2-19-2016
IonXpress_001.bam testname5_00000
IonXpress_002.bam testname6_11111
IonXpress_003.bam testname7_1213
IonXpress_007.bam testname8_78524
Desired result
testname1_12345.bam
testname2_45678.bam
testname3_9012.bam
testname1_12345.bam
file1.gz
file2.gz
bash so far
logfile=/home/cmccabe/Desktop/NGS/API/2-15-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-15-2016/*.bam ; do
echo "patient identifier creation: $(date) - File: $f"
bname=$(basename $f)
pref=${bname%%.bam}
while read from to ; do
for i in $f* ; do
if [ "$i" != "${i/$from/$to}" ] ; then
mv $i ${i/$from/$to}
fi
done < names.txt
echo "End patient identifier creation: $(date) - File: $f"
done >> "$logfile"
Edit:
for f in /home/cmccabe/Desktop/NGS/API/2-12-2016/*.bam ; do
bname=$(basename $f)
cmd=$(sed -n "/$f/,/[0-9]{1,2}-[0-9]{1,2}-20[0-9]{2}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}" /home/cmccabe/Desktop/NGS/panels/names.txt)
echo "$cmd"
done
sed: -e expression #1, char 4: extra characters after command
Answer 0 (score: 2)
You can use this for loop with awk:
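cd /home/cmccabe/Desktop/NGS/
for file in API/*/*.bam; do
    f="${file##*/}"      # file name, e.g. IonXpress_001.bam
    path="${file%/*}"    # containing folder, e.g. API/2-15-2016
    dt="${path##*/}"     # folder date, e.g. 2-15-2016
    # A one-field line in names.txt is a date header: p is 1 only inside the block for dt.
    # Inside that block, print $2 of the line whose $1 equals the file name.
    mv "$file" "$path/$(awk -v dt="$dt" -v f="$f" 'NF==1 {
        p=$0==dt ? 1 : 0; next} p && $1==f{print $2}' names.txt)"
done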
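The loop above renames each file to the bare value of $2, so the result has no .bam extension, and a file with no entry in names.txt makes awk print nothing, which leaves mv with the folder itself as its target. A minimal sketch of a more defensive variant follows. The function name rename_bams, its two arguments, and the temporary demo tree are hypothetical, and it assumes the new names contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: rename_bams and the demo data are illustrative assumptions.

# rename_bams ROOT NAMES: for every ROOT/<date>/*.bam, look up the new name
# in the <date> block of NAMES and rename, keeping the .bam extension.
rename_bams() {
    local root="$1" names="$2" file f path dt new
    for file in "$root"/*/*.bam; do
        [ -e "$file" ] || continue          # the glob matched nothing
        f=${file##*/}                       # e.g. IonXpress_001.bam
        path=${file%/*}                     # e.g. ROOT/2-15-2016
        dt=${path##*/}                      # e.g. 2-15-2016
        # One-field lines are date headers; p is 1 only inside the dt block.
        new=$(awk -v dt="$dt" -v f="$f" '
            NF == 1 { p = ($0 == dt); next }
            p && $1 == f { print $2; exit }' "$names")
        if [ -z "$new" ]; then
            echo "no entry for $f under $dt, skipping" >&2
            continue
        fi
        mv -n -- "$file" "$path/$new.bam"   # -n: do not overwrite an existing file
    done
}

# Demo on throwaway data.
demo=$(mktemp -d)
mkdir -p "$demo/API/2-15-2016"
touch "$demo/API/2-15-2016/IonXpress_001.bam" \
      "$demo/API/2-15-2016/IonXpress_002.bam" \
      "$demo/API/2-15-2016/IonXpress_009.bam" \
      "$demo/API/2-15-2016/file1.gz"
printf '%s\n' '2-15-2016' \
    'IonXpress_001.bam testname1_12345' \
    'IonXpress_002.bam testname2_45678' \
    '2-19-2016' \
    'IonXpress_001.bam testname5_00000' > "$demo/names.txt"

rename_bams "$demo/API" "$demo/names.txt"
ls "$demo/API/2-15-2016"
```

In the demo, IonXpress_009.bam has no entry under 2-15-2016 and is left alone, as is file1.gz. Because mv -n refuses to overwrite, two entries that map to the same new name would leave the second file under its old name rather than silently replacing the first.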
Answer 1 (score: 1)
You can do something like this, where I use your f variable (holding the date) in sed:
cmd=$(sed -n "/$f/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}" names.txt)
# for testing use echo and this will also save what you just tried
#to do to your log file :) just in case.
echo "$cmd"
# when it works the way you want
# uncomment the next line and it will execute your command :)
#eval "$cmd"
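What this does is tell sed, with -n, not to print the lines it reads. Then, from the line matching the date ($f) to the next line matching the date pattern D-D-20DD (regular expression [0-9]{1,2}-[0-9]{1,2}-20[0-9]{2}; in sed's default basic syntax the interval braces must be escaped as \{ and \}), it runs the commands between { and }.
The command inside {} is the substitute command "s", which matches a pattern and replaces it with another pattern.
I tell it to take the string up to and including .bam and capture it between \( and \), then match the rest and capture it in a second group.
The replacement pattern is the string mv, followed by group 1 captured from the match, then the string from group 2. This effectively creates a list of mv file.bam new_filename commands.
These are then stored in the cmd variable.
eval will execute the commands.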
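To make the range logic concrete, here is a small self-contained sketch on throwaway data. It writes the interval braces as \{1,2\}, since sed's default basic regex syntax treats a bare { as a literal character, and it anchors the addresses with ^ and $ so that only whole date header lines match. The sample file and variable names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: the sample data and variable names are illustrative.

names=$(mktemp)
printf '%s\n' '2-12-2016' \
    'IonXpress_001.bam testname1_12345' \
    'IonXpress_002.bam testname2_45678' \
    '2-19-2016' \
    'IonXpress_001.bam testname5_00000' > "$names"

f=2-12-2016   # the date header to select, not a path (a path's slashes would break /$f/)

# -n suppresses default printing. The range runs from the header line for $f
# to the next date header; inside it, each "old.bam new" line is rewritten
# as "mv old.bam new" and printed.
cmd=$(sed -n "/^$f\$/,/^[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}\$/{
    s/^\(.*\.bam\) \(.*\)\$/mv \1 \2/p
}" "$names")

echo "$cmd"
```

Only the two entries under 2-12-2016 are turned into mv commands, because the range closes at the 2-19-2016 header. Printing the commands first keeps this a dry run. File names containing spaces or shell metacharacters would need quoting before the output could safely be passed to eval.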
I took the sample contents of your name.txt file and ran the conversion to illustrate:
~$ echo "2-12-2016
IonXpress_001.bam testname1_12345
IonXpress_002.bam testname2_45678
IonXpress_003.bam testname3_9012
IonXpress_007.bam testname1_12345-
2-19-2016
IonXpress_001.bam testname5_00000
IonXpress_002.bam testname6_11111
IonXpress_003.bam testname7_1213
IonXpress_007.bam testname8_78524" | sed -n "/$f/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}"
mv IonXpress_001.bam testname1_12345
mv IonXpress_002.bam testname2_45678
mv IonXpress_003.bam testname3_9012
mv IonXpress_007.bam testname1_12345-
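Update: from your comments and your edit I can see I am not very good at explaining :) so here is an edited version of your script. I will assume that when you run it you are in the dated folder to process under /home/cmccabe/Desktop/NGS/API/ (for example 2-15-2016). If not, I am sure you will know how to change that or turn it into a parameter.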
logfile=/home/cmccabe/Desktop/NGS/API/2-15-2016/process.log
# no need to loop for each file ending in bam as the name file
# will be our driver. After all if the entry is not present in
# the name file then we really cannot do anything.
# First let's get the date from the folder name:
# pwd returns the current working directory (we are expected to be
# in the directory to process)
# basename strips everything but the last folder name, hence the date
date_to_process=$(basename "$(pwd)")
# variable to store name file path (hint change this to where it really is or pass as argument to script)
# note: no spaces are allowed around = in a shell assignment
name_file_path="/home/cmccabe/Desktop/NGS/panels/names.txt"
# from the name file build the file move (mv) commands using sed
# as described before and store that command in the cmd variable.
# note that I added a couple of echo commands to have the same output you
# were trying to do. I also split the command on multiple lines
# for clarity (well I hope it makes it more clear at least).
cmd=$(sed -n "/$date_to_process/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{
s/\(.*\.bam\) \(.*\)/echo \"Start patient identifier creation: \$(date) - File: \1\"\n mv \1 \2\n echo \"End patient identifier creation: \$(date) - File: \1\"/p
}" "$name_file_path")
# print the generated commands so you can see what it did.
echo "about to execute this command:
$cmd"
# execute the commands to perform the move operations and send the
# output to the log file. Make sure to redirect stderr (errors) to the log file
# too so you will know if something failed and what it was. Using 2>&1 makes
# stderr go to the same place as stdout.
eval "$cmd" >> "$logfile" 2>&1
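As an alternative to building a command string and passing it to eval, the same date block can be extracted with awk and consumed by a while read loop, so file names are never re-parsed as shell code. This is a sketch under assumptions: the function name rename_from_names, its arguments, and the demo directory are hypothetical, and names are assumed to contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: rename_from_names and the demo tree are illustrative assumptions.

# rename_from_names DIR NAMES: the last path component of DIR is the date;
# every "old new" pair in that date's block of NAMES is applied inside DIR.
rename_from_names() {
    local dir="$1" names="$2" date_to_process old new
    date_to_process=$(basename "$dir")
    # awk prints only the "old new" lines that belong to the requested date block.
    awk -v dt="$date_to_process" '
        NF == 1 { p = ($0 == dt); next }
        p && NF == 2 { print $1, $2 }' "$names" |
    while read -r old new; do
        echo "Start patient identifier creation: $(date) - File: $old"
        if [ -e "$dir/$old" ]; then
            # Same target as "mv \1 \2": the new name is used as given in the names file.
            mv -- "$dir/$old" "$dir/$new"
        else
            echo "missing: $dir/$old" >&2
        fi
        echo "End patient identifier creation: $(date) - File: $old"
    done
}

# Demo on throwaway data.
demo=$(mktemp -d)
mkdir -p "$demo/2-15-2016"
touch "$demo/2-15-2016/IonXpress_001.bam" "$demo/2-15-2016/IonXpress_003.bam"
printf '%s\n' '2-15-2016' \
    'IonXpress_001.bam testname1_12345' \
    'IonXpress_002.bam testname2_45678' \
    '2-19-2016' \
    'IonXpress_003.bam testname7_1213' > "$demo/names.txt"

logfile=$demo/2-15-2016/process.log
rename_from_names "$demo/2-15-2016" "$demo/names.txt" >> "$logfile" 2>&1
cat "$logfile"
```

In the demo, IonXpress_002.bam is listed but absent, so a "missing" line lands in the log through 2>&1, and IonXpress_003.bam is untouched because its entry belongs to the 2-19-2016 block.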