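I have files with a specific name format in a Linux folder. I want to extract the string between the first and second underscores of each file name and count how many files of each type are in that folder. The file names look like this: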
2305237803310_ABC_A05_1378414278883.hl7
20132480014907_DEF_R01_1378420192336.hl7
20132480014793_DEF_R01_1378418604889.hl7
2313642803310_ABC_A08_1378824296915.hl7
2313614403310_ABC_A08_1378823995805.hl7
2313614403310_MNY_A08_1378823995805.hl7
and so on.
My script's output should be:
ABC 3
DEF 2
MNY 1
Answer 0 (score: 3)
Use a defaultdict, a Counter, setdefault, or __missing__ to count them. Here is the __missing__ version:
txt = '''\
2305237803310_ABC_A05_1378414278883.hl7
20132480014907_DEF_R01_1378420192336.hl7
20132480014793_DEF_R01_1378418604889.hl7
2313642803310_ABC_A08_1378824296915.hl7
2313614403310_ABC_A08_1378823995805.hl7
2313614403310_MNY_A08_1378823995805.hl7'''

class Dicto(dict):
    # Called by dict lookup when the key is absent: start the count at zero.
    def __missing__(self, key):
        self[key] = 0
        return self[key]

d = Dicto()
for line in txt.splitlines():
    k = line.split('_')
    d[k[1]] += 1  # k[1] is the field between the first and second underscore

print(d)
# {'ABC': 3, 'DEF': 2, 'MNY': 1}
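The other three options named above can be sketched side by side. This is only an illustration: the helper `tag` and the list `names` are made up for the example, and the file names are the ones from the question.

```python
from collections import Counter, defaultdict

names = [
    "2305237803310_ABC_A05_1378414278883.hl7",
    "20132480014907_DEF_R01_1378420192336.hl7",
    "20132480014793_DEF_R01_1378418604889.hl7",
    "2313642803310_ABC_A08_1378824296915.hl7",
    "2313614403310_ABC_A08_1378823995805.hl7",
    "2313614403310_MNY_A08_1378823995805.hl7",
]

def tag(name):
    # Hypothetical helper: the field between the first and second underscore.
    return name.split("_")[1]

# Counter: counts an iterable of keys directly.
counter = Counter(tag(n) for n in names)

# defaultdict(int): missing keys start at 0.
dd = defaultdict(int)
for n in names:
    dd[tag(n)] += 1

# Plain dict with setdefault: insert 0 on first sight, then add one.
plain = {}
for n in names:
    key = tag(n)
    plain[key] = plain.setdefault(key, 0) + 1

# Print in sorted order to match the layout asked for in the question.
for key, count in sorted(counter.items()):
    print(key, count)
```

All three containers end up holding the same counts; Counter is probably the shortest to write when counting is the only goal.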
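Answer 1 (score: 1)
I would use a regular expression, os.listdir, and a dict (here a collections.Counter) to keep track of the counts. Something like this is relatively compact, and the approach generalizes to other similar problems.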
import re
import os
import collections

def print_names():
    names_count = collections.Counter()
    # Capture the field between the first and second underscore.
    regex = r'[^_]+_([^_]*)_.*'
    for file_name in os.listdir("."):
        match = re.match(regex, file_name)
        if match:
            names_count[match.group(1)] += 1
    for name, count in names_count.items():
        print(name, count)

if __name__ == "__main__":
    print_names()
Output with the example files (the order depends on the directory listing):
ABC 3
MNY 1
DEF 2
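If the folder also holds other files, or the output should come out sorted as in the question, a variant along these lines may help. It is a sketch, not part of the answer above: the names `count_tags`, `print_counts`, and `TAG_RE`, the `*.hl7` glob pattern, and the choice of pathlib are all assumptions made for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Same idea as the regex above: capture the field between the first two underscores.
TAG_RE = re.compile(r"[^_]+_([^_]*)_")

def count_tags(directory=".", pattern="*.hl7"):
    """Count files in `directory` matching `pattern`, grouped by their second field."""
    counts = Counter()
    for path in Path(directory).glob(pattern):
        match = TAG_RE.match(path.name)
        if match:
            counts[match.group(1)] += 1
    return counts

def print_counts(counts):
    # Sort by tag so the output order does not depend on the directory listing.
    for tag, count in sorted(counts.items()):
        print(tag, count)
```

Calling `print_counts(count_tags("/your/linux/folder"))` would then print one line per tag in alphabetical order.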
Answer 2 (score: 1)
With a dict and split this is easy:
s = ["2305237803310_ABC_A05_1378414278883.hl7", "20132480014907_DEF_R01_1378420192336.hl7", "20132480014793_DEF_R01_1378418604889.hl7",
     "2313642803310_ABC_A08_1378824296915.hl7", "2313614403310_ABC_A08_1378823995805.hl7", "2313614403310_MNY_A08_1378823995805.hl7"]
resultsDict = {}
for value in s:
    m = value.split("_")
    # At least two underscores are needed for a field between them.
    if len(m) > 2:
        myString = m[1]
        if myString in resultsDict:
            resultsDict[myString] += 1
        else:
            resultsDict[myString] = 1
    else:
        print("error in the string! there are fewer than 2 _")
print(resultsDict)
Output:
{'ABC': 3, 'DEF': 2, 'MNY': 1}
Answer 3 (score: 0)
bash (100% built-in commands):
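#!/bin/bash
declare -A ARRAY
cd "/your/linux/folder" || exit 1
for TAG in *
do
    TAG=${TAG#*_}     # drop everything up to and including the first underscore
    TAG=${TAG%%_*}    # drop everything from the next underscore onward
    (( ++ARRAY[$TAG] ))
done
for TAG in "${!ARRAY[@]}"
do
    echo "$TAG ${ARRAY[$TAG]}"
done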
Output:
ABC 3
MNY 1
DEF 2
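For readers less used to bash parameter expansion, the two expansions in the loop can be mirrored in Python roughly as follows. This is only an illustrative sketch; the function name `tag_like_bash` is made up for the example.

```python
def tag_like_bash(name):
    # ${TAG#*_}: remove the shortest prefix ending in "_".
    # If there is no underscore, the string is left unchanged.
    _head, sep, rest = name.partition("_")
    if sep:
        name = rest
    # ${TAG%%_*}: remove the longest suffix starting with "_",
    # i.e. keep only what precedes the first remaining underscore.
    return name.partition("_")[0]

print(tag_like_bash("2305237803310_ABC_A05_1378414278883.hl7"))
```

One consequence worth noting: a file name with no underscore at all passes through unchanged and is counted under its full name, which is also what the bash loop does.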