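I have a variety of values in a text field of a CSV.
Some of the values look like this: AGM00BALDWIN AGM00BOUCK
However, some have duplicates, which changes the names to: AGM00BOUCK01 AGM00COBDEN01 AGM00COBDEN02
My goal is to write a specific ID to the values that do not contain a numeric suffix.
Here is the code so far: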
    prov_count = 3000
    prov_ID = 0
    items = (name, x, y)
    xy_tup = tuple(items)
    if "*1" not in name and "*2" not in name:
        prov_ID = prov_count + 1
    else:
        prov_ID = ""
It seems a wildcard is not the appropriate approach here, but I can't seem to find a suitable solution.
Answer 0 (score: 1)
There are different ways to do this. One uses the str.isdigit method:
    a = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]
    for i in a:
        # check i[-2:].isdigit() instead to require two trailing digits
        if i[-1].isdigit():
            print(i)
Using a regex:
    import re
    a = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]
    # use r"^.*\d\d$" instead to require two trailing digits
    pat = re.compile(r"^.*\d$")
    for i in a:
        if pat.match(i):
            print(i)
Another way:
    for i in a:
        # compare the last character against the digit strings "0" to "9"
        if i[-1:] in map(str, range(10)):
            print(i)
All of the methods above print the inputs that have a numeric suffix:
AGM00BOUCK01
AGM00COBDEN01
AGM00COBDEN02
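The checks above find the names that do have a numeric suffix, while the question asks for an ID on the names that do not. A minimal sketch of inverting the check is shown here. The helper names has_numeric_suffix and assign_ids are hypothetical, and the idea that each unsuffixed name gets the next ID after 3000 is an assumption, since the question does not say how the counter should advance.

```python
def has_numeric_suffix(name):
    """Return True if the last character of name is a digit."""
    # name[-1:] is safe on an empty string, unlike name[-1]
    return name[-1:].isdigit()


def assign_ids(names, start=3000):
    """Map each name to a new ID, or to "" if it has a numeric suffix.

    Assumption: IDs count upward from start + 1, one per unsuffixed name.
    """
    ids = {}
    next_id = start
    for name in names:
        if has_numeric_suffix(name):
            ids[name] = ""
        else:
            next_id += 1
            ids[name] = next_id
    return ids


names = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]
result = assign_ids(names)
print(result)
```

Note that the digits in the middle of each name (the "00" in "AGM00BOUCK") do not affect the result, because only the last character is inspected.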
Answer 1 (score: 1)
Using a regular expression seems appropriate here:
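    import re

    # one or more digits at the very end of the string
    pattern = re.compile(r'\d+$')
    prov_count = 3000
    prov_ID = 0
    items = (name, x, y)  # name, x, y as in the question
    xy_tup = tuple(items)
    # search() scans the whole string; match() only looks at the start,
    # and both return None (not False) when nothing is found
    if pattern.search(name) is None:
        prov_ID = prov_count + 1
    else:
        prov_ID = ""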
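One detail worth checking with a pattern like this is which re function is called. re.match only tries to match at the start of the string, while re.search scans the whole string, and both return None rather than False when nothing is found. The short sketch below illustrates the difference on one of the names from the question; the variable names at_start and anywhere are only for illustration.

```python
import re

# one or more digits anchored to the end of the string
pattern = re.compile(r"\d+$")

name = "AGM00BOUCK01"
at_start = pattern.match(name)    # tries only at position 0, where "A" is not a digit
anywhere = pattern.search(name)   # scans forward and finds the trailing digits

print(at_start)                        # None
print(anywhere.group())                # 01
print(pattern.search("AGM00BOUCK"))    # None: the digits are not at the end
```

So a test for "no numeric suffix" can be written as pattern.search(name) is None.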
Answer 2 (score: 0)
You can use slicing to get the last two characters of an element, then check whether they are '01' or '02':
    l = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]
    for i in l:
        if i[-2:] in ('01', '02'):
            print('{} is a duplicate'.format(i))
Output:
AGM00BOUCK01 is a duplicate
AGM00COBDEN01 is a duplicate
AGM00COBDEN02 is a duplicate
Alternatively, you can use the str.endswith method:
    l = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]
    for i in l:
        if i.endswith('01') or i.endswith('02'):
            print('{} is a duplicate'.format(i))
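As a small variation on the endswith approach, str.endswith also accepts a tuple of suffixes, which avoids chaining several calls with or. The names suffixes and duplicates below are only illustrative, and the list of suffixes is assumed to be known in advance.

```python
names = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

# str.endswith returns True if the string ends with any suffix in the tuple
suffixes = ("01", "02")
duplicates = [n for n in names if n.endswith(suffixes)]
print(duplicates)
```

If suffixes beyond '02' can occur, a check on trailing digits is likely a better fit than a fixed tuple.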
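So your code would look like this:
    prov_count = 3000
    prov_ID = 0
    items = (name, x, y)
    xy_tup = tuple(items)
    # name[-2:] is the last two characters (name[-2] would be a single character);
    # only names without a '01'/'02' suffix get an ID
    if name[-2:] not in ('01', '02'):
        prov_ID = prov_count + 1
    else:
        prov_ID = ""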