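Please help: I need to create a new column. If the `genres` column of my DataFrame contains a value from the lookup table, I want to return the corresponding values from that lookup table.
I have tried many approaches. I am still very new to Python and hope to get this finished soon.
dh is the DataFrame below (the lookup table):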
sub-genre      first      second    mid-genre      genre
indie          indie                Alternative    rock
dream pop      dream      pop       Alternative    rock
shoegaze       shoegaze             Alternative    rock
post-hardcore  post       hardcore  Hardcore Punk  rock
emo            emo                  Hardcore Punk  rock
screamo        screamo              Hardcore Punk  rock
synthcore      synthcore            Hardcore Punk  rock
rock           rock                 Contemporary   rock
diy is the DataFrame below (New Column is the output I want):
artist  genres                                             New Column
2:54    ['metropopolis']                                   No Genre (blank)
22      ['norwegian rock']                                 Contemporary
27      ['boston rock']                                    Contemporary
33      []                                                 No Genre (blank)
36      ['ambient', 'compositional ambient', 'drift', ...
44      ['emo', 'pop punk', 'skate punk']                  Hardcore Punk
52      []
68      []
83      ['hip hop quebecois']                              Hip hop
This is the code I tried:
diy = pd.DataFrame(data[['artist','genres']])
for i in diy['genres'].iteritems():
    for x, y, z, t in zip(dh['first'], dh['second'], dh['mid-genre'], dh['genre']):
        if h.str.contains(x) and h.str.contains(z):
            diy['mid-genre'] = z
            diy['Main-genre'] = t
Error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
On my second attempt I added .any() to the if statement to try to get around the exception, which produced the warning below:
if h.str.contains(x).any() and h.str.contains(z).any():
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
Answer 0 (score: 0)
I figured out a solution, in case anyone else has a similar task.
import re
import pandas as pd

# `lookup` is the lookup table (called dh in the question)
diy = pd.DataFrame(data[['artist', 'genres']])
omg = []  # Container for the (artist, mid-genre, genre) results
# Zip the lookup table columns to loop over the strings to search for
for x, y, z, t in zip(lookup['first'], lookup['second'], lookup['mid-genre'], lookup['genre']):
    # Compile x and y into regular expression patterns
    p = re.compile(x)
    q = re.compile(str(y))
    # Loop through the data
    for i, k in zip(diy['artist'], diy['genres']):
        # Search the stringified genres with each pattern (match object or None)
        m = p.search(str(k))
        j = q.search(str(k))
        # If both patterns match diy['genres']
        if m and j:
            woo = (i, z, t)  # Artist plus the lookup 'mid-genre' and 'genre'
            omg.append(woo)  # Append to container
        else:
            # No match for this lookup row: label as 'No genre'
            woo = (i, 'No genre', 'No genre')
            omg.append(woo)
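
One caveat with the loop above: as written, the else branch runs once for every lookup row that does not match, so each artist ends up with one tuple per lookup row rather than a single result. Below is a minimal sketch of a variant that yields exactly one result per artist. It is not the code I ran on the real data: the sample rows are made up to resemble the tables in the question, `find_genre` is a hypothetical helper name, it matches on the `sub-genre` column with a plain substring test instead of regex patterns built from `first` and `second`, and it assumes `genres` holds real Python lists rather than strings.

```python
import pandas as pd

# Hypothetical sample rows modeled on the tables in the question.
lookup = pd.DataFrame({
    "sub-genre": ["indie", "dream pop", "emo", "rock"],
    "mid-genre": ["Alternative", "Alternative", "Hardcore Punk", "Contemporary"],
    "genre": ["rock", "rock", "rock", "rock"],
})
diy = pd.DataFrame({
    "artist": ["22", "33", "44"],
    "genres": [["norwegian rock"], [], ["emo", "pop punk", "skate punk"]],
})


def find_genre(genres, lookup):
    """Return (mid-genre, genre) for the first lookup row whose sub-genre
    is a substring of any entry in `genres`, else a 'No genre' pair."""
    for sub, mid, main in zip(lookup["sub-genre"], lookup["mid-genre"], lookup["genre"]):
        # Plain substring test: no regex, so special characters are harmless.
        if any(sub in g for g in genres):
            return mid, main
    return "No genre", "No genre"


# One (mid-genre, genre) pair per artist, in the same order as diy.
matches = [find_genre(g, lookup) for g in diy["genres"]]
diy["mid-genre"] = [m[0] for m in matches]
diy["Main-genre"] = [m[1] for m in matches]
print(diy)
```

Because the first matching lookup row wins, the order of the lookup table matters: more specific sub-genres should come before generic ones such as rock. A substring test can also over-match (for example, emo inside a longer word), so a stricter comparison may be needed on real data.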