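I am trying to write a function that adds a new column to a pandas DataFrame. It should work out which of several substrings occurs in a string column and use that substring as the value of the new column.
The problem is that the text to find does not appear at the same position in the variable x.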
import pandas as pd

df = pd.DataFrame({'x': ["var_m500_0_somevartext", "var_m500_0_vartextagain",
                         "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
                   'x1': [4, 5, 6, 8]})
finds = ["m500_0", "0_500", "m150_0"]
I want to determine which entry of finds occurs in a given row of df["x"].
I have written a function that works, but it is very slow on large datasets:
def pd_create_substring_var(df, new_var_name="new_var", substring_list=["1"], var_ori="x"):
    import re
    # Default value for rows where none of the substrings is found
    df[new_var_name] = "na"
    cols = list(df.columns)
    for ix in range(len(df)):
        for find in substring_list:
            # Each entry is used as a regular expression pattern
            for m in re.finditer(find, df.iloc[ix][var_ori]):
                df.iat[ix, cols.index(new_var_name)] = df.iloc[ix][var_ori][m.start():m.end()]
    return df
df = pd_create_substring_var(df,"t",finds,var_ori="x")
df
                            x  x1       t
0      var_m500_0_somevartext   4  m500_0
1     var_m500_0_vartextagain   5  m500_0
2  varwithsomeothertext_0_500   6   0_500
3   varwithsomext_m150_0_text   8  m150_0
Answer 0 (score: 3)
Does this do what you need?
finds = ["m500_0", "0_500", "m150_0"]
df["t"] = df["x"].str.extract(f"({'|'.join(finds)})")
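One caveat, offered as a minimal sketch rather than part of the original answer: `str.extract` interprets the joined string as a regular expression, so if entries in `finds` could ever contain regex metacharacters (such as `.` or `+`), escaping them with `re.escape` is safer. The example reuses the question's data. Passing `expand=False` is my choice here so that a Series is returned instead of a one-column DataFrame.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "x": ["var_m500_0_somevartext", "var_m500_0_vartextagain",
          "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
    "x1": [4, 5, 6, 8],
})
finds = ["m500_0", "0_500", "m150_0"]

# Escape each candidate so it is matched literally, then build one alternation
# wrapped in a single capture group.
pattern = "(" + "|".join(re.escape(f) for f in finds) + ")"

# expand=False returns a Series; rows without a match become NaN.
df["t"] = df["x"].str.extract(pattern, expand=False)
print(df)
```

With the sample data, each row should get the one substring it contains, matching the output shown in the question.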
Answer 1 (score: 1)
Probably not the best way:
df['t'] = df['x'].apply(lambda x: ''.join([i for i in finds if i in x]))
Now:
print(df)
gives:
                            x  x1       t
0      var_m500_0_somevartext   4  m500_0
1     var_m500_0_vartextagain   5  m500_0
2  varwithsomeothertext_0_500   6   0_500
3   varwithsomext_m150_0_text   8  m150_0
Now, just adding to @pythonjokeun's answer, you could also do:
df["t"] = df["x"].str.extract("(%s)" % '|'.join(finds))
Or:
df["t"] = df["x"].str.extract("({})".format('|'.join(finds)))
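The `apply` with `''.join` approach and the `str.extract` approach agree on the question's data, but they can differ when a row contains several of the substrings or none of them. The sketch below uses two hypothetical rows (not from the question) to illustrate that difference.

```python
import pandas as pd

finds = ["m500_0", "0_500", "m150_0"]

# Hypothetical rows: the first contains two of the substrings, the second none.
s = pd.Series(["a_m500_0_b_0_500", "no_match_here"])

# Concatenates every substring found in the row; empty string if none is found.
joined = s.apply(lambda x: "".join([i for i in finds if i in x]))

# Keeps only the first match in the row; NaN if none is found.
extracted = s.str.extract("(%s)" % "|".join(finds), expand=False)

print(joined.tolist())     # ['m500_00_500', '']
print(extracted.tolist())  # ['m500_0', nan]
```

Which behaviour is preferable depends on whether a row is expected to contain exactly one of the substrings.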
Answer 2 (score: 1)
I don't know how large your dataset is, but you could use the map function like this:
import pandas

def subset_df_test():
    df = pandas.DataFrame({'x': ["var_m500_0_somevartext", "var_m500_0_vartextagain",
                                 "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
                           'x1': [4, 5, 6, 8]})
    finds = ["m500_0", "0_500", "m150_0"]
    df['t'] = df['x'].map(lambda x: compare(x, finds))
    print(df)

def compare(x, finds):
    # Return the first entry of finds contained in x (None if nothing matches)
    for f in finds:
        if f in x:
            return f
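Note that `compare` implicitly returns None when no substring is found. As a small sketch of a variant, the helper below takes an explicit fallback value. The name `first_match` and its `default` parameter are my own illustration, not part of the answer, and the second row is a hypothetical non-matching input.

```python
import pandas as pd

def first_match(x, finds, default=None):
    """Return the first entry of `finds` contained in `x`, else `default`."""
    return next((f for f in finds if f in x), default)

finds = ["m500_0", "0_500", "m150_0"]
s = pd.Series(["var_m500_0_somevartext", "unrelated_text"])

# "na" mirrors the placeholder value used in the question's original function.
result = s.map(lambda x: first_match(x, finds, default="na"))
print(result.tolist())  # ['m500_0', 'na']
```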
Answer 3 (score: 1)
df['x'].str.findall("|".join(finds))
0 [m500_0]
1 [m500_0]
2 [0_500]
3 [m150_0]
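Since `str.findall` returns a list per row, one more step is needed to get a plain string column. A minimal sketch, assuming only the first match per row is wanted, is to index into the lists with `.str[0]`:

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["var_m500_0_somevartext", "var_m500_0_vartextagain",
          "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
    "x1": [4, 5, 6, 8],
})
finds = ["m500_0", "0_500", "m150_0"]

# findall gives a list of all matches for each row ...
matches = df["x"].str.findall("|".join(finds))

# ... and .str[0] picks the first element of each list.
df["t"] = matches.str[0]
print(df)
```

For a row with no match, the list would be empty and `.str[0]` should yield NaN rather than raise an error.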
Answer 4 (score: 0)
Try this