I am creating new boolean columns based on whether other columns contain certain string values.
This is what I tried:
def function(data):
    data["col1"] = 0
    data["col2"] = 0
    data["col3"] = 0
    for i in range(0, len(data)):
        if ("cond1" in data.loc[i, "old_col1"].lower()) or ("cond2" in data.loc[i, "old_col1"].lower()):
            data.loc[i, "col1"] = 1
        elif ("cond3 " in data.loc[i, "old_col1"].lower()) or ("cond4 " in data.loc[i, "old_col2"].lower()):
            data.loc[i, "col2"] = 1
        elif ("cond5 " in data.loc[i, "old_col1"].lower()) or ("cond6 " in data.loc[i, "old_col3"].lower()):
            data.loc[i, "col3"] = 1

function(data)
But it does not scale well to larger datasets.
Is there a better, faster way to create the boolean columns col1-3?
Answer 0 (score: 1)
I made a sample dataframe, since you did not provide one:
col1 col2 col3
0 foo cucumber HogsWatch
1 bar selery hogswatch
2 baz Porcupine Watch Hogs
You can use apply to run a function over every column of the dataframe (this needs import re for the flag):
df.apply(lambda x: x.str.contains('A', flags=re.IGNORECASE))
col1 col2 col3
0 False False True
1 True False True
2 True False True
This means you can build a new dataframe of boolean columns and, if needed, join it back onto the original dataframe:
bool_df = df.apply(lambda x: x.str.contains('A', flags=re.IGNORECASE))
df = df.merge(bool_df, left_index=True, right_index=True, suffixes=['', '_bool'])
col1 col2 col3 col1_bool col2_bool col3_bool
0 foo cucumber HogsWatch False False True
1 bar selery hogswatch True False True
2 baz Porcupine Watch Hogs True False True
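As a possible alternative to the merge call, the boolean frame can be renamed with add_suffix and attached with join, which also aligns on the index. This is only a sketch: the sample frame is rebuilt here from the table above, and the name result is chosen for illustration.

```python
import re

import pandas as pd

# Sample frame rebuilt from the table shown in this answer
df = pd.DataFrame({
    "col1": ["foo", "bar", "baz"],
    "col2": ["cucumber", "selery", "Porcupine"],
    "col3": ["HogsWatch", "hogswatch", "Watch Hogs"],
})

# Case-insensitive substring test applied to every column
bool_df = df.apply(lambda s: s.str.contains("A", flags=re.IGNORECASE))

# Rename the boolean columns, then attach them by index
result = df.join(bool_df.add_suffix("_bool"))
print(result)
```

Unlike the merge version, this leaves df itself unchanged, so the step can be rerun without piling up suffixed columns.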
Of course, you can use more complex regular expressions in str.contains, for example:
df.apply(lambda x: x.str.contains('A|O', flags=re.IGNORECASE))
col1 col2 col3
0 True False True
1 True False True
2 True True True
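I noticed that each of your columns has its own conditions. This approach can handle that as well; it is a little more involved, but still fast.
First, for each column we build a dataframe of the strings that actually matched, then reduce it to a single boolean column: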
conditions = {"col1": ["ar", "f"], "col2": ["er", "c"], "col3": ["Hog", " "]}
for col_name, strings in conditions.items():
    # one capture group per search string
    regex = "(" + ")|(".join(strings) + ")"
    df_cond = df[col_name].str.extract(regex, flags=re.IGNORECASE).notnull()
    # True when any capture group matched in that row
    df[col_name + '_matches'] = df_cond.any(axis=1)
which produces
col1 col2 col3 col1_matches col2_matches col3_matches
0 foo cucumber HogsWatch True True True
1 bar selery hogswatch True True True
2 baz Porcupine Watch Hogs False True True
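If only the True/False result is needed and not the matched text, the same per-column check can probably be written with str.contains directly. The sketch below is an assumption-laden variant rather than the answer's own method: it rebuilds the sample frame from the table above, and it adds re.escape so that search strings containing regex metacharacters are treated literally.

```python
import re

import pandas as pd

# Sample frame rebuilt from the table shown in this answer
df = pd.DataFrame({
    "col1": ["foo", "bar", "baz"],
    "col2": ["cucumber", "selery", "Porcupine"],
    "col3": ["HogsWatch", "hogswatch", "Watch Hogs"],
})

conditions = {"col1": ["ar", "f"], "col2": ["er", "c"], "col3": ["Hog", " "]}

for col_name, strings in conditions.items():
    # Join the literal search strings into one alternation pattern
    pattern = "|".join(re.escape(s) for s in strings)
    # True where any of the strings occurs, ignoring case
    df[col_name + "_matches"] = df[col_name].str.contains(pattern, flags=re.IGNORECASE)

print(df)
```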
Answer 1 (score: 0)
This can be simplified to
data['col1'] = (data['old_col1'].str.lower() == 'cond1') | (data['old_col1'].str.lower() == 'cond2')
Answer 2 (score: 0)
You should do it like this:
data["col1"] = 0
data["col2"] = 0
data["col3"] = 0
data.loc[data["old_col1"].str.lower().isin(["cond1", "cond2"]), 'col1'] = 1
data.loc[data["old_col2"].str.lower().isin(["cond3", "cond4"]), 'col2'] = 1
data.loc[data["old_col3"].str.lower().isin(["cond5", "cond6"]), 'col3'] = 1
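Note that isin tests for exact equality, while the loop in the question tests for substrings and uses if/elif, so at most one of col1-3 is set per row. A minimal sketch that tries to keep both behaviours follows. It assumes the old_col columns hold strings with no missing values; the sample rows are invented for illustration, and the trailing spaces in the search strings are copied from the question as written.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only
data = pd.DataFrame({
    "old_col1": ["has COND1 here", "cond3 first", "nothing", "cond5 x", "cond5 y"],
    "old_col2": ["x", "y", "cond4 z", "cond4 w", "v"],
    "old_col3": ["a", "b", "c", "d", "e"],
})

# Lower-case each source column once
c1 = data["old_col1"].str.lower()
c2 = data["old_col2"].str.lower()
c3 = data["old_col3"].str.lower()

# Substring tests; trailing spaces kept exactly as in the question
m1 = c1.str.contains("cond1", regex=False) | c1.str.contains("cond2", regex=False)
m2 = c1.str.contains("cond3 ", regex=False) | c2.str.contains("cond4 ", regex=False)
m3 = c1.str.contains("cond5 ", regex=False) | c3.str.contains("cond6 ", regex=False)

# Reproduce the if/elif order: a later branch applies only if earlier ones failed
data["col1"] = m1.astype(int)
data["col2"] = (~m1 & m2).astype(int)
data["col3"] = (~m1 & ~m2 & m3).astype(int)
print(data)
```

In the fourth sample row both the second and third conditions hold, and only col2 is set, which mirrors the elif chain.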