My dataframe has the following values:
data_df
0 student
1 sample text
2 student
3 no students
4 sample texting
5 random sample
I used a regular expression to extract the rows containing the word 'student', and the result is:
regexdf
0 student
2 student
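My goal is to create a new column of 0 and 1 values in the main dataframe: row 0 should be 1 and row 5 should be 0, because 'student' appears in regexdf at rows 0 and 2. How can I match the indices of the two dataframes and create that column?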
Answer 0: (score: 2)
Using a regular expression:
data_df = data_df.assign(regexdf = data_df[1].str.extract(r'(student)\b', expand=False))
data_df['student'] = data_df['regexdf'].notnull().mul(1)
print(data_df)
Output:
                1  regexdf  student
0         student  student        1
1     sample text      NaN        0
2         student  student        1
3     no students      NaN        0
4  sample texting      NaN        0
5   random sample      NaN        0
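Alternatively, if regexdf already exists as in the question, join it to the original single-column data_df on the index:
# rows missing from regexdf get NaN in the suffixed column
df_out = data_df.join(regexdf, rsuffix='regex')
# 1 where the joined value is present, 0 otherwise
df_out['pattern'] = df_out['1regex'].notnull().mul(1)
# running count of matches so far
df_out['Count_Pattern'] = df_out['pattern'].cumsum()
print(df_out)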
Output:
                1   1regex  pattern  Count_Pattern
0         student  student        1              1
1     sample text      NaN        0              1
2         student  student        1              2
3     no students      NaN        0              2
4  sample texting      NaN        0              2
5   random sample      NaN        0              2
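Since the question asks specifically about matching indices, the flag can also be derived from the index labels alone, without a join. The following is a minimal sketch, assuming the two frames from the question are rebuilt with a hypothetical column name `text` (the original column label is the integer 1) and a hypothetical output column `flag`:

```python
import pandas as pd

# Rebuild the frames from the question. The column name "text" is an
# assumption made for this sketch; the original label is the integer 1.
data_df = pd.DataFrame({"text": ["student", "sample text", "student",
                                 "no students", "sample texting",
                                 "random sample"]})
regexdf = data_df.loc[[0, 2]]  # rows 0 and 2, as in the question

# 1 where the row's index label also appears in regexdf, else 0.
data_df["flag"] = data_df.index.isin(regexdf.index).astype(int)
print(data_df["flag"].tolist())  # [1, 0, 1, 0, 0, 0]
```

This relies only on the index labels, so it works no matter which columns regexdf carries.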
Answer 1: (score: 0)
You can also do:
df['bool'] = df[1].eq('student').astype(int)
or
df['bool'] = df[1].str.match(r'(student)\b').astype(int)
                1  bool
0         student     1
1     sample text     0
2         student     1
3     no students     0
4  sample texting     0
5   random sample     0
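The two variants give the same result on this data, but they are not equivalent on every input. The sketch below compares them with `str.contains` and a `\b`-delimited pattern on a few made-up values; the sample strings and the names `s`, `exact`, `prefix` and `anywhere` are illustrative and do not come from the question:

```python
import pandas as pd

# Illustrative values, not taken from the question.
s = pd.Series(["student", "a student", "student loan", "no students"])

# Whole-cell equality: the cell must be exactly "student".
exact = s.eq("student").astype(int)
# str.match anchors the pattern at the start of each string.
prefix = s.str.match(r"student\b").astype(int)
# str.contains finds the whole word anywhere in the string.
anywhere = s.str.contains(r"\bstudent\b").astype(int)

print(exact.tolist())     # [1, 0, 0, 0]
print(prefix.tolist())    # [1, 0, 1, 0]
print(anywhere.tolist())  # [1, 1, 1, 0]
```

Which one fits depends on whether the cell has to equal the word, start with it, or merely contain it.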
If you want a new dataframe instead, then:
ndf = df[df[1].eq('student')].copy()
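The filtered frame keeps the index labels of the rows it came from, which is what makes it comparable to the regexdf in the question. A small sketch of that behaviour, again assuming a hypothetical column name `text` in place of the integer label 1:

```python
import pandas as pd

# Column name "text" is an assumption; the original label is the integer 1.
df = pd.DataFrame({"text": ["student", "sample text", "student", "no students"]})

# Boolean mask selects matching rows; .copy() makes ndf independent of df.
ndf = df[df["text"].eq("student")].copy()
print(ndf.index.tolist())  # [0, 2]

# Changing ndf leaves the original frame untouched.
ndf["text"] = ndf["text"].str.upper()
print(df.loc[0, "text"])   # student
```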