How to cut a string of digits into its first four digits and the remaining digits
Here is my data:
Id actual_pattern
1 100101
2 10101
3 1010101
4 101
In the expected output:
cut_pattern1 is the first 4 digits of actual_pattern.
cut_pattern2 is the remainder of actual_pattern; if there is no remainder, set cut_pattern2 = 0.
binary_cut2 is 1 if cut_pattern2 contains any 1, otherwise binary_cut2 = 0.
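The rules above can be sketched in plain Python before reaching for pandas. This is only an illustration: the function name cut_pattern and its width parameter are hypothetical and do not appear in the question.

```python
def cut_pattern(value, width=4):
    """Split the digits of `value` into the first `width` digits and the rest."""
    digits = str(value)
    cut1 = digits[:width]
    # An empty remainder is represented as the string "0".
    cut2 = digits[width:] or "0"
    # 1 if any digit of the remainder is "1", otherwise 0.
    binary = int("1" in cut2)
    return cut1, cut2, binary


for value in (100101, 10101, 1010101, 101):
    print(value, cut_pattern(value))
```

The answers below apply the same three steps to a whole column at once.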
Answer 0 (score: 5)
Create the new columns by indexing with str, use replace to change empty strings, and for the last column use Series.str.contains and cast the result to integer:
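# Cast to string so the .str accessor can slice the digits
df['actual_pattern'] = df['actual_pattern'].astype(str)
# First 4 digits
df['cut_pattern1'] = df['actual_pattern'].str[:4]
# Remaining digits; an empty remainder becomes '0'
df['cut_pattern2'] = df['actual_pattern'].str[4:].replace('','0')
# 1 if the remainder contains a '1', otherwise 0
df['binary_cut2'] = df['cut_pattern2'].str.contains('1').astype(int)
print(df)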
Id actual_pattern cut_pattern1 cut_pattern2 binary_cut2
0 1 100101 1001 01 1
1 2 10101 1010 1 1
2 3 1010101 1010 101 1
3 4 101 101 0 0
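One detail worth spelling out: the replace call in this answer is Series.replace, which swaps whole values, not Series.str.replace, which substitutes inside each string. A minimal sketch of the difference, using a made-up three-element Series (the names s, whole, and inside are illustrative only):

```python
import pandas as pd

s = pd.Series(["01", "", "101"])

# Series.replace matches whole values: only the empty string becomes "0".
whole = s.replace("", "0")

# Series.str.replace works inside each string: every "0" becomes "x".
inside = s.str.replace("0", "x", regex=False)

print(whole.tolist())
print(inside.tolist())
```

That is why an empty remainder turns into '0' while remainders such as '01' are left untouched.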
Edit:
@Rick Hitchcock's solution:
df['actual_pattern'] = df['actual_pattern'].astype(str)
df['cut_pattern1'] = df['actual_pattern'].str[:4]
df['cut_pattern2'] = df['actual_pattern'].str[4:].replace('','0')
df['binary_cut2'] = df['cut_pattern2'].str.contains('1').astype(int)
print (df)
Id actual_pattern cut_pattern1 cut_pattern2 binary_cut2
0 1 100101 1001 01 1
1 2 10101 1010 1 1
2 3 1010101 1010 101 1
3 4 00001111 0000 1111 1
Answer 1 (score: 3)
Here is how I would approach it:
s = df.actual_pattern.astype(str).str
# Split into 2 lists, the first containing the first 4 digits
out = s.split(r'(\d{4})').str[-2:].values.tolist()
# [['1001', '01'], ['1010', '1'], ['1010', '101'], ['101']]
# build a dataframe from the lists
out = pd.DataFrame(out, columns=['cut_pattern1', 'cut_pattern2'])
# fill missing values (absence of string in list) with 0
out['cut_pattern2'] = out.cut_pattern2.fillna('0')
# cast the boolean mask to integers
out['binary_cut2'] = out.cut_pattern2.str.contains('1').astype(int)
print(out)
cut_pattern1 cut_pattern2 binary_cut2
0 1001 01 1
1 1010 1 1
2 1010 101 1
3 101 0 0
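The split in this answer relies on how a capturing group behaves in a regular-expression split: the captured text is kept in the result list. A small sketch with the standard re module, assuming pandas hands the pattern to the same regex engine (the variable names are illustrative):

```python
import re

pattern = re.compile(r"(\d{4})")

for digits in ("100101", "10101", "101"):
    # The capturing group keeps the matched 4 digits in the result.
    parts = pattern.split(digits)
    print(digits, parts, parts[-2:])

# An 8-digit input matches twice, so the last two items are no longer
# the first 4 digits and the remainder.
twice = pattern.split("10101010")
# Capping the number of splits at 1 keeps everything after the first match.
once = pattern.split("10101010", maxsplit=1)
print(twice, once)
```

For the sample data the last two items are always the first four digits and the remainder, which is what .str[-2:] picks up. Inputs of eight or more digits would need the split capped at one, as the last lines suggest.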
Answer 2 (score: 2)
Using some regex and string extraction here:
m=df.actual_pattern.str.extract('(?P<cut_pattern1>.{,4})(?P<cut_pattern2>.*)').replace('',0)
cut_pattern1 cut_pattern2
0 1001 01
1 1010 1
2 1010 101
3 101 0
Then do:
m.assign(binary_cut2=m.cut_pattern2.str.contains('1',na=False).astype(int))
cut_pattern1 cut_pattern2 binary_cut2
0 1001 01 1
1 1010 1 1
2 1010 101 1
3 101 0 0
Finally, concatenate it to the original df:
m=df.actual_pattern.str.extract('(?P<cut_pattern1>.{,4})(?P<cut_pattern2>.*)').replace('',0)
m=m.assign(binary_cut2=m.cut_pattern2.str.contains('1',na=False).astype(int))
pd.concat([df,m],axis=1)
Id actual_pattern cut_pattern1 cut_pattern2 binary_cut2
0 1 100101 1001 01 1
1 2 10101 1010 1 1
2 3 1010101 1010 101 1
3 4 101 101 0 0
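For reference, here is a self-contained variation on the same idea that can be run as is. It is a sketch, not the answer's exact code: it builds the sample frame from the question, casts actual_pattern to string first (assuming the column holds integers), uses \d instead of . in the pattern, and fills empty remainders with the string '0' so the column stays all strings. The names digits, parts, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2, 3, 4],
                   "actual_pattern": [100101, 10101, 1010101, 101]})

# The .str accessor needs strings, so cast the integer column first.
digits = df["actual_pattern"].astype(str)

# Named groups become the column names of the extracted frame.
parts = digits.str.extract(r"(?P<cut_pattern1>\d{1,4})(?P<cut_pattern2>\d*)")

# An empty remainder becomes the string "0".
parts["cut_pattern2"] = parts["cut_pattern2"].replace("", "0")

# 1 if the remainder contains a "1", otherwise 0.
parts["binary_cut2"] = parts["cut_pattern2"].str.contains("1").astype(int)

result = pd.concat([df, parts], axis=1)
print(result)
```

Because the filled value is a string, every row of cut_pattern2 can be searched with str.contains without the na=False fallback.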