I need to split a dataframe column into multiple columns so that each cell holds only two digits. The current dataframe looks like this:
Name | Number   | Code     |
..............................
Tom  | 78797071 | 0        |
Nick |          | 89797071 |
Juli |          | 57797074 |
June | 39797571 | 0        |
Junw |          | 23000000 |
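If the code contains 8 digits, it should be split into two-digit pieces, one per column, and if any DIV contains 00, the row should be marked as "incomplete".
The new dataframe should look like this: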
Name | Number   | Code     | DIV | DIV2 | DIV3 | DIV4 | Incomplete |
........................................................................
Tom  | 78797071 | 0        | 0   | 0    | 0    | 0    | incomplete |
Nick |          | 89797071 | 89  | 79   | 70   | 71   | complete   |
Juli |          | 57797074 | 57  | 79   | 70   | 74   | complete   |
June | 39797571 | 0        | 0   | 0    | 0    | 0    | complete   |
Junw |          | 23000000 | 23  | 00   | 00   | 00   | incomplete |
Answer 0: (score: 1)
Try this quick fix.
import re

import pandas as pd

# Data preprocessing (the Number column is omitted for brevity)
data = {'Name': ['Tom', 'Nick', 'Juli', 'June', 'Junw'],
        'Code': ['0', '89797071', '57797074', '0', '23000000']}
df = pd.DataFrame(data)
print(df)

# Patterns: four two-digit groups, or a code that starts with one or more zeros
pattern = r'(\d{2})(\d{2})(\d{2})(\d{2})'
zero_pattern = r'0+'

split_data = []
for to_find in df['Code']:
    splitted = re.findall(pattern, to_find)
    if splitted:
        # findall returns a list of 4-tuples; use the first match
        temp = list(splitted[0])
        temp.append('incomplete' if '00' in temp else 'complete')
        split_data.append(temp)
    elif re.match(zero_pattern, to_find):
        # elif ensures a code such as '00000000' adds only one row
        split_data.append(['0', '0', '0', '0', 'incomplete'])
    # Note: a code matching neither pattern adds no row, which would misalign df2

# Build the new columns and attach them to the original dataframe
col_name = ['DIV1', 'DIV2', 'DIV3', 'DIV4', 'Incomplete']
df2 = pd.DataFrame(split_data, columns=col_name)
df[col_name] = df2
print(df)
Output
   Name      Code
0   Tom         0
1  Nick  89797071
2  Juli  57797074
3  June         0
4  Junw  23000000
   Name      Code DIV1 DIV2 DIV3 DIV4  Incomplete
0   Tom         0    0    0    0    0  incomplete
1  Nick  89797071   89   79   70   71    complete
2  Juli  57797074   57   79   70   74    complete
3  June         0    0    0    0    0  incomplete
4  Junw  23000000   23   00   00   00  incomplete
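For what it's worth, the explicit loop could probably be replaced by Series.str.extract. The following is only a sketch of that idea, not the method used above: anchoring the pattern so that only codes of exactly 8 digits match, and filling non-matching rows with "0", are my own assumptions, and the names div_cols, parts and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "Junw"],
    "Code": ["0", "89797071", "57797074", "0", "23000000"],
})

div_cols = ["DIV1", "DIV2", "DIV3", "DIV4"]

# Four capture groups, anchored so only codes of exactly 8 digits match.
# Rows that do not match come back as NaN and are filled with "0" (assumption).
parts = df["Code"].str.extract(r"^(\d{2})(\d{2})(\d{2})(\d{2})$")
parts.columns = div_cols
parts = parts.fillna("0")

# A row is incomplete if any pair is "00" or the code was not 8 digits long.
is_incomplete = parts.isin(["0", "00"]).any(axis=1)

result = df.join(parts)
result["Incomplete"] = is_incomplete.map({True: "incomplete", False: "complete"})
print(result)
```

Because every input row produces exactly one row in parts, the new columns stay aligned with df even when a code matches nothing.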
Answer 1: (score: 1)
You can use the string methods zfill and findall, as shown below.
import numpy as np
import pandas as pd

# Assumes df is the dataframe from the question
df.Code = df.Code.astype(str)
## zfill pads the string with zeros to length 8; findall finds each pair of digits
## explode splits each list into rows (explode requires pandas 0.25 or above)
## reshape turns the result into 4 columns
arr = df.Code.str.zfill(8).str.findall(r"(\d\d)").explode().values.reshape(-1, 4)
## create a new dataframe from arr with the given column names
df2 = pd.DataFrame(arr, columns=[f"Div{i+1}" for i in range(arr.shape[1])])
## set the "Incomplete" column to "incomplete" if any column of the row contains "00"
df2["Incomplete"] = np.where(np.any(arr == "00", axis=1), "incomplete", "complete")
pd.concat([df, df2], axis=1)
Result
   Name    Number      Code Div1 Div2 Div3 Div4  Incomplete
0   Tom  78797071         0   00   00   00   00  incomplete
1  Nick            89797071   89   79   70   71    complete
2  Juli            57797074   57   79   70   74    complete
3  June  39797571         0   00   00   00   00  incomplete
4  Junw            23000000   23   00   00   00  incomplete
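To see what each step of that chain does, here is a small walk-through on three of the sample codes. The variable names (codes, padded, pairs, long_form) are just for illustration, and the check before the reshape reflects an assumption that no code is longer than 8 digits.

```python
import pandas as pd

codes = pd.Series(["0", "89797071", "23000000"])  # sample values from the question

padded = codes.str.zfill(8)          # left-pads with zeros: "0" becomes "00000000"
pairs = padded.str.findall(r"\d\d")  # each value becomes a list of 2-digit strings
long_form = pairs.explode()          # one row per pair; the original index repeats

# reshape(-1, 4) only works if every code yields exactly four pairs,
# i.e. no code is longer than 8 digits (assumption).
assert pairs.map(len).eq(4).all()

arr = long_form.to_numpy().reshape(-1, 4)
print(padded.tolist())
print(pairs.tolist())
print(arr)
```

If a code had more than 8 digits, findall would return more than four pairs and the reshape would either fail or silently shift values into the wrong rows, so a check like this is cheap insurance.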
Answer 2: (score: 1)
You can split the values with str.findall(".."), then join the resulting lists onto the original df. Use apply to get the complete/incomplete status.
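import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick", "Juli", "June", "Junw"],
                   "Number": [78797071, 0, 0, 39797571, 0],
                   "Code": [0, 89797071, 57797074, 0, 23000000]})
# Split each code into two-character pieces; a code of "0" yields no pieces,
# so the resulting NaN cells are filled with "00"
pairs = df["Code"].astype(str).str.findall("..").values.tolist()
df = df.join(pd.DataFrame(pairs).add_prefix('DIV')).fillna("00")
# Columns 3 to 6 are DIV0..DIV3; a row is incomplete if any of them contains "00"
df["Incomplete"] = df.iloc[:, 3:7].apply(lambda row: "incomplete" if row.str.contains('00').any() else "complete", axis=1)
print(df)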
Output
   Name    Number      Code DIV0 DIV1 DIV2 DIV3  Incomplete
0   Tom  78797071         0   00   00   00   00  incomplete
1  Nick         0  89797071   89   79   70   71    complete
2  Juli         0  57797074   57   79   70   74    complete
3  June  39797571         0   00   00   00   00  incomplete
4  Junw         0  23000000   23   00   00   00  incomplete
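As a possible variation, the row-wise apply can be swapped for a vectorized comparison, and the DIV columns can be picked by name instead of by position. This is only a sketch on the same sample data; divs and out are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "Junw"],
    "Number": [78797071, 0, 0, 39797571, 0],
    "Code": [0, 89797071, 57797074, 0, 23000000],
})

# Split each code into two-character pieces; "0" yields an empty list,
# which becomes a row of missing values and is then filled with "00".
pairs = df["Code"].astype(str).str.findall("..")
divs = pd.DataFrame(pairs.tolist(), index=df.index).add_prefix("DIV").fillna("00")

out = df.join(divs)

# Compare all DIV columns to "00" at once instead of applying a lambda per row.
has_zero_pair = out[divs.columns].eq("00").any(axis=1)
out["Incomplete"] = has_zero_pair.map({True: "incomplete", False: "complete"})
print(out)
```

Selecting out[divs.columns] keeps working if columns are later added before Code, whereas iloc[:, 3:7] depends on the exact column order.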