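I'm trying to split a column, but I noticed that the split changes other values. For example, some values in row 10 get swapped with those from row 8. Why?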
Actual data for ID 10:
| vat_number | email | foi_mail | website
| 10 | abc@test.com;example@test.com;example@test.com | xyz@test.com | example.com
After running the following line of code:
base_data[['email','email_1','email_2']] = pd.DataFrame(base_data.email.str.split(';').tolist(),
columns = ['email','email_1','email_2'])
base_data becomes:
| vat_number | email | foi_mail | website | email_1 | email_2
| 10 | some other row value | some other row value | example.com | ------ | -----
The data has thousands of rows, but I'm only showing one here.
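A minimal reproduction, assuming (the question does not say so) that base_data has an index other than 0..n-1, for example after rows were filtered, sorted or dropped. pd.DataFrame(....tolist()) always gets a fresh 0..n-1 index, and assigning it to columns of base_data aligns on index labels, not positions, so rows receive each other's values. The toy values below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: the index is [5, 0] instead of [0, 1],
# as can happen after filtering, sorting or dropping rows.
base_data = pd.DataFrame(
    {
        "vat_number": [8, 10],
        "email": ["p@x.com;q@x.com;r@x.com",
                  "abc@test.com;example@test.com;example@test.com"],
    },
    index=[5, 0],
)

# Built from a plain list, so this frame gets a brand-new index 0, 1.
parts = pd.DataFrame(base_data.email.str.split(";").tolist(),
                     columns=["email", "email_1", "email_2"])

# Column assignment aligns on index labels: label 0 (vat_number 10) receives
# the emails of vat_number 8's row, and label 5 has no match, so it gets NaN.
wrong = base_data.copy()
wrong[["email", "email_1", "email_2"]] = parts
print(wrong)

# Fix: give the split result the same index as the original frame.
fixed = base_data.copy()
fixed[["email", "email_1", "email_2"]] = pd.DataFrame(
    base_data.email.str.split(";").tolist(),
    columns=["email", "email_1", "email_2"],
    index=base_data.index,
)
print(fixed)
```

Passing index=base_data.index (or using str.split(..., expand=True), which keeps the original index) removes the mismatch.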
Answer 0 (score: 0)
Try building a table within a table (a list of lists):
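def test():
    # a list of lists: each inner list is one "row"
    base_data = []
    base_data.append(['12', '32'])
    base_data.append(['352', '335'])
    base_data.append(['232', '32'])
    print(base_data)
    a = base_data[0]  # the first row
    print(a)
    print(a[0])  # first cell of the first row
    print(a[1])  # second cell of the first row
    input("Press Enter to continue...")

test()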
and add the rows with a loop.
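A minimal sketch of that idea applied to the question's data, assuming every email cell is a non-missing string with at most three addresses. The loop collects one padded list per row, and the list of lists is then turned into a DataFrame that reuses base_data's index so rows cannot shift. The names rows and max_parts, and the toy values, are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with a non-default index.
base_data = pd.DataFrame(
    {"vat_number": [10, 8],
     "email": ["abc@test.com;example@test.com;example@test.com", "p@x.com"]},
    index=[3, 7],
)

max_parts = 3  # assumption: no row has more than three addresses
rows = []      # list of lists, one inner list per DataFrame row
for value in base_data["email"]:
    parts = value.split(";")
    # pad short rows with None so every inner list has the same length
    rows.append(parts + [None] * (max_parts - len(parts)))

# Reuse the original index so each split row lands back on its own row.
split = pd.DataFrame(rows, columns=["email", "email_1", "email_2"],
                     index=base_data.index)
base_data[["email", "email_1", "email_2"]] = split
print(base_data)
```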
Answer 1 (score: 0)
If I understand the situation correctly, I believe you need something like this:
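# drop the original 'email' first, otherwise merge renames the clashing columns to email_x / email_y
base_data = base_data.drop(columns='email').merge(
    base_data['email'].str.split(';', expand=True).rename(columns={0: 'email', 1: 'email_1', 2: 'email_2'}),
    left_index=True, right_index=True)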
Here is the logic explained:
import pandas as pd
a1 = list('abcdef')
b1 = list('fedcba')
c1 = [f'{x[0]};{x[1]}' for x in zip(a1, b1)]
df1 = pd.DataFrame({'c1':c1})
df1
Out[1]:
c1
0 a;f
1 b;e
2 c;d
3 d;c
4 e;b
5 f;a
df1 = df1.merge(df1['c1'].str.split(';', expand = True).rename(columns = {0:'c2',1:'c3'}), left_index = True, right_index = True)
df1
Out[2]:
c1 c2 c3
0 a;f a f
1 b;e b e
2 c;d c d
3 d;c d c
4 e;b e b
5 f;a f a
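The same merge also keeps rows matched when the index is not 0..n-1, which is the situation that most likely causes the swap in the question: with left_index=True and right_index=True, rows are paired by index label, not by position. A small check using the same toy strings with a hypothetical shuffled index:

```python
import pandas as pd

a1 = list("abcdef")
b1 = list("fedcba")
c1 = [f"{x};{y}" for x, y in zip(a1, b1)]
# Same data as above, but with a non-default, shuffled index.
df1 = pd.DataFrame({"c1": c1}, index=[10, 3, 7, 0, 42, 5])

# expand=True keeps df1's index, so the merge pairs each row with its own split.
split = df1["c1"].str.split(";", expand=True).rename(columns={0: "c2", 1: "c3"})
df1 = df1.merge(split, left_index=True, right_index=True)
print(df1)
```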
Answer 2 (score: 0)
Use the expand parameter of .str.split:
import pandas as pd
# your dataframe
vat_number email foi_mail website
NaN abc@test.com;example@test.com;example@test.com xyz@test.com example.com
# split and expand
df[['email_1', 'email_2', 'email_3']] = df['email'].str.split(';', expand=True)
# drop `email` col
df.drop(columns='email', inplace=True)
# result
vat_number foi_mail website email_1 email_2 email_3
NaN xyz@test.com example.com abc@test.com example@test.com example@test.com
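Because str.split(..., expand=True) returns a DataFrame with the same index as df, the assignment above stays aligned. Assigning to exactly three names fails, though, if the widest row has a different number of addresses. A sketch that names the columns from whatever width the split produces and joins them back; the email_N naming follows this answer, and the data is made up:

```python
import pandas as pd

# Hypothetical toy data: rows with different numbers of addresses.
df = pd.DataFrame(
    {"vat_number": [10, 8],
     "email": ["abc@test.com;example@test.com;example@test.com", "p@x.com;q@x.com"],
     "foi_mail": ["xyz@test.com", "foi@x.com"]},
    index=[4, 9],
)

# expand=True keeps df's index; shorter rows are padded with None
parts = df["email"].str.split(";", expand=True)
# name the columns email_1, email_2, ... for however many the widest row needs
parts.columns = [f"email_{i + 1}" for i in range(parts.shape[1])]

# join aligns on the index, so each row gets its own addresses back
df = df.drop(columns="email").join(parts)
print(df)
```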