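I'm trying to work out why my code returns only the first letter of each row instead of the longest matching string. I need to process a large dataset with 1 column and 15,500 rows.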
import csv
import pandas as pd
import numpy as np
# error_bad_lines was removed in pandas 2.0; on pandas >= 1.3 use on_bad_lines='skip' instead
df = pd.read_csv('newproducts.csv', error_bad_lines=False)
df['onkey'] = 1
# cross join: pair every name with every name via the constant 'onkey' column
df1 = pd.merge(df[['name','onkey']],df[['name','onkey']], on='onkey')
df1['list'] = df1.apply(lambda x:[x.name_x,x.name_y],axis=1)
from os.path import commonprefix
df1['COL1'] = df1['list'].apply(lambda x:commonprefix(x))
df1['COL1_num'] = df1['COL1'].apply(lambda x:len(x))
df1 = df1[(df1['COL1_num']!=0)]
df1 = df1.loc[df1.groupby('name_x')['COL1_num'].idxmin()]
df = df.rename(columns ={'name':'name_x'})
df = pd.merge(df,df1[['name_x','COL1']],on='name_x',how ='left')
df['len'] = df['COL1'].apply(lambda x: len(x))
df['other'] = df.apply(lambda x: x.name_x[x.len:],axis=1)
df['COL1'] = df['COL1'].apply(lambda x: x.strip())
df['COL1'] = df['COL1'].apply(lambda x: x[:-1] if x[-1]=='-' else x)
df['other'] = df['other'].apply(lambda x:x.split('-'))
df = df[['COL1','other']]
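A likely cause, sketched under assumptions: idxmin() on COL1_num keeps, for each name, the shortest non-empty prefix it shares with any name, and when most names start with the same digit that prefix is only one or two characters long. The snippet below reproduces the effect in plain Python on four names from the sample. prefixes_with_others is a hypothetical helper used only for illustration; it is not part of the code above.

```python
from os.path import commonprefix

# Four names taken from the sample input.
names = [
    "10-funniest Risque Slim - Black",
    "10-funniest Risque Slim - Blue",
    "10 Inch Silicone Comfort Nozzle Attachment",
    "100 insect magnets",
]

def prefixes_with_others(name, all_names):
    """Non-empty common prefixes of `name` with every *other* name."""
    found = []
    for other in all_names:
        if other == name:
            continue  # skip the self-pair, whose prefix is the whole name
        prefix = commonprefix([name, other])
        if prefix:
            found.append(prefix)
    return found

target = names[0]
candidates = prefixes_with_others(target, names)

shortest = min(candidates, key=len)  # analogous to idxmin on COL1_num
longest = max(candidates, key=len)   # analogous to idxmax on COL1_num

print(repr(shortest))
print(repr(longest))
```

In the pandas code this would presumably mean dropping the rows where name_x equals name_y and calling idxmax() instead of idxmin(). Note that the longest character-level prefix here still runs past the " - " separator (it ends in "Bl"), so it would need trimming before it matches the desired output.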
Input: this is the column I start with. I want to find the longest common string and then put the non-matching part in a separate column.
product name
10 funniest Silicone Emperor - Ivory
10 funniest Stud 7 Inches - Hot Pink
10 funny elephant Hummer - Pink
10 funny elephant Hummer - Purple
10 Inch Realistic Dual Density Squirting snake
10 Inch Silicone Comfort Nozzle Attachment
10" comforter snake & comforter Bit Set - Black
10" comforter Jelly & comforter Bit Set - Pink
10" comforter Jelly & comforter Bit Set - Purple
10" Thick ladder W/balls & Suction - Black
100 insect magnets
1000 cloud Games
10-funniest Adonis Conqueror - Black
10-funniest Adonis Explorer - Red
10-funniest Adonis Vibrating Probe - Red
10-funniest Adonis Vibrating Strokers - Red
10-funniest Charisma Bliss - Black
10-funniest Charisma Bliss - Pink
10-funniest Charisma Kiss - Pink
10-funniest Charisma Tryst - Black
10-funniest Risque G-Vibe - Black
10-funniest Risque G-Vibe - Blue
10-funniest Risque G-Vibe - Purple
10-funniest Risque Slim - Black
10-funniest Risque Slim - Blue
10-funniest Risque Slim - Purple
10-funniest Risque Tulip - Black
10-funniest Risque Tulip - Blue
10-funniest Risque Tulip - Purple
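With 15,500 names, pairing every name with every other name produces 15,500 x 15,500 = 240,250,000 rows, which is slow and memory-hungry. A minimal sketch of an alternative, assuming a character-level common prefix is what is wanted: in a sorted list, the name that shares the longest prefix with a given name is always one of its two neighbours, so only adjacent pairs need comparing. longest_shared_prefix is a hypothetical helper name.

```python
from os.path import commonprefix

def longest_shared_prefix(names):
    """Map each distinct name to the longest prefix it shares with another name."""
    ordered = sorted(set(names))
    best = {}
    for i, name in enumerate(ordered):
        candidates = []
        if i > 0:  # compare with the previous name in sort order
            candidates.append(commonprefix([ordered[i - 1], name]))
        if i + 1 < len(ordered):  # compare with the next name in sort order
            candidates.append(commonprefix([name, ordered[i + 1]]))
        best[name] = max(candidates, key=len, default="")
    return best

# Four names taken from the sample input.
sample = [
    "10 funny elephant Hummer - Pink",
    "10 funny elephant Hummer - Purple",
    "100 insect magnets",
    "1000 cloud Games",
]
result = longest_shared_prefix(sample)
for name, prefix in result.items():
    print(repr(name), "->", repr(prefix))
```

This does one sort plus one pass over the list instead of building every pair. The prefixes are still character-level, so they can end mid-word (for example "100" for "100 insect magnets").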
Output: the result should have the matching part in the first column and the non-matching part in another column.
new product name
10 funniest Silicone Emperor | Ivory
10 funniest Stud 7 Inches | Hot Pink
10 funny elephant Hummer | Pink
10 funny elephant Hummer | Purple
10 Inch Realistic Dual Density Squirting snake |
10 Inch Silicone Comfort Nozzle Attachment |
10" comforter snake & comforter Bit Set | Black
10" comforter Jelly & comforter Bit Set | Pink
10" comforter Jelly & comforter Bit Set | Purple
10" Thick ladder W/balls & Suction | Black
100 insect magnets |
1000 cloud Games |
10-funniest Adonis Conqueror | Black
10-funniest Adonis Explorer | Red
10-funniest Adonis Vibrating Probe | Red
10-funniest Adonis Vibrating Strokers | Red
10-funniest Charisma Bliss | Black
10-funniest Charisma Bliss | Pink
10-funniest Charisma Kiss | Pink
10-funniest Charisma Tryst | Black
10-funniest Risque G-vibe | Black
10-funniest Risque G-vibe | Blue
10-funniest Risque G-vibe | Purple
10-funniest Risque Slim | Black
10-funniest Risque Slim | Blue
10-funniest Risque Slim | Purple
10-funniest Risque Tulip | Black
10-funniest Risque Tulip | Blue
10-funniest Risque Tulip | Purple
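If the varying part is always introduced by " - " (space, hyphen, space), as it is in every sample row that has one, the split shown above can be sketched without any prefix matching at all. That separator rule is an assumption drawn from the sample, and split_variant is a hypothetical helper name.

```python
def split_variant(name, sep=" - "):
    """Split a product name at the first ' - ' into (base, variant).

    Names without the separator come back whole, with an empty variant.
    """
    base, _, variant = name.partition(sep)
    return base.strip(), variant.strip()

# Rows taken from the sample input.
rows = [
    "10 funniest Stud 7 Inches - Hot Pink",
    "10 Inch Silicone Comfort Nozzle Attachment",
    "10-funniest Risque G-Vibe - Black",
]
for row in rows:
    print(split_variant(row))
```

Hyphens without surrounding spaces, as in "10-funniest" or "G-Vibe", are left alone because the separator includes the spaces. On a DataFrame column, pandas offers Series.str.partition(' - '), which should give the same three-way split per row.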