Hi, I'm trying to extract the sizes from a pandas DataFrame and append them to a list.
Variations
Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xXlarge;
Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xlarge;
Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge;
Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge;
SIZE - COLOR| L/XL - Lime; 2XL/3XL - Lime;
This is what I have tried so far.
def size_extractor(data):
    size_list = []
    for char in data:
        if char == "|":
            if char == " ":
                continue
            size_list.append(char)
        elif char == ";":
            continue
    print(size_list)

df['Variations'].apply(size_extractor)
I'm trying to use "|" to start the extraction and grab the characters between " " and ";".
In the end I want a list like this: [Medium, Large, Xlarge, 2Xlarge, 3Xlarge, 4Xlarge, 5xXlarge]
Should I redo this with a while loop?
Answer 0 (score: 0)
import pandas as pd

d = {'Variations': ['Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xXlarge; ',
                    'Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xlarge; ',
                    'Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge;',
                    'Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge; ',
                    'SIZE - COLOR| L/XL - Lime; 2XL/3XL - Lime;']}
df = pd.DataFrame(data=d)

def size_extractor(data):
    size_list = list(map(lambda x: x.strip(), data.split('|')[1].split(';')))
    print(size_list)

df['Variations'].apply(size_extractor)
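Code explanation
data.split('|')[1]: splits the data at "|" and keeps the part after it, which is the part used from here on.
split(';'): splits that part at each ";".
lambda x: x.strip() with map(): removes the whitespace before and after each string.
list(): turns the map() object into a list so its items can be accessed.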
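As a small variation on the code above, the function can return the list instead of printing it, and skip the empty string that the trailing ";" leaves behind after the split. This is only a sketch: the name size_extractor_clean is made up for this example, and it assumes every entry after the "|" is separated by ";".

```python
def size_extractor_clean(data):
    """Return the stripped, non-empty entries found after the first '|'."""
    # partition() yields an empty tail when the string contains no '|',
    # so such rows give an empty list instead of raising IndexError.
    _, _, tail = data.partition('|')
    # Strip each piece and drop the empty piece left by a trailing ';'.
    return [part.strip() for part in tail.split(';') if part.strip()]


print(size_extractor_clean('SIZE - COLOR| L/XL - Lime; 2XL/3XL - Lime;'))
```

With the DataFrame from this answer, df['Variations'].apply(size_extractor_clean) should then give one list per row, which can be assigned to a new column.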
Answer 1 (score: 0)
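def size_extracter(data):
    print(data)
    size_list = []
    size = ""
    for char in data:
        if char == "|":
            # A "|" ends the current entry.
            size_list.append(size)
            continue
        elif char == " ":
            # A space discards what was collected so far, so a
            # multi-word entry keeps only its last word.
            size = ""
            continue
        else:
            size = size + char
    print(size_list)

# Drop everything up to and including the first "|", then turn each ";" into "|".
# regex=True is needed for the pattern on pandas 2.0 and later, where the default is False.
df['Variations'] = df['Variations'].str.replace(r'^[^|]*\|\s*', '', regex=True).str.replace(';', '|', regex=False)
df['Variations'].apply(size_extracter)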
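For comparison, here is a sketch of a character-by-character loop that does not need the str.replace preprocessing: it starts collecting after the first "|" and closes an entry at each ";", so multi-word entries such as "Small - ( only one mic tab)" stay intact. The name size_extractor_loop is made up for this example, and a plain for loop is enough, so a while loop is not required.

```python
def size_extractor_loop(data):
    """Collect the ';'-separated entries that follow the first '|'."""
    size_list = []
    current = ""
    started = False  # becomes True once the first '|' has been seen
    for char in data:
        if not started:
            # Skip the label part ("Size", "Sizes", ...) up to the first '|'.
            started = (char == "|")
            continue
        if char == ";":
            # A ';' closes the current entry; ignore entries that are blank.
            entry = current.strip()
            if entry:
                size_list.append(entry)
            current = ""
        else:
            current += char
    # Keep a final entry that is not followed by ';'.
    if current.strip():
        size_list.append(current.strip())
    return size_list


print(size_extractor_loop('Sizes| Small - ( only one mic tab); Medium; Large;'))
```

It can be applied to the original, unmodified column in the same way, for example df['Variations'].apply(size_extractor_loop).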