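I have a relatively simple dataframe, shown below. One of its columns, "Book", holds a comma-separated list of strings.
My goal is to create a new dataframe for each of the three distinct values in "Book": one dataframe with every product that appears in International, one with every product that appears in Domestic, and one for Subscription.
I don't know how to build a new dataframe by matching partial strings in an existing dataframe. Is there built-in functionality for this, or should I write a loop that iterates over the dataframe and builds the new one?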
DF
   Description  Book                                   Product ID
0  Products     International, Domestic                X11
1  Products     International                          X12
2  Products     Domestic                               X13
3  Products     Domestic, International                X21
4  Services     Subscription, Domestic                 X23
5  Services     International, Domestic                X23
6  Services     Subscription, International, Domestic  X25
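I have tried different combinations of the Pandas isin functionality, but that requires knowing the exact string you are looking for. In my case the Book column can contain the three values in any order, so I have not been able to use isin successfully.
An example of the loop I tried is: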
f = []
for index, row in df.iterrows():
    if "International" in row['Book']:
        f.append
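However, this creates an empty list, which I know isn't right. I'm not that strong at building loops over dataframes, so any advice is greatly appreciated.
My target output is a set of dataframes like the following: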
DF
   Description  Book           Product ID
0  Products     International  X11
1  Products     International  X12
2  Products     International  X21
3  Services     International  X23
4  Services     International  X25
and
DF
   Description  Book      Product ID
0  Products     Domestic  X11
2  Products     Domestic  X13
3  Products     Domestic  X21
4  Services     Domestic  X23
5  Services     Domestic  X25
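The same applies to Subscription. I have looked through several other SO questions and could not find one that helps with this situation.
Answer 0 (score: 1)
I'm not sure the code you tried ever had a chance of working. Have you tried the following?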
f = []
for index, row in df.iterrows():
    if "International" in row['Book']:
        f.append(row)
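Note the f.append(row) on the last line: f.append on its own only references the method without calling it, so nothing is ever added to the list.
This is probably not the best way, though.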
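To get from the list built by the corrected loop to an actual dataframe, the collected rows can be passed to pd.DataFrame. The following is a minimal sketch, assuming the sample data from the question; the names rows and international are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

rows = []
for index, row in df.iterrows():
    # Substring test on the raw "Book" string of this row.
    if "International" in row["Book"]:
        rows.append(row)

# Each collected row is a Series; DataFrame() stacks them back into a frame.
international = pd.DataFrame(rows).reset_index(drop=True)
# Overwrite the combined string with the single category, as in the target output.
international["Book"] = "International"
print(international)
```

iterrows is slow on large frames, so this is mainly useful for understanding why the original loop returned an empty list.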
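I would try something along the following lines instead. It gives you three columns that are better suited to grouping (df.groupby), which would in turn give you the list of products in each category.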
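# axis=1 applies the lambda to each row; the default (axis=0) passes columns, so r['Book'] would fail
df['International'] = df.apply(lambda r: 'International' in r['Book'], axis=1)
df['Domestic'] = df.apply(lambda r: 'Domestic' in r['Book'], axis=1)
df['Subscription'] = df.apply(lambda r: 'Subscription' in r['Book'], axis=1)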
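As a rough illustration of how such flag columns could be used afterwards, the sketch below builds them with the vectorised Series.str.contains (swapped in here for the row-wise apply) and then counts, filters and groups on them. It assumes the sample data from the question; counts, domestic_ids and by_flag are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

categories = ["International", "Domestic", "Subscription"]
for name in categories:
    # regex=False treats the category as a plain substring, not a pattern.
    df[name] = df["Book"].str.contains(name, regex=False)

# Number of rows per category (True counts as 1).
counts = df[categories].sum()

# Product IDs of every row flagged as Domestic.
domestic_ids = df.loc[df["Domestic"], "Product ID"].tolist()

# Grouping on a flag gives the product list for rows with and without it.
by_flag = df.groupby("International")["Product ID"].apply(list).to_dict()

print(counts)
print(domestic_ids)
print(by_flag)
```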
Answer 1 (score: 1)
As I commented, using get_dummies:
s=df.Book.str.get_dummies(sep=',')
[df[s[x]==1].assign(Book=x) for x in s.columns]
Out[198]:
[  Description      Book ProductID
 0    Products  Domestic       X11
 2    Products  Domestic       X13
 3    Products  Domestic       X21
 4    Services  Domestic       X23
 5    Services  Domestic       X23
 6    Services  Domestic       X25,
   Description           Book ProductID
 0    Products  International       X11
 1    Products  International       X12
 3    Products  International       X21
 5    Services  International       X23
 6    Services  International       X25,
   Description          Book ProductID
 4    Services  Subscription       X23
 6    Services  Subscription       X25]
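One caveat with the get_dummies call: if the values in Book are separated by a comma followed by a space, as they appear in the question's table, then sep=',' leaves a leading space on some labels and produces duplicate columns such as ' Domestic' and 'Domestic'. The sketch below assumes that spacing and uses sep=', ' instead; it also stores the results in a dict (the name frames is illustrative) so that each dataframe can be looked up by category.

```python
import pandas as pd

# Sample data reconstructed from the table in the question,
# assuming values are separated by a comma plus a space.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

# One 0/1 indicator column per distinct value found in "Book".
s = df["Book"].str.get_dummies(sep=", ")

# One dataframe per category, with "Book" overwritten by that category.
frames = {col: df[s[col] == 1].assign(Book=col) for col in s.columns}

for name, frame in frames.items():
    print(name)
    print(frame)
```

The original row labels are kept here; append .reset_index(drop=True) inside the comprehension if a fresh 0..n-1 index is preferred.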
Answer 2 (score: 1)
Another way:
International:
df_international = df[df['Book'].str.contains('International')].reset_index(drop=True)
df_international.loc[:, 'Book'] = 'International'
print(df_international)
# Description Book Product ID
#0 Products International X11
#1 Products International X12
#2 Products International X21
#3 Services International X23
#4 Services International X25
Domestic:
df_domestic = df[df['Book'].str.contains('Domestic')].reset_index(drop=True)
df_domestic.loc[:, 'Book'] = 'Domestic'
print(df_domestic)
# Description Book Product ID
#0 Products Domestic X11
#1 Products Domestic X13
#2 Products Domestic X21
#3 Services Domestic X23
#4 Services Domestic X23
#5 Services Domestic X25
Subscription:
df_subscription = df[df['Book'].str.contains('Subscription')].reset_index(drop=True)
df_subscription.loc[:, 'Book'] = 'Subscription'
print(df_subscription)
# Description Book Product ID
#0 Services Subscription X23
#1 Services Subscription X25
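The three blocks above differ only in the category name, so they could be folded into one helper. This is a sketch under the assumption that the category names are known in advance and that plain substring matching is acceptable; split_by_book is a hypothetical name, not a pandas function.

```python
import pandas as pd


def split_by_book(frame, categories):
    """Return {category: dataframe of rows whose "Book" contains it}."""
    result = {}
    for category in categories:
        # Plain substring match; regex=False avoids pattern interpretation.
        mask = frame["Book"].str.contains(category, regex=False)
        result[category] = (
            frame[mask].assign(Book=category).reset_index(drop=True)
        )
    return result


# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

frames = split_by_book(df, ["International", "Domestic", "Subscription"])
print(frames["Subscription"])
```

Because assign returns a new dataframe, the original df is left unchanged. Substring matching would also catch a category whose name contains another one, which does not occur with these three values.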