Question

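I have a relatively simple dataframe, shown below. One of its columns, "Book", is a list of strings.

My goal is to create a new dataframe for each of the three distinct values in "Book". That is, one dataframe with every product that appears in International, one with every product that appears in Domestic, and one for Subscription.

I don't know how to build a new dataframe by matching part of a string in an existing dataframe. Is there a built-in feature for this, or should I write a loop that iterates over the dataframe and builds the new one?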

DF

    Description      Book                               Product ID
0   Products      International, Domestic                 X11
1   Products      International                           X12
2   Products      Domestic                                X13
3   Products      Domestic, International                 X21
4   Services      Subscription, Domestic                  X23
5   Services      International, Domestic                 X23
6   Services      Subscription, International, Domestic   X25

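I have tried various combinations of the pandas isin function, but that requires knowing the exact string you are looking for. In my case the Book column can contain the three values in any order, so I could not get isin to work.

An example of a loop I tried: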

f = []
for index,row in df.iterrows():
    if "International" in row['Book']:
        f.append

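However, this produces an empty list, which I know is not right. I am not very good at writing loops over dataframes, so any advice is much appreciated.

My target output is a set of dataframes like the following: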

DF

    Description      Book                               Product ID
0   Products      International                           X11
1   Products      International                           X12
2   Products      International                           X21
3   Services      International                           X23
4   Services      International                           X25

and

DF

    Description   Book                               Product ID
0   Products      Domestic                                X11
2   Products      Domestic                                X13
3   Products      Domestic                                X21
4   Services      Domestic                                X23
5   Services      Domestic                                X25

And likewise for Subscription. I have looked through several other SO questions and could not find one that covers this case.

Answer 1

I'm not sure the code you tried ever had a chance of working. Have you tried the following?

f = []
for index,row in df.iterrows():
    if "International" in row['Book']:
        f.append(row)

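Note the f.append(row) at the end: f.append on its own only refers to the method without calling it, so nothing is ever added to the list.

This is probably not the best way, though.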
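For completeness, here is a minimal sketch of how the rows collected by that loop could be turned back into a dataframe. The sample data is retyped from the question, and the names rows and international are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

# Collect every row whose Book string mentions "International".
rows = []
for index, row in df.iterrows():
    if "International" in row["Book"]:
        rows.append(row)

# A list of row Series becomes a dataframe again; overwrite Book with the
# single matched value and renumber the index from 0.
international = (
    pd.DataFrame(rows)
    .assign(Book="International")
    .reset_index(drop=True)
)
print(international)
```

iterrows is slow on large frames, so this is mainly useful for understanding what the loop does.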

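I would try something along the following lines. It gives you three columns that are better suited to grouping (df.groupby), which in turn gives you the list of products in each category.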

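# axis=1 applies the function to each row, so r['Book'] is that row's Book value
df['International'] = df.apply(lambda r: 'International' in r['Book'], axis=1)
df['Domestic'] = df.apply(lambda r: 'Domestic' in r['Book'], axis=1)
df['Subscription'] = df.apply(lambda r: 'Subscription' in r['Book'], axis=1)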
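A vectorized variant of the same idea, as a sketch: Series.str.contains builds each flag column without a row-wise apply. The sample data is retyped from the question; flags, categories, counts and domestic_ids are illustrative names, not part of the answer.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

categories = ["International", "Domestic", "Subscription"]

# Work on a copy so the original frame keeps its three columns.
flags = df.copy()
for name in categories:
    # regex=False treats the category name as a plain substring.
    flags[name] = flags["Book"].str.contains(name, regex=False)

# Number of rows per category (True counts as 1).
counts = flags[categories].sum()

# Each flag column doubles as a boolean mask for filtering.
domestic_ids = flags.loc[flags["Domestic"], "Product ID"].tolist()
print(counts)
print(domestic_ids)
```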

Answer 2

Using get_dummies, as I said in the comments:

# the values in Book are separated by ', ' (comma plus space) in the posted data
s = df.Book.str.get_dummies(sep=', ')
[df[s[x]==1].assign(Book=x) for x in s.columns]
Out[198]: 
[  Description      Book ProductID
 0    Products  Domestic       X11
 2    Products  Domestic       X13
 3    Products  Domestic       X21
 4    Services  Domestic       X23
 5    Services  Domestic       X23
 6    Services  Domestic       X25,   Description           Book ProductID
 0    Products  International       X11
 1    Products  International       X12
 3    Products  International       X21
 5    Services  International       X23
 6    Services  International       X25,   Description          Book ProductID
 4    Services  Subscription       X23
 6    Services  Subscription       X25]
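If named access is more convenient than a list, the same get_dummies idea can be collected into a dict keyed by category. This is a sketch under the assumption that the values in Book are separated by ", " as in the posted table; dummies and frames are illustrative names.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

# One 0/1 indicator column per distinct value found in Book.
dummies = df["Book"].str.get_dummies(sep=", ")

# Map each category name to the rows that contain it, with Book
# overwritten by that single category and the index renumbered.
frames = {
    name: df[dummies[name] == 1].assign(Book=name).reset_index(drop=True)
    for name in dummies.columns
}

print(frames["Subscription"])
```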

Answer 3

Another way:

International:

df_international = df[df['Book'].str.contains('International')].reset_index(drop=True)
df_international.loc[:, 'Book'] = 'International'
print(df_international)
#      Description           Book Product ID
#0        Products  International        X11
#1        Products  International        X12
#2        Products  International        X21
#3        Services  International        X23
#4        Services  International        X25

Domestic:

df_domestic = df[df['Book'].str.contains('Domestic')].reset_index(drop=True)
df_domestic.loc[:, 'Book'] = 'Domestic'
print(df_domestic)
#      Description      Book Product ID
#0        Products  Domestic        X11
#1        Products  Domestic        X13
#2        Products  Domestic        X21
#3        Services  Domestic        X23
#4        Services  Domestic        X23
#5        Services  Domestic        X25

Subscription:

df_subscription = df[df['Book'].str.contains('Subscription')].reset_index(drop=True)
df_subscription.loc[:, 'Book'] = 'Subscription'
print(df_subscription)
#      Description          Book Product ID
#0        Services  Subscription        X23
#1        Services  Subscription        X25
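A different technique, not used in any of the answers above, is to split Book into one row per value with explode and then group on it. The sketch below assumes a pandas version that provides DataFrame.explode (0.25 or later) and the ", " separator from the posted table; exploded and frames are illustrative names.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Description": ["Products"] * 4 + ["Services"] * 3,
    "Book": [
        "International, Domestic",
        "International",
        "Domestic",
        "Domestic, International",
        "Subscription, Domestic",
        "International, Domestic",
        "Subscription, International, Domestic",
    ],
    "Product ID": ["X11", "X12", "X13", "X21", "X23", "X23", "X25"],
})

# Turn each Book string into a list, then emit one row per list element.
exploded = df.assign(Book=df["Book"].str.split(", ")).explode("Book")

# One dataframe per distinct Book value, each with a fresh 0-based index.
frames = {
    name: group.reset_index(drop=True)
    for name, group in exploded.groupby("Book")
}

print(frames["Domestic"])
```

Unlike the substring match, this compares whole values, so a category whose name is contained in another category's name would not be picked up by mistake.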
