Question

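I am working with data frames in pandas, using a book distributor as an example.

The warehouse produces .csv files that treat signed and unsigned (by the author) copies of the same title as separate rows, for example: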

TITLE      //                      STOCK

A song of ice and fire     //       5

A song of ice and fire (signed)  //  1

However, I would like one row per title, with an additional column for the signed stock, for example:

TITLE            //                STOCK  //   SIGNED STOCK

A song of ice and fire      //       5       //     1

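I have managed to read the CSV into a pandas DataFrame and added a blank column SIGNED STOCK, filled with zeros. I have also cleaned up the data, getting rid of blanks and NaNs. However, I don't know how to search the rows for titles containing the substring (signed) and then add the stock from those rows to the SIGNED STOCK column of the matching title. Any help is much appreciated! :)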

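import numpy as np
import pandas as pd

IBS_combined = pd.read_csv("IBS_21_05_19.csv", usecols=[3, 12, 21], encoding='latin-1')
IBS_combined.columns = ['Product', 'ISBN', 'Stock']
# numeric zero (not the string '0') so stock can be added to it later
IBS_combined['Signed Stock'] = 0
# turn cells that just repeat the header text 'Product' into NaN, then drop those rows
IBS_combined.replace(['Product'], np.nan, inplace=True)
IBS_combined.dropna(subset=['Product'], inplace=True)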

Answer 1

You can do the following:

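your_string = ' (signed)'  # substring that marks a signed copy
signed = []
for _, row in IBS_combined.iterrows():  # iterrows() yields (index, row) pairs
    if your_string in row['TITLE']:
        # remember the base title together with its signed stock
        signed.append((row['TITLE'].replace(your_string, ''), row['STOCK']))

Then you can iterate over signed and add the amounts:

for title, stock in signed:
    # .loc assigns in place; chained indexing would only modify a temporary copy
    IBS_combined.loc[IBS_combined['TITLE'] == title, 'SIGNED'] += stock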
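
The loop above can also be written without iterrows(). What follows is only a minimal sketch, assuming the column names TITLE, STOCK, and SIGNED used in this answer, a literal marker ' (signed)', and at most one signed row per title. The sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from this answer.
df = pd.DataFrame({
    'TITLE': ['A song of ice and fire', 'A song of ice and fire (signed)', 'another book'],
    'STOCK': [5, 1, 10],
})
marker = ' (signed)'

# Boolean mask of signed rows; regex=False treats the parentheses literally.
is_signed = df['TITLE'].str.contains(marker, regex=False)

# Series mapping base title -> signed stock (assumes one signed row per title).
signed_stock = (
    df.loc[is_signed]
      .assign(TITLE=lambda d: d['TITLE'].str.replace(marker, '', regex=False))
      .set_index('TITLE')['STOCK']
)

# Keep the unsigned rows and look up each title's signed stock, defaulting to 0.
result = df.loc[~is_signed].copy()
result['SIGNED'] = result['TITLE'].map(signed_stock).fillna(0).astype(int)
print(result)
```

Note that this sketch drops titles that exist only as signed copies, because it starts from the unsigned rows.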

Answer 2

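You can split the DataFrame into two frames, one holding the signed rows and one holding the unsigned rows, and then combine the results. Below is an example (assuming that ISBN is the unique key identifying a book, and that each book has at most one signed and at most one unsigned stock entry):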

Set up sample data, keyed by ISBN, that covers these cases:
- one signed and one unsigned entry
- only a signed stock entry
- only an unsigned stock entry

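import io
import pandas as pd

# named "data" rather than "str" to avoid shadowing the built-in str type
data = """ISBN // TITLE // STOCK
1 // A song of ice and fire // 5
1 // A song of ice and fire (signed) // 1
2 // another book // 10
2 // another book (signed) // 2
3 // 2nd book // 3
4 // 3rd book (signed) // 1"""

# io.StringIO lets read_csv parse the string as if it were a file
df = pd.read_csv(io.StringIO(data), sep=' // ', engine='python')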

Split the DataFrame into two according to the mask m defined below:
- df_signed: df[m]
- df_unsigned: df[~m]
```
m = df.TITLE.str.contains(r'\(signed\)')
```
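
The parentheses are escaped because str.contains treats its pattern as a regular expression by default. A small sketch with made-up titles, comparing the escaped pattern with a plain substring search and with a looser pattern that matches too much:

```python
import pandas as pd

# Hypothetical titles, chosen so that one contains the bare word "signed".
titles = pd.Series(['A song of ice and fire (signed)', 'The signed letter', 'another book'])

# Escaped parentheses match the literal text "(signed)".
m_regex = titles.str.contains(r'\(signed\)')

# Same result with regex matching switched off and a plain substring.
m_plain = titles.str.contains('(signed)', regex=False)

# Searching for the bare word also flags a title that is not a signed copy.
m_word = titles.str.contains('signed')

print(m_regex.tolist(), m_plain.tolist(), m_word.tolist())
```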

Set up df_signed (set ISBN as the index, rename the STOCK column, and remove the substring '(signed)' from the TITLE column):

df_signed = df[m].set_index('ISBN') \
                 .rename(columns={'STOCK': 'SIGNED_STOCK'}) \
                 .replace(r'\s*\(signed\)', '', regex=True)
print(df_signed)
#                       TITLE  SIGNED_STOCK
#ISBN
#1     A song of ice and fire             1
#2               another book             2
#4                   3rd book             1

Set up df_unsigned and combine it with df_signed using DataFrame.combine_first():

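# fill missing counts with 0 and cast back to int
# (the downcast= argument of fillna is deprecated in recent pandas)
df_new = df[~m].set_index('ISBN') \
               .combine_first(df_signed) \
               .fillna({'STOCK': 0, 'SIGNED_STOCK': 0}) \
               .astype({'STOCK': int, 'SIGNED_STOCK': int}) \
               .reset_index()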
print(df_new)
#   ISBN  SIGNED_STOCK  STOCK                   TITLE
#0     1             1      5  A song of ice and fire
#1     2             2     10            another book
#2     3             0      3                2nd book
#3     4             1      0                3rd book
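
To illustrate what combine_first() does here: it aligns the two frames on their index, takes the union of rows and columns, and fills gaps in the caller with values from the other frame. A tiny sketch with hypothetical frames (the names left and right are not part of the answer):

```python
import pandas as pd

# Hypothetical frames indexed by ISBN.
left = pd.DataFrame({'TITLE': ['book A', 'book B'], 'STOCK': [5, 3]}, index=[1, 3])
right = pd.DataFrame({'TITLE': ['book A', 'book C'], 'SIGNED_STOCK': [1, 2]}, index=[1, 4])

# Rows 1, 3 and 4 all appear; cells with no value in either frame stay NaN.
combined = left.combine_first(right)
print(combined)
```

Row 3 has no signed stock and row 4 has no unsigned stock, so those cells come out as NaN, which is why the answer follows up by filling missing values with 0.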

Rearrange the column order:

cols = ['TITLE', 'ISBN', 'STOCK', 'SIGNED_STOCK']
df_new = df_new[cols]
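
For comparison, the same steps could be wrapped into one function that uses an outer merge instead of combine_first(). This is only a sketch under assumptions: the helper name split_signed_stock is hypothetical, the frame is assumed to have the columns ISBN, TITLE, and STOCK, each ISBN has at most one signed and one unsigned row, and the titles match exactly once the '(signed)' suffix is removed.

```python
import io
import pandas as pd

def split_signed_stock(df, marker=r'\s*\(signed\)'):
    """Hypothetical helper: one row per ISBN with STOCK and SIGNED_STOCK."""
    m = df['TITLE'].str.contains(marker)
    unsigned = df[~m]
    signed = df[m].rename(columns={'STOCK': 'SIGNED_STOCK'})
    # Strip the marker so signed and unsigned titles line up.
    signed = signed.assign(TITLE=signed['TITLE'].str.replace(marker, '', regex=True))

    # Outer merge keeps books that appear only signed or only unsigned.
    out = unsigned.merge(signed, on=['ISBN', 'TITLE'], how='outer')
    out = out.fillna({'STOCK': 0, 'SIGNED_STOCK': 0}) \
             .astype({'STOCK': int, 'SIGNED_STOCK': int})
    cols = ['TITLE', 'ISBN', 'STOCK', 'SIGNED_STOCK']
    return out.sort_values('ISBN').reset_index(drop=True)[cols]

# Sample data in the same format as the example above.
data = """ISBN // TITLE // STOCK
1 // A song of ice and fire // 5
1 // A song of ice and fire (signed) // 1
3 // 2nd book // 3
4 // 3rd book (signed) // 1"""
df = pd.read_csv(io.StringIO(data), sep=' // ', engine='python')
result = split_signed_stock(df)
print(result)
```

Merging on both ISBN and TITLE is a design choice that keeps a single TITLE column; if titles may differ slightly between the signed and unsigned rows, merging on ISBN alone and picking one title afterwards would be safer.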

Add a value from one row to another when the rows have a matching substring in a Pandas DataFrame
