Question

I have the following DataFrame:

import pandas as pd
dt = pd.DataFrame({'col': ['A','A_B']})

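I want the rows where col == 'A' to get the value 'all' in a new column (col2), and the remaining rows to get the result of a str.split operation.

The final DataFrame I want:

dt = pd.DataFrame({'col': ['A', 'A_B'], 'col2': ['all', 'B']})

I tried using str.split, but I get an error.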

Answer 1

If you use the pandas string functions, this works fine in your case: a missing value is returned where the split list has no second element:

print (dt.col.str.split('_').str[1])
0    NaN
1      B
Name: col, dtype: object
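
To make the difference concrete, here is a minimal sketch contrasting plain Python indexing with the .str accessor. It assumes the error in the question was an IndexError from indexing the split list directly; the original traceback is not shown, so that is a guess. The names plain_error and second are made up for this illustration.

```python
import pandas as pd

dt = pd.DataFrame({'col': ['A', 'A_B']})

# Plain Python indexing raises when the split list has no second element
# ('A'.split('_') is ['A'], so [1] is out of range).
try:
    dt['col'].apply(lambda x: x.split('_')[1])
    plain_error = None
except IndexError as exc:
    plain_error = type(exc).__name__

# The vectorised .str accessor returns a missing value instead of raising.
second = dt['col'].str.split('_').str[1]

print(plain_error)
print(second.isna().tolist())
```

Because .str[1] yields NaN instead of raising, the result can be passed straight to np.where.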


import numpy as np

dt['col2'] = np.where(dt.col == 'A', 'all', dt.col.str.split('_').str[1])
print (dt)
   col col2
0    A  all
1  A_B    B

Or use [-1] to select the last element of the list after splitting:

dt['col2'] = np.where(dt.col == 'A',  'all',
                      dt.col.apply(lambda x: x.split('_')[-1]))
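
As a rough sketch of why [-1] is safe here: every split list has at least one element, so the last element always exists. For a value without an underscore it is just the whole string, which np.where then replaces with 'all'. This version uses the .str accessor instead of apply; the variable name last is only illustrative.

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({'col': ['A', 'A_B']})

# The last piece of each split: 'A' stays 'A', 'A_B' becomes 'B'.
last = dt['col'].str.split('_').str[-1]

# Rows equal to 'A' get 'all'; every other row keeps its last piece.
dt['col2'] = np.where(dt['col'] == 'A', 'all', last)
print(dt)
```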

Or filter the values by inverting the mask:

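m = dt.col == 'A'
# Start with 'all' everywhere, then overwrite only the rows where col != 'A'.
# (Passing the shorter, filtered Series to np.where only works when it happens to broadcast.)
dt['col2'] = 'all'
dt.loc[~m, 'col2'] = dt.loc[~m, 'col'].apply(lambda x: x.split('_')[1])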
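
The mask idea can also be wrapped in a small helper and checked on a slightly larger frame. This is only a sketch: the function name add_col2, its parameters, and the extra rows are hypothetical and not part of the original answer.

```python
import pandas as pd

def add_col2(df, match='A', sep='_'):
    """Return a copy of df where col2 is 'all' if col == match, else the second piece of col."""
    out = df.copy()
    m = out['col'] == match
    # Default value for every row, then overwrite only the non-matching rows.
    out['col2'] = 'all'
    out.loc[~m, 'col2'] = out.loc[~m, 'col'].str.split(sep).str[1]
    return out

demo = add_col2(pd.DataFrame({'col': ['A', 'A_B', 'A_C', 'A']}))
print(demo)
```

Assigning through .loc aligns on the index, so the filtered Series does not need to have the same length as the full column.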

Answer 2

You can do this:


How can I split a string column into another column based on a condition on another column in Python?

2 answers: