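I've been trying to figure this out for a while. I'm new to Python.
I have a table with about 50,000 records, but the table below illustrates what I'm trying to do.
I want to add a third column named Category. This column should contain a value derived from a condition applied to the Movies column.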
N | Movies
--+---------------------
1 | Save the Last Dance
2 | Love and Other Drugs
3 | Dance with Me
4 | Love Actually
5 | High School Musical
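The condition is: search the Movies column for these words (Dance, Love, and Musical). If one of the words is found in the title, put that word in the Category column.
The end result should be a new DataFrame like this: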
N | Movies               | Category
--+----------------------+---------
1 | Save the Last Dance  | Dance
2 | Love and Other Drugs | Love
3 | Dance with Me        | Dance
4 | Love Actually        | Love
5 | High School Musical  | Musical
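One compact way to express this rule in pandas is to treat the three keywords as a single regular expression and extract the first match from each title. This is a minimal sketch, assuming the data is already in a pandas DataFrame named `df` with a `Movies` column (the sample rows are the five from the table above). If a title contains more than one keyword, `str.extract` keeps the one that appears earliest in the title, and titles with no keyword get NaN.

```python
import re

import pandas as pd

# Sample data: the five rows from the question's table
df = pd.DataFrame({
    'N': [1, 2, 3, 4, 5],
    'Movies': ['Save the Last Dance', 'Love and Other Drugs', 'Dance with Me',
               'Love Actually', 'High School Musical'],
})

keywords = ['Dance', 'Love', 'Musical']
# Build the pattern "(Dance|Love|Musical)"; re.escape guards against keywords
# that happen to contain regex metacharacters
pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'

# str.extract returns the first match of the capture group in each title,
# or NaN when nothing matches; expand=False returns a Series, not a DataFrame
df['Category'] = df['Movies'].str.extract(pattern, expand=False)
print(df)
```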
Thanks in advance!!
Answer 0 (score: 0)
If you have a 2D list (a list of rows), you can do this:
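def add_category(record):
    # record is one row, e.g. [1, 'Save the Last Dance']
    movie = record[1]
    categories = []
    for category in ['Dance', 'Love', 'Musical']:
        if category in movie:
            categories.append(category)
    # list.append() returns None, so append first and then return the row
    record.append(', '.join(categories))
    return record

# database is your existing list of [N, Movies] rows
database = [add_category(record) for record in database]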
You can change how the values in the Category column are built by changing the add_category() function.
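For example, a hypothetical variant of add_category() could take the keyword list as a parameter (`keywords` is an assumed name, not from the answer) and build a new row instead of relying on `record.append()`, which returns `None` and mutates the input. This is a minimal sketch assuming each record is a list in the question's `[N, Movies]` layout; titles matching several keywords get a comma-separated string, and titles matching none get an empty string.

```python
def add_category(record, keywords=('Dance', 'Love', 'Musical')):
    """Return a new row: the original fields plus a comma-separated category string."""
    movie = record[1]
    # Keep every keyword that appears as a substring of the title (case-sensitive)
    matches = [k for k in keywords if k in movie]
    # Concatenate into a new list instead of mutating the caller's row
    return record + [', '.join(matches)]


# Sample rows in the question's [N, Movies] layout
database = [
    [1, 'Save the Last Dance'],
    [2, 'Love and Other Drugs'],
    [3, 'Dance with Me'],
    [4, 'Love Actually'],
    [5, 'High School Musical'],
]

database = [add_category(record) for record in database]
for row in database:
    print(row)
```

Passing a different tuple as `keywords` changes the categories without touching the function body.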
Answer 1 (score: 0)
A faster approach, assuming you only have a small number of categories, is to create a boolean mask for each one:
In [22]:
dance_mask = df['Movies'].str.contains('Dance')
love_mask = df['Movies'].str.contains('Love')
musical_mask = df['Movies'].str.contains('Musical')
df[dance_mask]
Out[22]:
N Movies
0 1 Save the Last Dance
2 3 Dance with Me
[2 rows x 2 columns]
In [26]:
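# now set the category (the original used .ix, which was removed in pandas 1.0; .loc is the label-based replacement)
df.loc[dance_mask, 'Category'] = 'Dance'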
df
Out[26]:
N Movies Category
0 1 Save the Last Dance Dance
1 2 Love and Other Drugs NaN
2 3 Dance with Me Dance
3 4 Love Actually NaN
4 5 High School Musical NaN
[5 rows x 3 columns]
In [28]:
# repeat for remaining masks
df.loc[love_mask, 'Category'] = 'Love'
df.loc[musical_mask, 'Category'] = 'Musical'
df
Out[28]:
N Movies Category
0 1 Save the Last Dance Dance
1 2 Love and Other Drugs Love
2 3 Dance with Me Dance
3 4 Love Actually Love
4 5 High School Musical Musical
[5 rows x 3 columns]
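The three `.loc` assignments in the session follow the same pattern, so a loop over the keyword list can replace them. This is a minimal sketch, assuming the same `df` with a `Movies` column; as in the session, if a title matched several keywords, whichever category is assigned last would win. Passing `regex=False` makes `str.contains` treat each keyword as a literal substring.

```python
import pandas as pd

df = pd.DataFrame({
    'N': [1, 2, 3, 4, 5],
    'Movies': ['Save the Last Dance', 'Love and Other Drugs', 'Dance with Me',
               'Love Actually', 'High School Musical'],
})

# Start with an object column so string labels can be written into it;
# rows that match no keyword keep None
df['Category'] = None

for category in ['Dance', 'Love', 'Musical']:
    # Boolean mask: True where the title contains the keyword as a literal substring
    mask = df['Movies'].str.contains(category, regex=False)
    # Label the matching rows; a later keyword overwrites an earlier one
    df.loc[mask, 'Category'] = category

print(df)
```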