I have a list of publishers that looks like this:
+--------------+
| Site Name    |
+--------------+
| Radium One   |
| Euronews     |
| EUROSPORT    |
| WIRED        |
| RadiumOne    |
| Eurosport FR |
| Wired US     |
| Eurosport    |
| EuroNews     |
| Wired        |
+--------------+
I would like to produce the following result:
+--------------+----------------+
| Site Name    | Publisher Name |
+--------------+----------------+
| Radium One   | RadiumOne      |
| Euronews     | Euronews       |
| EUROSPORT    | Eurosport      |
| WIRED        | Wired          |
| RadiumOne    | RadiumOne      |
| Eurosport FR | Eurosport      |
| Wired US     | Wired          |
| Eurosport    | Eurosport      |
| EuroNews     | Euronews       |
| Wired        | Wired          |
+--------------+----------------+
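I would like to understand how to replicate the logic I use in Power Query:
if Text.Start([Site Name],4)="WIRE" then "Wired" else
if Text.End([Site Name],3)="One" then "RadiumOne" else
If no match is found, return "Rest".
The matching does not need to be case-sensitive.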
Answer 0 (score: 0)
You can use the apply method with a function, like this:
def handle_text(txt):
    # Compare against a lowercased copy so the match is case-insensitive
    lowered = txt.lower()
    if lowered[:4] == 'wire':
        return 'Wired'
    elif lowered[-3:] == 'one':
        return 'RadiumOne'
    return 'Rest'

df['Publisher Name'] = df['Site Name'].apply(handle_text)
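To try this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption (the question does not show how df is built), and str.startswith / str.endswith are used in place of slicing, which behaves the same for these fixed prefixes and suffixes.

```python
import pandas as pd

# Sample data copied from the question; building the frame this way is an assumption
df = pd.DataFrame({
    "Site Name": [
        "Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
        "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired",
    ]
})

def handle_text(txt):
    # Lowercase once so both checks are case-insensitive
    lowered = txt.lower()
    if lowered.startswith("wire"):
        return "Wired"
    if lowered.endswith("one"):
        return "RadiumOne"
    # Fallback when neither rule matches
    return "Rest"

df["Publisher Name"] = df["Site Name"].apply(handle_text)
print(df)
```

Note that with only these two rules, every Euronews and Eurosport variant falls through to "Rest", so additional rules are needed to reach the exact table shown in the question.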
Answer 1 (score: 0)
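I think you can use a nested numpy.where, building the conditions by indexing with str.
The new column falls back to 'Rest'; new1 falls back to the title-cased first word of the site name instead: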
import numpy as np

# Lowercase once so the comparisons are case-insensitive
s = df['Site Name'].str.lower()
df['new'] = np.where(s.str[:4] == 'wire', 'Wired',
            np.where(s.str[-3:] == 'one', 'RadiumOne', 'Rest'))
df['new1'] = np.where(s.str[:4] == 'wire', 'Wired',
             np.where(s.str[-3:] == 'one', 'RadiumOne',
                      s.str.split().str[0].str.title()))
print (df)
      Site Name        new       new1
0    Radium One  RadiumOne  RadiumOne
1      Euronews       Rest   Euronews
2     EUROSPORT       Rest  Eurosport
3         WIRED      Wired      Wired
4     RadiumOne  RadiumOne  RadiumOne
5  Eurosport FR       Rest  Eurosport
6      Wired US      Wired      Wired
7     Eurosport       Rest  Eurosport
8      EuroNews       Rest   Euronews
9         Wired      Wired      Wired
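If more rules are added later, nested np.where calls become hard to read. As a possible alternative (not what this answer uses), numpy.select takes a list of conditions and a matching list of choices. The sketch below assumes the sample data from the question and writes to a column named 'Publisher Name'; the fallback reuses the title-cased first word from the new1 column above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building the frame this way is an assumption
df = pd.DataFrame({
    "Site Name": [
        "Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
        "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired",
    ]
})

s = df["Site Name"].str.lower()

# Conditions are checked in order; the first one that is True wins
conditions = [s.str.startswith("wire"), s.str.endswith("one")]
choices = ["Wired", "RadiumOne"]

# Fallback: first word of the site name, title-cased (same idea as new1)
fallback = df["Site Name"].str.split().str[0].str.title()

df["Publisher Name"] = np.select(conditions, choices, default=fallback)
print(df)
```

On the sample data this should reproduce the Publisher Name column requested in the question, since the title-cased first word covers the Euronews and Eurosport variants.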