I am new to pandas and have a data frame similar to the following:

import pandas as pd
df = pd.DataFrame({'id': ["1", "2", "3", "4", "5"],
                   'mill': ["Company A Palm Oil Mill – Special Company A of CC Ltd",
                            "Company X POM – Company X Ltd",
                            "DDDD Mill – Company New and Old Ltd",
                            "Company Not Special – R Mill",
                            "Greatest Company – Great World POM"]})
  id  mill
0  1  Company A Palm Oil Mill – Special Company A of...
1  2  Company X POM – Company X Ltd
2  3  DDDD Mill – Company New and Old Ltd
3  4  Company Not Special – R Mill
4  5  Greatest Company – Great World POM
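
From the data frame above, I would like to obtain just the mill name for each row.
Is there an easy way to extract these substrings into the same column? The mill name can appear either before or after the ' – ', but it almost always ends with Palm Oil Mill, POM or Mill.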

Answer 0 (score: 1)
Previous solution: you can use .str.split() and do this:

df.mill = df.mill.str.split(' –').str[0]
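
A quick check on the question's sample data, offered only as a sketch, shows where taking the first piece falls short: it works when the mill name comes before the dash, but not when it comes after. The variable name first_part is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ["1", "2", "3", "4", "5"],
                   'mill': ["Company A Palm Oil Mill – Special Company A of CC Ltd",
                            "Company X POM – Company X Ltd",
                            "DDDD Mill – Company New and Old Ltd",
                            "Company Not Special – R Mill",
                            "Greatest Company – Great World POM"]})

# Keep only the text before the first ' –' in each row.
first_part = df["mill"].str.split(" –").str[0]
print(first_part.tolist())
# Rows 3 and 4 yield 'Company Not Special' and 'Greatest Company',
# which are not the mill names.
```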
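
Update: seeing that you have some constraints, you can build your own function (called func below) and put whatever logic you want in it. It loops over all the pieces obtained by splitting on ' – ' and returns the first piece that ends with Mill or POM.
For other cases, I recommend Wen's solution.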

import pandas as pd

df = pd.DataFrame({'id': ["1", "2", "3", "4", "5"],
                   'mill': ["Company A Palm Oil Mill – Special Company A of CC Ltd",
                            "Company X POM – Company X Ltd",
                            "DDDD Mill – Company New and Old Ltd",
                            "Company Not Special – R Mill",
                            "Greatest Company – Great World POM"]})

def func(x):
    # Split the string on the ' – ' separator
    parts = x.split(' – ')
    # With no separator there is nothing to choose from, so return the value as is
    if len(parts) < 2:
        return x
    # Otherwise return the first piece that ends with 'mill' or 'pom' (case-insensitive)
    for part in parts:
        if part.lower().endswith(('mill', 'pom')):
            return part
    # Nothing found: return the original string
    return x

df.mill = df.mill.apply(func)
print(df)

Returns:

  id  mill
0  1  Company A Palm Oil Mill
1  2  Company X POM
2  3  DDDD Mill
3  4  R Mill
4  5  Great World POM
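
Answer 1 (score: 1)
IIUC, you can use str.contains with the keywords Palm Oil Mill, POM and Mill:

# Split each entry into one column per piece
s = df.mill.str.split(' – ', expand=True)
# Keep only the pieces containing a keyword, blank out the rest, and concatenate per row
df['Name'] = s[s.apply(lambda x: x.str.contains('Palm Oil Mill|POM|Mill'))].fillna('').sum(1)
df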
Out[230]:
  id  mill                                                \
0  1  Company A Palm Oil Mill – Special Company A of...
1  2  Company X POM – Company X Ltd
2  3  DDDD Mill – Company New and Old Ltd
3  4  Company Not Special – R Mill
4  5  Greatest Company – Great World POM

   Name
0  Company A Palm Oil Mill
1  Company X POM
2  DDDD Mill
3  R Mill
4  Great World POM
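
To make the one-liner easier to follow, its steps can be unpacked roughly as below. This is only a sketch on the question's sample data: the names parts, mask and kept are introduced here for illustration, and DataFrame.where plus a per-row join stand in for the answer's boolean indexing, fillna and sum. It assumes every row contains the ' – ' separator.

```python
import pandas as pd

df = pd.DataFrame({'id': ["1", "2", "3", "4", "5"],
                   'mill': ["Company A Palm Oil Mill – Special Company A of CC Ltd",
                            "Company X POM – Company X Ltd",
                            "DDDD Mill – Company New and Old Ltd",
                            "Company Not Special – R Mill",
                            "Greatest Company – Great World POM"]})

# Step 1: one column per piece on either side of ' – '.
parts = df["mill"].str.split(" – ", expand=True)

# Step 2: True wherever a piece contains one of the keywords (case-sensitive).
mask = parts.apply(lambda col: col.str.contains("Palm Oil Mill|POM|Mill"))

# Step 3: blank out the pieces without a keyword.
kept = parts.where(mask, "")

# Step 4: concatenate what is left in each row.
df["Name"] = kept.apply(lambda row: "".join(row), axis=1)
print(df["Name"].tolist())
```

Note that if both pieces of a row contained a keyword, they would be concatenated without a separator.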

Answer 2 (score: 1)
You want to split on the dash (if there is one) and return the substring that ends with 'Mill' or 'POM'.
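
The code for this answer is not preserved here. A minimal sketch of the idea it describes might look like the following; the helper name mill_name, the new column name and the regular expression are assumptions made for illustration, not the answerer's original code.

```python
import re
import pandas as pd

df = pd.DataFrame({'id': ["1", "2", "3", "4", "5"],
                   'mill': ["Company A Palm Oil Mill – Special Company A of CC Ltd",
                            "Company X POM – Company X Ltd",
                            "DDDD Mill – Company New and Old Ltd",
                            "Company Not Special – R Mill",
                            "Greatest Company – Great World POM"]})

def mill_name(text):
    """Hypothetical helper: return the piece of text ending in 'Mill' or 'POM'."""
    # Split on a hyphen or en dash that has whitespace on both sides.
    parts = re.split(r"\s+[-–]\s+", text)
    for part in parts:
        part = part.strip()
        if part.endswith(("Mill", "POM")):
            return part
    # No dash, or no piece with a matching ending: keep the text unchanged.
    return text

df["name"] = df["mill"].apply(mill_name)
print(df["name"].tolist())
```

Strings without a dash, or without a piece ending in one of the two keywords, are returned unchanged, so the function is safe to apply to the whole column.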