Hi everyone, I have a data frame whose contents look like this:
name,mv_str
abc,Exorsist part1
abc,doc str 2D
abc,doc str 3D
abc,doc str QA
abc,doc flash
def,plastic
def,plastic income
def,doc str 2D ###i added this row for better clarity
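My expected output should have unique rows within each group: for each user (the name column), the mv_str values should not be of a similar type, i.e. the first 2 words of one 'mv_str' should not appear again in the 2nd row or any other row for that particular username.
Note: the comparison should be done separately for each username.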
name,mv_str
abc,Exorsist part1
abc,doc str 2D ###3D and QA removed as 1st 2 words "doc str" matched
abc,doc flash ###only 1st word is matching, 2nd word does not
def,plastic
def,plastic income #It should be present as only one word is matching
def,doc str 2D ###this row should be there as this is for another User
Could anyone please help me build the logic? A code sample would be a great help. Thanks.
Answer 0 (score: 2)
I think you first need to split the strings in column mv_str on whitespace and create a new DataFrame, df1:

df1 = df.mv_str.str.split(expand=True)
print (df1)
          0       1     2
0  Exorsist   part1  None
1       doc     str    2D
2       doc     str    3D
3       doc     str    QA
4       doc   flash  None
5   plastic    None  None
6   plastic  income  None
7       doc     str    2D
Then add these columns to the original DataFrame df with concat:

df = pd.concat([df, df1], axis=1)
print (df)
   name          mv_str         0       1     2
0   abc  Exorsist part1  Exorsist   part1  None
1   abc      doc str 2D       doc     str    2D
2   abc      doc str 3D       doc     str    3D
3   abc      doc str QA       doc     str    QA
4   abc       doc flash       doc   flash  None
5   def         plastic   plastic    None  None
6   def  plastic income   plastic  income  None
7   def      doc str 2D       doc     str    2D
Then call drop_duplicates on columns name, 0 and 1; the first row of each duplicate group is kept:

print (df.drop_duplicates(['name',0,1]))
   name          mv_str         0       1     2
0   abc  Exorsist part1  Exorsist   part1  None
1   abc      doc str 2D       doc     str    2D
4   abc       doc flash       doc   flash  None
5   def         plastic   plastic    None  None
6   def  plastic income   plastic  income  None
7   def      doc str 2D       doc     str    2D
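Which row survives is controlled by the keep parameter of drop_duplicates, which defaults to 'first'. A small sketch on three of the abc rows, assuming pandas is imported as pd (the names words, first and last are illustrative and not part of the original answer):

```python
import pandas as pd

# Three rows of one user that share the first two words "doc str".
df = pd.DataFrame({
    "name": ["abc", "abc", "abc"],
    "mv_str": ["doc str 2D", "doc str 3D", "doc str QA"],
})

# Helper columns 0, 1, 2 hold the individual words.
words = df["mv_str"].str.split(expand=True)
df = pd.concat([df, words], axis=1)

# keep="first" is the default: the earliest row of each duplicate group stays.
first = df.drop_duplicates(["name", 0, 1])
# keep="last" retains the latest row instead.
last = df.drop_duplicates(["name", 0, 1], keep="last")

print(first["mv_str"].tolist())
print(last["mv_str"].tolist())
```

With the default, "doc str 2D" is retained, which matches the expected output in the question; keep="last" would retain "doc str QA" instead.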
Remove the helper columns 0, 1 and 2 with drop:

print (df.drop_duplicates(['name',0,1]).drop([0,1,2], axis=1))
   name          mv_str
0   abc  Exorsist part1
1   abc      doc str 2D
4   abc       doc flash
5   def         plastic
6   def  plastic income
7   def      doc str 2D
Alternatively, and arguably more cleanly, select only the name and mv_str columns instead of dropping the helper columns.
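A self-contained sketch of the whole approach, ending with the column-selection variant ([['name', 'mv_str']] in place of drop). It assumes the sample data from the question; the variable names words, wide and result are illustrative and do not come from the original answer:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["abc"] * 5 + ["def"] * 3,
    "mv_str": [
        "Exorsist part1", "doc str 2D", "doc str 3D", "doc str QA",
        "doc flash", "plastic", "plastic income", "doc str 2D",
    ],
})

# Split on whitespace; columns 0, 1, 2 hold the first, second and third word.
words = df["mv_str"].str.split(expand=True)
wide = pd.concat([df, words], axis=1)

# Deduplicate on (name, first word, second word),
# then keep only the two original columns.
result = wide.drop_duplicates(["name", 0, 1])[["name", "mv_str"]]
print(result)
```

Selecting the wanted columns avoids having to list every helper column, so it keeps working if some mv_str value has more than three words and the split produces extra columns.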