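I want to create a new column that contains all the distinct values in a row. Each value in the row is a string (not a list).
This is what the dataframe looks like: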
+----------------------------+--------------------------+-----------------------------------------------+
| first                      | second                   | third                                         |
+----------------------------+--------------------------+-----------------------------------------------+
| ['able', 'shovel', 'door'] | ['shovel raised']        | ['shovel raised', 'raised', 'door', 'shovel'] |
| ['grade control']          | ['grade']                | ['grade']                                     |
| ['light telling', 'love']  | ['would love', 'closed'] | ['closed', 'light']                           |
+----------------------------+--------------------------+-----------------------------------------------+
This is how the dataframe should look after creating the new column with the distinct values:
df = pd.DataFrame({'first': "['able', 'shovel', 'door']" , 'second': "['shovel raised']", 'third': "['shovel raised', 'raised', 'door', 'shovel']", "Distinct_set": "['able', 'shovel', 'door', 'shovel raised', 'raised']" }, index = [0])
How can I do this?
Answer 0 (score: 1)
Try this:
df['new_col'] = df.apply(lambda x: list(set(x['first'] + x['second']+x['third'])), axis =1)
This produces a set of single characters, because the data in your cells are strings such as:
"['able', 'shovel', 'door']"
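To see why the first attempt yields single characters, here is a small sketch using two of the cell values from the question (the variable names are only illustrative):

```python
# Two cell values copied from the question: each one is a str, not a list
first = "['able', 'shovel', 'door']"
second = "['shovel raised']"

combined = first + second   # "+" concatenates the two strings
chars = set(combined)       # iterating a str yields single characters

print(type(combined).__name__)
print(sorted(chars))
```

Every element of `chars` is one character (brackets, quotes, letters), which is why no whole word such as 'shovel' survives.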
The corrected version below parses each string into a list first:
df['new_col'] = df.apply(lambda x: list(set(eval(x['first']) + eval(x['second'])+eval(x['third']))), axis =1)
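Because `eval` executes whatever expression the string contains, a more cautious variant could use `ast.literal_eval` from the standard library, which only accepts Python literals. The sketch below assumes every cell holds the string form of a list. The helper name `distinct_values` is made up for illustration, and a plain dict stands in for a DataFrame row.

```python
import ast

def distinct_values(row, columns=('first', 'second', 'third')):
    """Parse each cell string into a list and return the distinct values."""
    merged = []
    for col in columns:
        # literal_eval only parses literals; it does not run arbitrary code
        merged.extend(ast.literal_eval(row[col]))
    return list(set(merged))  # a set drops duplicates; order is not kept

# A dict standing in for the first row of the question's DataFrame
row = {
    'first': "['able', 'shovel', 'door']",
    'second': "['shovel raised']",
    'third': "['shovel raised', 'raised', 'door', 'shovel']",
}
result = distinct_values(row)
print(sorted(result))  # ['able', 'door', 'raised', 'shovel', 'shovel raised']
```

Since the helper only indexes the row by column name, it should also work when passed to `df.apply(distinct_values, axis=1)` on the question's DataFrame.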
Answer 1 (score: 1)
How about this?
import pandas as pd
import numpy as np
df = pd.DataFrame([[['able', 'shovel', 'door'], ['shovel raised'], ['shovel raised', 'raised', 'door', 'shovel']], [['grade control'], ['grade'], ['grade']], [['light telling', 'love'], ['would love', 'closed'], ['closed', 'light']]], columns=['first', 'second', 'third'])
df.apply(lambda row: [np.unique(np.hstack(row))], raw=True, axis=1)
The last command produces:
0 [[able, door, raised, shovel, shovel raised]]
1 [[grade, grade control]]
2 [[closed, light, light telling, love, would lo...
which can be saved in a new column of the dataframe:
df['Distinct_set'] = df.apply(lambda row: [np.unique(np.hstack(row))], raw=True, axis=1)
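Note that `np.unique` returns its values sorted, while the `Distinct_set` shown in the question keeps values in order of first appearance. If that order matters, a pure-Python sketch like the following might be closer to the desired output. The helper name `distinct_in_order` is hypothetical, and the cells are assumed to be real lists, as in this answer's DataFrame.

```python
from itertools import chain

def distinct_in_order(lists):
    """Flatten several lists and drop duplicates, keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so repeated values
    # are dropped while the first occurrence keeps its position
    return list(dict.fromkeys(chain.from_iterable(lists)))

# First row of the example data, as real lists
row = [['able', 'shovel', 'door'],
       ['shovel raised'],
       ['shovel raised', 'raised', 'door', 'shovel']]
print(distinct_in_order(row))
# ['able', 'shovel', 'door', 'shovel raised', 'raised']
```

With the DataFrame above, this could be applied row by row, for example with `df['Distinct_set'] = [distinct_in_order(r) for r in df.itertuples(index=False)]`.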
Answer 2 (score: 0)
You can try the snippet below:
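import json

def get_list_from_str(s):
    # Swap single quotes for double quotes so the string parses as JSON
    return json.loads(s.replace("'", '"'))

def flatten_list_rows(row):
    # Parse the three cells, concatenate the lists, and keep distinct values
    return set(
        get_list_from_str(row['first']) +
        get_list_from_str(row['second']) +
        get_list_from_str(row['third'])
    )

df['Distinct_set'] = df.apply(flatten_list_rows, axis=1)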
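One caveat with the quote replacement: it works for the values in the question, but it would likely break if a value itself contained an apostrophe. The sketch below illustrates this with a made-up cell value (`tricky` is not from the question) and compares it with `ast.literal_eval`, which parses the Python list syntax directly.

```python
import ast
import json

def get_list_from_str(s):
    # Same helper as above: swap quote characters, then parse as JSON
    return json.loads(s.replace("'", '"'))

plain = "['shovel raised', 'door']"          # shaped like the question's data
tricky = repr(["driver's seat", "door"])     # hypothetical value with an apostrophe

print(get_list_from_str(plain))

try:
    get_list_from_str(tricky)
    json_ok = True
except json.JSONDecodeError:
    # The apostrophe was turned into a double quote, producing invalid JSON
    json_ok = False

parsed = ast.literal_eval(tricky)  # handles the mixed quoting
print(json_ok, parsed)
```

Also note that `flatten_list_rows` returns a `set`. Wrapping the result in `list(...)` or `sorted(...)` would store a list in the column instead.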