I want to fill the missing values in col2 based on the corresponding value in col1.
import pandas as pd
data={"col1":["A","B","C","A","B","C","A","B","A"], "col2":["{hey1}"," ","{hello2}","{hey2}","{he1}","{hello3}","set()","set()","{hey1}"]}
df=pd.DataFrame(data=data)
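The missing values should be filled according to the following rule. For example, if A occurs four times and only three of those rows have a col2 value, the fourth one is missing, so it should be filled with the union of the other three. In this case the three values are hey1, hey2 and hey1, so the fourth (missing) cell should contain both hey1 and hey2.
set() is a junk value that I don't need, so I want to remove it before comparing the columns.
Desired output: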
col1 col2
A hey1
B he1
C hello2
A hey2
B he1
C hello3
A hey1,hey2
B he1
A hey1
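A minimal pandas sketch of the rule described above. It assumes that each col2 cell holds at most one item wrapped in braces, and that both blank strings and the literal "set()" count as missing. It strips the braces, masks the junk values, then fills each missing cell with the distinct values of its col1 group, joined by commas in order of first appearance. The names `cleaned` and `union` are illustrative, not from the question.

```python
import pandas as pd

data = {"col1": ["A", "B", "C", "A", "B", "C", "A", "B", "A"],
        "col2": ["{hey1}", " ", "{hello2}", "{hey2}", "{he1}", "{hello3}", "set()", "set()", "{hey1}"]}
df = pd.DataFrame(data=data)

# Assumption: one item per cell. Trim whitespace, then strip the surrounding braces.
cleaned = df["col2"].str.strip().str.strip("{}")

# Treat blank strings and the junk "set()" placeholder as missing (NaN).
cleaned = cleaned.mask(cleaned.isin(["", "set()"]))

# For every row, compute the distinct non-missing values of its col1 group,
# in order of first appearance, joined by ",".
union = cleaned.groupby(df["col1"]).transform(lambda s: ",".join(s.dropna().unique()))

# Only the missing cells are replaced; existing values stay as they are.
df["col2"] = cleaned.fillna(union)
print(df)
```

Because `transform` returns a Series aligned with the original index, `fillna` touches only the missing rows, so a non-missing cell such as the last `hey1` keeps its own value rather than the group union.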
Answer (score: 1):
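import collections

# This answer uses simplified sample data that differs from the question's
# (no braces and no "set()" entries).
data = {"col1": ["A", "B", "C", "A", "B", "C", "A", "B", "A"],
        "col2": ["", " ", "hello2", "hey2", "he1", "hello3", " ", "", ""]}
col1 = data["col1"]
col2 = data["col2"]

d = collections.defaultdict(list)  # key -> non-blank values seen so far
new_col2 = []
for i, (key, value) in enumerate(zip(col1, col2)):
    if not value.strip():
        # Blank cell: join the values already seen for this key.
        new_val = ", ".join(d[key])
        if not new_val:
            # Nothing seen yet for this key: fall back to the previous row's result.
            if len(new_col2) >= 1:
                new_val = new_col2[i - 1]
            else:
                new_val = ""
        new_col2.append(new_val)
    else:
        # Non-blank cell: remember it for this key and keep it unchanged.
        d[key].append(value)
        new_col2.append(value)

print(new_col2)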
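The loop above only joins values that appeared earlier for the same key and does not remove duplicates. When nothing has been seen yet for a key, it copies the previous row's result, which may belong to a different key; with this data the first A and B rows stay blank. Below is a minimal two-pass sketch of the same dictionary idea, assuming the fill should cover the whole group as the question describes. The first pass collects each key's distinct non-blank values, and the second pass fills the blanks. The name `seen` is illustrative.

```python
import collections

data = {"col1": ["A", "B", "C", "A", "B", "C", "A", "B", "A"],
        "col2": ["", " ", "hello2", "hey2", "he1", "hello3", " ", "", ""]}

# First pass: collect the distinct non-blank values of each key.
# A dict is used as an ordered set, so first-seen order is kept.
seen = collections.defaultdict(dict)
for key, value in zip(data["col1"], data["col2"]):
    if value.strip():
        seen[key][value] = None

# Second pass: keep non-blank cells, fill blank ones with the key's joined values
# (a key with no values at all would produce an empty string).
new_col2 = [value if value.strip() else ", ".join(seen[key])
            for key, value in zip(data["col1"], data["col2"])]
print(new_col2)
```

Collecting first and filling second means a blank at the start of a group gets the same values as a blank at the end.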