Here is a sample DataFrame:
import numpy as np
import pandas as pd

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan],
        ["rs_2", "ProductID", "Manufacturer", "Brand Manager", "Finish"],
        ["rs_3", "Brand Manager", "Finish", np.nan, np.nan]]
df = pd.DataFrame(data, columns=cols)
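What I want is a pivot-style table with a boolean value in each column indicating whether that column's header appears anywhere in the data row (excluding the report_suite column). So the final output I want is: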
cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", 1, 1, 0, 1], ["rs_2", 1, 1, 1, 1], ["rs_3", 0, 0, 1, 1]]
final_df = pd.DataFrame(data, columns = cols)
Answer 0 (score: 1)
In [185]: df.set_index('report_suite') \
     ...:   .apply(lambda x: x.eq(x.name)) \
     ...:   .astype(np.int8) \
     ...:   .reset_index()
Out[185]:
   report_suite  ProductID  Manufacturer  Brand Manager  Finish
0          rs_1          1             1              0       0
1          rs_2          1             1              1       1
2          rs_3          0             0              0       0
Or:
In [191]: df.set_index('report_suite') \
     ...:   .fillna('') \
     ...:   .apply(lambda x: x.str.contains(x.name)) \
     ...:   .astype(np.int8) \
     ...:   .reset_index()
Out[191]:
   report_suite  ProductID  Manufacturer  Brand Manager  Finish
0          rs_1          1             1              0       0
1          rs_2          1             1              1       1
2          rs_3          0             0              0       0
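Both snippets compare each cell with the header of the column it happens to sit in, so a value such as "Finish" stored under "Brand Manager" (row rs_1) is counted as 0. If the goal is the question's final_df, where a header counts as present when it appears anywhere in the row, a row-wise check may be closer to what is wanted. The following is only a sketch under that reading; the names values, flags and row_wise are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan],
        ["rs_2", "ProductID", "Manufacturer", "Brand Manager", "Finish"],
        ["rs_3", "Brand Manager", "Finish", np.nan, np.nan]]
df = pd.DataFrame(data, columns=cols)

# Keep report_suite as the index so only the value columns are compared.
values = df.set_index("report_suite")

# For each header, flag the rows where that header occurs in ANY column.
# NaN never equals a string, so the padding cells are ignored automatically.
flags = pd.DataFrame(
    {col: values.eq(col).any(axis=1) for col in values.columns}
).astype(np.int8)

# Restore report_suite as an ordinary column (rename_axis makes the name explicit).
row_wise = flags.rename_axis("report_suite").reset_index()
print(row_wise)
```

On the sample data this should give the same 0/1 values as final_df in the question, although with dtype int8 rather than int64.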
Answer 1 (score: 0)
I used a dictionary approach; if you can figure out how to change the DataFrame's index, you're all set.
import pandas as pd
import numpy as np

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan],
        ["rs_2", "ProductID", "Manufacturer", "Brand Manager", "Finish"],
        ["rs_3", "Brand Manager", "Finish", np.nan, np.nan]]
df = pd.DataFrame(data, columns=cols)

# Drop the leading report_suite label from each row (this mutates `data` in place).
preprocessed_data = []
for item in data:
    item.pop(0)
    preprocessed_data.append(item)

# Every distinct cell value across the three rows, including the NaN padding.
wordSet = set(preprocessed_data[0]).union(set(preprocessed_data[1])).union(set(preprocessed_data[2]))

# One counter dict per row, all starting at zero.
wordict1 = dict.fromkeys(wordSet, 0)
wordict2 = dict.fromkeys(wordSet, 0)
wordict3 = dict.fromkeys(wordSet, 0)

# Count how often each value occurs in its row.
for word in preprocessed_data[0]:
    wordict1[word] += 1
for word in preprocessed_data[1]:
    wordict2[word] += 1
for word in preprocessed_data[2]:
    wordict3[word] += 1

# One row per report suite; column order follows the set and includes a NaN column.
dframe = pd.DataFrame([wordict1, wordict2, wordict3])
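The answer leaves the index step open. One possible way to finish it is sketched below, keeping the dictionary idea but looping over the rows instead of hard-coding three dicts. It assumes that the first element of each row is the report_suite label and that the NaN padding should not become a column; the names headers, records and labels are illustrative only.

```python
import numpy as np
import pandas as pd

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan],
        ["rs_2", "ProductID", "Manufacturer", "Brand Manager", "Finish"],
        ["rs_3", "Brand Manager", "Finish", np.nan, np.nan]]

headers = cols[1:]                  # every column except report_suite
labels = [row[0] for row in data]   # rs_1, rs_2, rs_3

records = []
for row in data:
    counts = dict.fromkeys(headers, 0)
    for word in row[1:]:
        # Only count known headers; NaN padding is not a key and is skipped.
        if word in counts:
            counts[word] += 1
    records.append(counts)

# Use the report_suite labels as the index, then turn them back into a column.
dframe = pd.DataFrame(records, index=labels)
dframe = dframe.rename_axis("report_suite").reset_index()
print(dframe)
```

Because the dicts are created from headers, the columns come out in the original header order, and the report_suite labels are attached by position.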