Do column headers exist in the dataframe rows?

Time: 2017-07-25 21:51:08

Tags: python pandas

Here is a sample dataframe:

import numpy as np
import pandas as pd

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan], ["rs_2", 
"ProductID", "Manufacturer", "Brand Manager", "Finish"], ["rs_3", 
"Brand Manager", "Finish", np.nan, np.nan]]
df = pd.DataFrame(data, columns = cols)

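What I want is a pivot-style table with a boolean value in each column that indicates whether the column header appears anywhere in that row's data (excluding the report_suite column). So the final output I want is: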

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", 1, 1, 0, 1], ["rs_2", 1, 1, 1, 1], ["rs_3",  0, 0, 1, 1]]
final_df = pd.DataFrame(data, columns = cols)

2 answers:

Answer 0: (score: 1)

In [185]: df.set_index('report_suite') \
            .apply(lambda x: x.eq(x.name)) \
            .astype(np.int8) \
            .reset_index()
Out[185]:
  report_suite  ProductID  Manufacturer  Brand Manager  Finish
0         rs_1          1             1              0       0
1         rs_2          1             1              1       1
2         rs_3          0             0              0       0

In [191]: df.set_index('report_suite') \
            .fillna('') \
            .apply(lambda x: x.str.contains(x.name)) \
            .astype(np.int8) \
            .reset_index()
Out[191]:
  report_suite  ProductID  Manufacturer  Brand Manager  Finish
0         rs_1          1             1              0       0
1         rs_2          1             1              1       1
2         rs_3          0             0              0       0
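
Both snippets above compare each cell only with the header of its own column, which is why rs_1 shows 0 under Finish and rs_3 shows 0 everywhere. If a header may appear in any position of the row, as in the question's desired output, a row-wise membership test is one option. The following is a minimal sketch and not part of the original answer; the names values and flags are illustrative.

```python
import numpy as np
import pandas as pd

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [
    ["rs_1", "ProductID", "Manufacturer", "Finish", np.nan],
    ["rs_2", "ProductID", "Manufacturer", "Brand Manager", "Finish"],
    ["rs_3", "Brand Manager", "Finish", np.nan, np.nan],
]
df = pd.DataFrame(data, columns=cols)

# Keep report_suite out of the comparison by moving it to the index.
values = df.set_index("report_suite")

# For each header, check whether it occurs in any cell of the row.
# NaN cells never compare equal to a header, so they count as "absent".
flags = pd.DataFrame(
    {header: values.eq(header).any(axis=1) for header in values.columns}
).astype(np.int8)

final_df = flags.reset_index()
print(final_df)
```

Because the comparison runs once per header over the whole frame, the position of a value within its row no longer matters.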

Answer 1: (score: 0)

I used a dictionary approach. If you can work out how to change the dataframe's index, you are all set.

import pandas as pd
import numpy as np

cols = ["report_suite", "ProductID", "Manufacturer", "Brand Manager", "Finish"]
data = [["rs_1", "ProductID", "Manufacturer", "Finish", np.nan], ["rs_2", 
    "ProductID", "Manufacturer", "Brand Manager", "Finish"], ["rs_3", 
     "Brand Manager", "Finish", np.nan, np.nan]]
df = pd.DataFrame(data, columns = cols)


# Drop the leading report_suite label from each row without mutating `data`,
# and skip NaN cells so they do not end up as dictionary keys.
preprocessed_data = [[word for word in item[1:] if pd.notna(word)] for item in data]

# All distinct words that occur anywhere in the data.
wordSet = set().union(*preprocessed_data)

# One dictionary per row: word -> number of occurrences in that row.
# Sorting the keys keeps the column order stable between runs.
wordicts = []
for words in preprocessed_data:
    wordict = dict.fromkeys(sorted(wordSet), 0)
    for word in words:
        wordict[word] += 1
    wordicts.append(wordict)

dframe = pd.DataFrame(wordicts)
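
For the remaining step of changing the index, one possible approach is to pass the report_suite labels as the index when building the frame. This is a sketch under assumptions: row_dicts is a hypothetical stand-in for the per-row dictionaries built above, and labels and headers are illustrative names.

```python
import pandas as pd

# Hypothetical stand-ins for the per-row dictionaries from the answer.
row_dicts = [
    {"ProductID": 1, "Manufacturer": 1, "Brand Manager": 0, "Finish": 1},
    {"ProductID": 1, "Manufacturer": 1, "Brand Manager": 1, "Finish": 1},
    {"ProductID": 0, "Manufacturer": 0, "Brand Manager": 1, "Finish": 1},
]
labels = ["rs_1", "rs_2", "rs_3"]
headers = ["ProductID", "Manufacturer", "Brand Manager", "Finish"]

# Use the report_suite labels as a named index instead of 0, 1, 2.
dframe = pd.DataFrame(row_dicts, index=pd.Index(labels, name="report_suite"))

# Keep only the header columns, in a fixed order.
dframe = dframe.reindex(columns=headers)

# Turn the index back into an ordinary column to match the desired layout.
result = dframe.reset_index()
print(result)
```

The reindex call also discards any column that is not one of the listed headers, such as a stray NaN key.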