Converting two boolean columns to a class ID in Pandas

Time: 2017-03-04 19:16:18

Tags: python pandas numpy scikit-learn boolean

I have two boolean columns:

import pandas as pd

df = pd.DataFrame([[True,  True],
                   [True,  False],
                   [False, True],
                   [True,  True],
                   [False, False]],
                  columns=['col1', 'col2'])

I need to generate a new column that identifies which unique combination each row belongs to:

result = pd.Series([0, 1, 2, 0, 3])

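It seems like there should be a very simple way to do this, but it escapes me. Maybe something from sklearn.preprocessing? A simple Pandas or Numpy solution would be equally welcome.

Edit: it would be really nice if the solution could scale to more than 2 columns.

2 answers: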

Answer 0: (score: 1)

The simplest approach is to turn each row into a tuple and pass the result to factorize:

print (pd.Series(pd.factorize(df.apply(tuple, axis=1))[0]))
0    0
1    1
2    2
3    0
4    3
dtype: int64

Another solution casts the values to string and concatenates them across each row with sum:

print (pd.Series(pd.factorize(df.astype(str).sum(axis=1))[0]))
0    0
1    1
2    2
3    0
4    3
dtype: int64
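
To address the edit in the question about more than 2 columns, here is a minimal sketch that wraps the tuple-based factorize call from above in a helper. The function name combination_ids, its columns parameter, and the three-column frame df3 are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def combination_ids(frame, columns=None):
    """Return one integer ID per row, numbered in order of first appearance."""
    if columns is None:
        columns = list(frame.columns)
    # Turn each row into a hashable tuple, then let factorize assign the codes.
    row_tuples = frame[columns].apply(tuple, axis=1)
    codes, _ = pd.factorize(row_tuples)
    return pd.Series(codes, index=frame.index)

# Hypothetical three-column example data.
df3 = pd.DataFrame([[True,  True,  False],
                    [True,  False, False],
                    [True,  True,  True],
                    [True,  True,  False],
                    [True,  False, False]],
                   columns=['col1', 'col2', 'col3'])

ids3 = combination_ids(df3)
print(ids3.tolist())  # [0, 1, 2, 0, 1]
```

Passing a subset such as columns=['col1', 'col2'] should restrict the grouping to those columns only.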

Answer 1: (score: 0)

I have never used pandas before, but here is a plain Python solution that I am sure would not be hard to adapt to pandas.

This works for any number of columns. Just pass the rows in as a list of lists, as follows:

a = [[True,  True],
     [True,  False],
     [False, True],
     [True,  True],
     [False, False]]

ids, result = [], [] # ids, keeps a list of previously seen items. result, keeps the result

for x in a:
    if x in ids: # x has been seen before
        id = ids.index(x) # find old id
        result.append(id)
    else: # x hasn't been seen before
        id = len(ids) # create new id
        result.append(id)
        ids.append(x)

print(result) # [0, 1, 2, 0, 3]
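
As a rough sketch of how the loop above might be adapted to pandas, the version below reads rows from a DataFrame and keeps the seen combinations in a dict instead of a list, so each lookup does not have to scan all earlier rows. The function name first_seen_ids is hypothetical, and the approach is an adaptation, not something the answer itself provides.

```python
import pandas as pd

def first_seen_ids(frame):
    """Assign each row the ID of its combination, numbered by first appearance."""
    seen = {}    # maps each row tuple to the ID it was given
    result = []
    for row in frame.itertuples(index=False, name=None):
        # setdefault stores len(seen) as a new ID only when the row is unseen,
        # and otherwise returns the ID stored earlier.
        result.append(seen.setdefault(row, len(seen)))
    return pd.Series(result, index=frame.index)

df = pd.DataFrame([[True,  True],
                   [True,  False],
                   [False, True],
                   [True,  True],
                   [False, False]],
                  columns=['col1', 'col2'])

ids = first_seen_ids(df)
print(ids.tolist())  # [0, 1, 2, 0, 3]
```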