I have two boolean columns:
import pandas as pd

df = pd.DataFrame([[True, True],
                   [True, False],
                   [False, True],
                   [True, True],
                   [False, False]],
                  columns=['col1', 'col2'])
I need to generate a new column that identifies which unique combination each row belongs to:
result = pd.Series([0, 1, 2, 0, 3])
It seems like there should be a very simple way to do this, but it escapes me. Maybe something in sklearn.preprocessing? Plain Pandas or Numpy solutions are equally welcome.
Edit: It would be great if the solution scaled to more than 2 columns.
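For reference, one built-in pandas approach that neither answer uses is `groupby(...).ngroup()`: grouping on every column and passing `sort=False` should number the combinations in order of first appearance, and it works for any number of columns. A minimal sketch, assuming a reasonably recent pandas:

```python
import pandas as pd

df = pd.DataFrame([[True, True], [True, False], [False, True],
                   [True, True], [False, False]],
                  columns=['col1', 'col2'])

# Group on all columns; sort=False keeps the group numbering in the
# order each combination first appears, instead of sorted key order.
result = df.groupby(list(df.columns), sort=False).ngroup()
print(result.tolist())  # [0, 1, 2, 0, 3]
```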
Answer 0 (score: 1)
The simplest approach is to use factorize on tuples built from each row:
print (pd.Series(pd.factorize(df.apply(tuple, axis=1))[0]))
0 0
1 1
2 2
3 0
4 3
dtype: int64
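Because each row is turned into a single hashable tuple before factorizing, the same call should extend to any number of columns, which addresses the edit in the question. A minimal sketch with a hypothetical third column `col3` (made-up data for illustration):

```python
import pandas as pd

# Hypothetical three-column frame; the technique is the same for any width.
df = pd.DataFrame([[True, True, False],
                   [True, False, False],
                   [True, True, False],
                   [False, False, True]],
                  columns=['col1', 'col2', 'col3'])

# Each row becomes a tuple; factorize assigns integer codes to the distinct
# tuples in order of first appearance. [0] selects the codes array.
df['group'] = pd.factorize(df.apply(tuple, axis=1))[0]
print(df['group'].tolist())  # [0, 1, 0, 2]
```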
Another solution is to cast to string and sum (concatenate) across each row:
print (pd.Series(pd.factorize(df.astype(str).sum(axis=1))[0]))
0 0
1 1
2 2
3 0
4 3
dtype: int64
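One caveat with the string approach: concatenating values without a separator is only safe when no two different rows can produce the same string. For boolean columns this holds, since "True" and "False" begin with different letters, but with other values (for example integers) distinct rows can collide. A plain-Python sketch of the effect, with a hypothetical helper `first_seen_codes` standing in for factorize:

```python
rows = [[1, 11], [11, 1], [1, 11]]

# Concatenating str() values with no separator: every row becomes "111".
no_sep = ["".join(str(v) for v in row) for row in rows]

# Joining with a separator that cannot appear in the values keeps rows distinct.
with_sep = ["|".join(str(v) for v in row) for row in rows]

def first_seen_codes(keys):
    """Hypothetical helper: number distinct keys in order of first appearance."""
    codes, seen = [], {}
    for k in keys:
        codes.append(seen.setdefault(k, len(seen)))
    return codes

print(first_seen_codes(no_sep))    # [0, 0, 0]
print(first_seen_codes(with_sep))  # [0, 1, 0]
```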
Answer 1 (score: 0)
I've never used pandas before, but here is a plain Python solution that I'm sure wouldn't be hard to adapt to pandas:
This works for any number of columns; just pass in the rows like this:
a = [[True, True],
     [True, False],
     [False, True],
     [True, True],
     [False, False]]
ids, result = [], []  # ids: rows seen so far; result: the group id of each row
for x in a:
    if x in ids:  # x has been seen before
        group_id = ids.index(x)  # reuse its existing id
        result.append(group_id)
    else:  # x hasn't been seen before
        group_id = len(ids)  # assign the next new id
        result.append(group_id)
        ids.append(x)
print(result)  # [0, 1, 2, 0, 3]
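A note on scaling: `x in ids` and `ids.index(x)` both scan the list of seen rows, so the loop above can become quadratic when there are many distinct combinations. A minimal variant, assuming rows are converted to tuples (lists are unhashable and cannot be dict keys), uses a dict for average constant-time lookups; `group_ids` is a hypothetical name:

```python
a = [[True, True],
     [True, False],
     [False, True],
     [True, True],
     [False, False]]

def group_ids(rows):
    """Label each row with the id of its first-seen distinct combination."""
    seen = {}    # maps row tuple -> group id
    result = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, so they can be dict keys
        if key not in seen:
            seen[key] = len(seen)  # next unused id
        result.append(seen[key])
    return result

print(group_ids(a))  # [0, 1, 2, 0, 3]
```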