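I want to find, for each column of a DataFrame, the values that are unique across the whole DataFrame (that is, values that occur only once anywhere in it).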
   Col1 Col2 Col3
1     A    A    B
2     C    A    B
3     B    B    F
Col1 has C as a unique value, Col2 has none, and Col3 has F.
Any genius ideas? Thanks!
Answer 0: (score: 3)
You can use stack to get a Series, then drop_duplicates with keep=False, then reset_index, and finally reindex:
df = (df.stack()
        .drop_duplicates(keep=False)
        .reset_index(level=0, drop=True)
        .reindex(index=df.columns))
print (df)
Col1 C
Col2 NaN
Col3 F
dtype: object
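To see what the intermediate steps do, here is a minimal sketch that rebuilds the question's sample data (the row labels 1 to 3 and the variable names are my assumptions) and compares keep=False with the default keep='first':

```python
import pandas as pd

# Sample data from the question; the index labels 1..3 are assumed
df = pd.DataFrame(
    {"Col1": ["A", "C", "B"], "Col2": ["A", "A", "B"], "Col3": ["B", "B", "F"]},
    index=[1, 2, 3],
)

# stack() turns the frame into a Series indexed by (row label, column name)
stacked = df.stack()

# keep=False drops every occurrence of a repeated value
once = stacked.drop_duplicates(keep=False)

# the default keep='first' retains one copy of each repeated value instead
first = stacked.drop_duplicates()

print(once.tolist())
print(first.tolist())
```

Only the keep=False result contains values that occur exactly once in the whole frame, which is what the question asks for.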
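Here keep=False makes drop_duplicates remove every value that occurs more than once, and reset_index(level=0, drop=True) removes the first index level (the row labels). With sample data in which Col3 holds two such values: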
print (df)
   Col1 Col2 Col3
1     A    A    B
2     C    A    X
3     B    B    F
s = df.stack().drop_duplicates(keep=False).reset_index(level=0, drop=True)
print (s)
Col1 C
Col3 X
Col3 F
dtype: object
s = s.groupby(level=0).unique().reindex(index=df.columns)
print (s)
Col1 [C]
Col2 NaN
Col3 [X, F]
dtype: object
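If you need this in more than one place, the two steps could be wrapped in a small function. This is only a sketch: the name unique_per_column is made up for illustration, and the sample frame is rebuilt by hand with assumed row labels 1 to 3.

```python
import pandas as pd


def unique_per_column(df):
    """Hypothetical helper: per column, the values occurring exactly once in the whole frame."""
    s = (df.stack()
           .drop_duplicates(keep=False)         # keep only values seen exactly once
           .reset_index(level=0, drop=True))    # drop row labels, keep column names
    # Collect the surviving values per column; columns with none become NaN
    return s.groupby(level=0).unique().reindex(index=df.columns)


df = pd.DataFrame(
    {"Col1": ["A", "C", "B"], "Col2": ["A", "A", "B"], "Col3": ["B", "X", "F"]},
    index=[1, 2, 3],
)
result = unique_per_column(df)
print(result)
```

Each non-empty entry of the result is an array, so a column with a single unique value still comes back as a one-element array rather than a scalar.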
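The first solution works well if each column has only one unique value. The second version above, which groups the remaining values by column name, is my attempt at a more general solution.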
Answer 1: (score: 0)
I don't believe this is exactly what you want, but as useful information: you can find the unique values of a DataFrame using numpy's unique(), like so:
>>> np.unique(df[['Col1', 'Col2', 'Col3']])
['A' 'B' 'C' 'F']
You can also get the unique values of a specific column, e.g. Col3:
>>> df.Col3.unique()
['B' 'F']
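Note that np.unique returns the distinct values, not the values that occur only once. One possible way to bridge that gap, as a rough sketch, is to ask np.unique for counts as well and filter on them. The variable names and the dict output here are my own choices, and the sample frame is rebuilt by hand:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the index labels 1..3 are assumed
df = pd.DataFrame(
    {"Col1": ["A", "C", "B"], "Col2": ["A", "A", "B"], "Col3": ["B", "B", "F"]},
    index=[1, 2, 3],
)

# Distinct values over the whole frame, together with how often each occurs
values, counts = np.unique(df.to_numpy(), return_counts=True)

# Values that appear exactly once anywhere in the frame
once = set(values[counts == 1])

# For each column, keep only the values that occur once overall
per_column = {col: [v for v in df[col] if v in once] for col in df.columns}
print(per_column)
```

This returns a plain dict with an empty list for columns that have no such value, rather than the NaN entries of the pandas-only approach.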