我的硕士论文是通过分析元数据和存储的数据来发现糟糕的数据库设计。我们通过从给定的DBMS中提取元数据模型,然后对此元数据运行一组规则来实现此目的。
要通过数据分析扩展此过程,我们需要允许规则直接查询数据库,但我们必须保持DBMS独立性,以便查询可以应用于PostgreSQL,MSSQL和MySQL。
我们已经讨论了一种查询的功能构造,例如:
new Query(new Select(columnID), new From(tableID), new Where(new Equality(columnID1, columnID2)))
然后使用DBMS特定的序列化程序。
另一种方法是让规则自己处理:
public Query QueryDatabase(DBMS dbms)
{
if (dbms == PostgreSQL) { return "select count(1) from Users"}
if (dbms == MSSQL) {return ....}
}
我们错过了什么吗?事实上所有这些都存在于一个漂亮的图书馆吗?是的,我们已经查看了实体框架,但它们似乎依赖于数据库的静态类型模型,这显然无法创建。
我应该提到我们维护一个可扩展的规则架构,允许最终用户实现自己的规则。
为了阐明我们想要实现的目标,请查看以下查询(mssql),它需要两个参数,即表的名称(@table)和列的名称(@column):
DECLARE @TotalCount FLOAT;
SELECT @TotalCount = COUNT(1) FROM [@table];
SELECT SUM(pcount * LOG10(@TotalCount / pcount)) / (LOG10(2) * @TotalCount)
FROM (SELECT (Count([@column])) as pcount
FROM [@table]
GROUP BY [@column]) as exp1
查询通过估计熵来测量存储在给定属性中的信息量。它需要访问表中的所有行。为了避免从数据库中提取所有行并通过慢速网络连接传输它们,最好在SQL中表达它们,只传输一个数字。
注意:我们 DO 拥有我们需要的所有元数据。这个问题仅用于访问数据!
我不太确定是否要将此添加到我已经很久的问题,编辑现有答案或者是什么待办事项。请随时提出建议。 ;)
建立在mrnye回答:
new Query()
.Variable(varname => FLOAT)
.Set(varname => new Query().Count(1).From(table) )
.Select(new Aggregate().Sum(varname => "pcount * LOG10(varname / pcount)"))
.From(
new Query()
.Select(pcount => new Aggregate().Count(column)
.From(table)
.GroupBy(column)
)
除了语法错误和lambda语句的误用之外,我还想到了使用一些扩展方法来构建查询。它似乎是一种相当复杂的方法。您如何看待这种方法?
以LINQ答案为基础:
let totalCount = Table.Count
from uv un from r in Table
group r by r["attr"]
select r.Count
select r.Count * Log2((totalCount / r.Count))
看起来相当不错,但实施起来很不错......
答案 0 :(得分:2)
您可以通过实施自定义LINQ provider基础架构来实现相同目标。查询是通用的,但生成SQL查询的AST tree visitors可以是可插入的。您甚至可以使用内存数据存储模拟数据库并将自定义LINQ查询转换为LINQ to对象查询!
您需要创建一个知道如何从对象的索引器中提取列名的提供程序。这是一个可以扩展的基本框架:
// Runs in LinqPad!
public class TableQueryObject
{
private readonly Dictionary<string, object> _data = new Dictionary<string, object>();
public string TableName { get; set; }
public object this[string column]
{
get { return _data.ContainsKey(column) ? _data[column] : null; }
set { if (_data.ContainsKey(column)) _data[column] = value; else _data.Add(column, value); }
}
}
public interface ITableQuery : IEnumerable<TableQueryObject>
{
string TableName { get; }
string ConnectionString { get; }
Expression Expression { get; }
ITableQueryProvider Provider { get; }
}
public interface ITableQueryProvider
{
ITableQuery Query { get; }
IEnumerable<TableQueryObject> Execute(Expression expression);
}
public interface ITableQueryFactory
{
ITableQuery Query(string tableName);
}
public static class ExtensionMethods
{
class TableQueryContext : ITableQuery
{
private readonly ITableQueryProvider _queryProvider;
private readonly Expression _expression;
public TableQueryContext(ITableQueryProvider queryProvider, Expression expression)
{
_queryProvider = queryProvider;
_expression = expression;
}
public string TableName { get { return _queryProvider.Query.TableName; } }
public string ConnectionString { get { return _queryProvider.Query.ConnectionString; } }
public Expression Expression { get { return _expression; } }
public ITableQueryProvider Provider { get { return _queryProvider; } }
public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
public static MethodInfo MakeGeneric(MethodBase method, params Type[] parameters)
{
return ((MethodInfo)method).MakeGenericMethod(parameters);
}
public static Expression StaticCall(MethodInfo method, params Expression[] expressions)
{
return Expression.Call(null, method, expressions);
}
public static ITableQuery CreateQuery(this ITableQueryProvider source, Expression expression)
{
return new TableQueryContext(source, expression);
}
public static IEnumerable<TableQueryObject> Select<TSource>(this ITableQuery source, Expression<Func<TSource, TableQueryObject>> selector)
{
return source.Provider.CreateQuery(StaticCall(MakeGeneric(MethodBase.GetCurrentMethod(), typeof(TSource)), source.Expression, Expression.Quote(selector)));
}
public static ITableQuery Where(this ITableQuery source, Expression<Func<TableQueryObject, bool>> predicate)
{
return source.Provider.CreateQuery(StaticCall((MethodInfo)MethodBase.GetCurrentMethod(), source.Expression, Expression.Quote(predicate)));
}
}
class SqlTableQueryFactory : ITableQueryFactory
{
class SqlTableQuery : ITableQuery
{
private readonly string _tableName;
private readonly string _connectionString;
private readonly ITableQueryProvider _provider;
private readonly Expression _expression;
public SqlTableQuery(string tableName, string connectionString)
{
_connectionString = connectionString;
_tableName = tableName;
_provider = new SqlTableQueryProvider(this);
_expression = Expression.Constant(this);
}
public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public string TableName { get { return _tableName; } }
public string ConnectionString { get { return _connectionString; } }
public Expression Expression { get { return _expression; } }
public ITableQueryProvider Provider { get { return _provider; } }
}
class SqlTableQueryProvider : ITableQueryProvider
{
private readonly ITableQuery _query;
public ITableQuery Query { get { return _query; } }
public SqlTableQueryProvider(ITableQuery query) { _query = query; }
public IEnumerable<TableQueryObject> Execute(Expression expression)
{
//var connecitonString = _query.ConnectionString;
//var tableName = _query.TableName;
// TODO visit expression AST (generate any sql dialect you want) and execute resulting sql
// NOTE of course the query can be easily parameterized!
// NOTE here the fun begins, just return some dummy data for now :)
for (int i = 0; i < 100; i++)
{
var obj = new TableQueryObject();
obj["a"] = i;
obj["b"] = "blah " + i;
yield return obj;
}
}
}
private readonly string _connectionString;
public SqlTableQueryFactory(string connectionString) { _connectionString = connectionString; }
public ITableQuery Query(string tableName)
{
return new SqlTableQuery(tableName, _connectionString);
}
}
static void Main()
{
ITableQueryFactory database = new SqlTableQueryFactory("SomeConnectionString");
var result = from row in database.Query("myTbl")
where row["someColumn"] == "1" && row["otherColumn"] == "2"
where row["thirdColumn"] == "2" && row["otherColumn"] == "4"
select row["a"]; // NOTE select executes as linq to objects! FTW
foreach(var a in result)
{
Console.WriteLine(a);
}
}
答案 1 :(得分:1)
我认为LINQ路线是要走的路,但为了好玩,我试着想出一个解决方案。它需要一些工作,但一般的想法是让查询界面流畅并隐藏接口背后的实现逻辑。把它扔出去作为思考的食物......
public interface IDBImplementation
{
public void ProcessQuery(Select query);
}
public class SqlServerImplementation : IDBImplementation
{
public void ProcessQuery(Select query)
{
string sqlQuery = "SELECT " + String.Join(", ", query.Columns)
+ " FROM " + query.TableName + " WHERE " + String.Join(" AND ", query.Conditions);
// execute query...
}
}
public class Select
{
public Select(params string[] columns)
{
Columns = columns;
}
public string[] Columns { get; set; }
public string TableName { get; set; }
public string[] Conditions { get; set; }
}
public static class Extensions
{
public static Select From(this Select select, string tableName)
{
select.TableName = tableName;
return select;
}
public static Select Where(this Select select, params string[] conditions)
{
select.Conditions = conditions;
return select;
}
}
public static class Main
{
public static void Example()
{
IDBImplementation database = new SqlServerImplementation();
var query = new Select("a", "b", "c").From("test").Where("c>5", "b<10");
database.ProcessQuery(query);
}
}
答案 2 :(得分:0)
检索有关数据库信息的最与DBMS无关的方法是iNFORMATION_SCHEMA。请参阅MySQL,SQL Server,PostgreSQL。
我有点好奇你在想什么类型的规则。 :)
答案 3 :(得分:0)
NHibernate怎么样? http://community.jboss.org/wiki/NHibernateforNET