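I am writing a framework that will connect to many different types of data source and return values from them. The easy ones are SQL, Access and Oracle. The harder ones are SharePoint and CSV.
If I return values from a text-based source, I would like to determine the data type of the data.
Since a CSV file is all text, there is no metadata to query, so I need to parse the data somehow to determine its type.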
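Examples:
- A list of "true", "true", "false", "false" would be boolean
- A list of "1", "0", "1", "0" would be boolean
- A list of "1", "4", "-10", "500" would be integer
- A list of "15.2", "2015.5896", "1.0245", "500" would be double
- A list of "2001/01/01", "2010/05/29 12:00", "1989/12/25 10:34:21" would be DateTime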
My current code is based on https://stackoverflow.com/questions/606365/c-doubt-finding-the-datatype/606381#606381:
object ParseString(string str)
{
    Int32 intValue;
    Int64 bigintValue;
    double doubleValue;
    bool boolValue;
    DateTime dateValue;
    // Place checks higher in if-else statement to give higher priority to type.
    if (Int32.TryParse(str, out intValue))
        return intValue;
    else if (Int64.TryParse(str, out bigintValue))
        return bigintValue;
    else if (double.TryParse(str, out doubleValue))
        return doubleValue;
    else if (bool.TryParse(str, out boolValue))
        return boolValue;
    else if (DateTime.TryParse(str, out dateValue))
        return dateValue;
    else return str;
}
Edit: I only need to support the following types:
BIT
DATETIME
INT
NVARCHAR(255)
NVARCHAR(MAX)
BIGINT
DECIMAL(36, 17)
Can you see any possible improvements to the priority order?
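One way to settle the priority for a whole column, rather than value by value, is to test each candidate type against every value in the column and pick the first (narrowest) one that accepts all of them. This is a sketch under stated assumptions: it targets the SQL types from the edit, parses with the invariant culture, and accepts only the date layouts from the examples; `SqlTypeInference` and `InferSqlType` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SqlTypeInference
{
    private static readonly CultureInfo Inv = CultureInfo.InvariantCulture;

    // Accepted DATETIME layouts, taken from the question's examples (an assumption; extend as needed).
    private static readonly string[] DateFormats =
    {
        "yyyy/MM/dd", "yyyy/MM/dd HH:mm", "yyyy/MM/dd HH:mm:ss",
        "yyyy-MM-dd", "yyyy-MM-dd HH:mm", "yyyy-MM-dd HH:mm:ss"
    };

    // Candidates ordered from narrowest to widest.
    private static readonly (string SqlType, Func<string, bool> Accepts)[] Candidates =
    {
        ("BIT", v => v == "0" || v == "1" || bool.TryParse(v, out _)),
        ("INT", v => int.TryParse(v, NumberStyles.Integer, Inv, out _)),
        ("BIGINT", v => long.TryParse(v, NumberStyles.Integer, Inv, out _)),
        ("DECIMAL(36, 17)", v => decimal.TryParse(v, NumberStyles.Float, Inv, out _)),
        ("DATETIME", v => DateTime.TryParseExact(v, DateFormats, Inv, DateTimeStyles.None, out _)),
    };

    // Returns the first candidate that accepts every non-empty value, otherwise an NVARCHAR type.
    public static string InferSqlType(IEnumerable<string> values)
    {
        // Empty cells are treated as NULL and do not vote for any type.
        List<string> present = values.Where(v => !string.IsNullOrEmpty(v)).ToList();
        if (present.Count > 0)
        {
            foreach (var (sqlType, accepts) in Candidates)
            {
                if (present.All(accepts))
                    return sqlType;
            }
        }
        int maxLength = present.Count == 0 ? 0 : present.Max(v => v.Length);
        return maxLength <= 255 ? "NVARCHAR(255)" : "NVARCHAR(MAX)";
    }
}
```

Because a candidate is chosen only when every value passes it, a column holding both "true" and "5" falls through to NVARCHAR instead of being forced into a numeric type. C# `decimal` holds only 28 to 29 significant digits, so values that use the full DECIMAL(36, 17) range would fail the decimal check in this sketch.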
Answer 0 (score: 14)
I came up with the following solution, which works:
enum dataType
{
    System_Boolean = 0,
    System_Int32 = 1,
    System_Int64 = 2,
    System_Double = 3,
    System_DateTime = 4,
    System_String = 5
}

private dataType ParseString(string str)
{
    bool boolValue;
    Int32 intValue;
    Int64 bigintValue;
    double doubleValue;
    DateTime dateValue;
    // Place checks higher in if-else statement to give higher priority to type.
    if (bool.TryParse(str, out boolValue))
        return dataType.System_Boolean;
    else if (Int32.TryParse(str, out intValue))
        return dataType.System_Int32;
    else if (Int64.TryParse(str, out bigintValue))
        return dataType.System_Int64;
    else if (double.TryParse(str, out doubleValue))
        return dataType.System_Double;
    else if (DateTime.TryParse(str, out dateValue))
        return dataType.System_DateTime;
    else return dataType.System_String;
}
/// <summary>
/// Gets the datatype for the DataColumn column
/// </summary>
/// <param name="column">DataColumn to get datatype of</param>
/// <param name="dt">DataTable to get datatype from</param>
/// <param name="colSize">ref value to return size for string type (-1 otherwise)</param>
/// <returns>The inferred type of the column</returns>
public Type GetColumnType(DataColumn column, DataTable dt, ref int colSize)
{
    Type T;
    DataView dv = new DataView(dt);
    //get smallest and largest values
    string colName = column.ColumnName;
    dv.RowFilter = "[" + colName + "] = MIN([" + colName + "])";
    DataTable dtRange = dv.ToTable();
    string strMinValue = dtRange.Rows[0][column.ColumnName].ToString();
    int minValueLevel = (int)ParseString(strMinValue);
    dv.RowFilter = "[" + colName + "] = MAX([" + colName + "])";
    dtRange = dv.ToTable();
    string strMaxValue = dtRange.Rows[0][column.ColumnName].ToString();
    int maxValueLevel = (int)ParseString(strMaxValue);
    colSize = strMaxValue.Length;
    //get max typelevel of the first 50 rows (fewer if the table is smaller)
    int sampleSize = Math.Min(dt.Rows.Count, 50);
    int maxLevel = Math.Max(minValueLevel, maxValueLevel);
    for (int i = 0; i < sampleSize; i++)
    {
        maxLevel = Math.Max((int)ParseString(dt.Rows[i][column].ToString()), maxLevel);
    }
    string enumCheck = ((dataType)maxLevel).ToString();
    T = Type.GetType(enumCheck.Replace('_', '.'));
    //if typelevel = int32 check for bit only data & cast to bool
    //(TryParse, because a min/max of "true"/"false" would make Convert.ToInt32 throw)
    if (maxLevel == (int)dataType.System_Int32
        && int.TryParse(strMinValue, out int minInt) && minInt == 0
        && int.TryParse(strMaxValue, out int maxInt) && maxInt == 1)
    {
        T = typeof(bool);
    }
    if (maxLevel != (int)dataType.System_String) colSize = -1;
    return T;
}
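The MIN/MAX row filters above compare values of the column's own type. If the column was loaded from a CSV as strings, the comparison is presumably textual, so "9" ranks above "10" and the bit check may look at the wrong extremes. A minimal sketch of scanning the values numerically instead, assuming the column's values are available as strings; `ColumnRange`, `TryGetIntegerRange` and `IsBitColumn` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ColumnRange
{
    // Scans every value once and tracks the numeric minimum and maximum.
    // Returns false if the column has no values or any non-empty value is not an integer.
    public static bool TryGetIntegerRange(IEnumerable<string> values, out long min, out long max)
    {
        min = long.MaxValue;
        max = long.MinValue;
        bool any = false;
        foreach (string v in values)
        {
            if (string.IsNullOrEmpty(v)) continue; // skip missing cells
            if (!long.TryParse(v, NumberStyles.Integer, CultureInfo.InvariantCulture, out long n))
                return false;
            if (n < min) min = n;
            if (n > max) max = n;
            any = true;
        }
        return any;
    }

    // A column qualifies as BIT when every value is numerically 0 or 1.
    public static bool IsBitColumn(IEnumerable<string> values) =>
        TryGetIntegerRange(values, out long min, out long max) && min >= 0 && max <= 1;
}
```

For the values "9", "10", "0" this reports a range of 0 to 10, whereas an ordinal text comparison puts "9" after "10".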
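Answer 1 (score: 10)
Since Dimi offered a bounty asking for a more "modern" solution, I'll try to provide one. First, what do we need from a reasonable class that converts a string into different things?
- Reasonable behavior for the basic types.
- Respect for culture info, especially when converting numbers and dates.
- The ability to extend the logic with custom converters if necessary.
- As a bonus, avoiding long "if" chains, since they are very error-prone.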
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class StringConverter {
    // delegate for TryParse(string, out T)
    public delegate bool TypedConvertDelegate<T>(string value, out T result);
    // delegate for TryParse(string, out object)
    private delegate bool UntypedConvertDelegate(string value, out object result);
    private readonly List<UntypedConvertDelegate> _converters = new List<UntypedConvertDelegate>();
    // default converter, lazily initialized
    private static readonly Lazy<StringConverter> _default = new Lazy<StringConverter>(CreateDefault, true);
    public static StringConverter Default => _default.Value;

    private static StringConverter CreateDefault() {
        var d = new StringConverter();
        // add reasonable default converters for common .NET types. Don't forget to take culture into account, that's
        // important when parsing numbers\dates.
        d.AddConverter<bool>(bool.TryParse);
        d.AddConverter((string value, out byte result) => byte.TryParse(value, NumberStyles.Integer, d.Culture, out result));
        d.AddConverter((string value, out short result) => short.TryParse(value, NumberStyles.Integer, d.Culture, out result));
        d.AddConverter((string value, out int result) => int.TryParse(value, NumberStyles.Integer, d.Culture, out result));
        d.AddConverter((string value, out long result) => long.TryParse(value, NumberStyles.Integer, d.Culture, out result));
        d.AddConverter((string value, out float result) => float.TryParse(value, NumberStyles.Number, d.Culture, out result));
        d.AddConverter((string value, out double result) => double.TryParse(value, NumberStyles.Number, d.Culture, out result));
        d.AddConverter((string value, out DateTime result) => DateTime.TryParse(value, d.Culture, DateTimeStyles.None, out result));
        return d;
    }

    // culture used by the number and date converters (read at conversion time)
    public CultureInfo Culture { get; set; } = CultureInfo.CurrentCulture;

    public void AddConverter<T>(Predicate<string> match, Func<string, T> converter) {
        // create converter from match predicate and convert function
        _converters.Add((string value, out object result) => {
            if (match(value)) {
                result = converter(value);
                return true;
            }
            result = null;
            return false;
        });
    }

    public void AddConverter<T>(Regex match, Func<string, T> converter) {
        // create converter from match regex and convert function
        _converters.Add((string value, out object result) => {
            if (match.IsMatch(value)) {
                result = converter(value);
                return true;
            }
            result = null;
            return false;
        });
    }

    public void AddConverter<T>(TypedConvertDelegate<T> constructor) {
        // create converter from typed TryParse(string, out T) function
        _converters.Add(FromTryPattern<T>(constructor));
    }

    public bool TryConvert(string value, out object result) {
        if (this != Default) {
            // if this is not a default converter - first try convert with default
            if (Default.TryConvert(value, out result))
                return true;
        }
        // then use local converters. Any will return after the first match
        object tmp = null;
        bool anyMatch = _converters.Any(c => c(value, out tmp));
        result = tmp;
        return anyMatch;
    }

    private static UntypedConvertDelegate FromTryPattern<T>(TypedConvertDelegate<T> inner) {
        return (string value, out object result) => {
            T tmp;
            if (inner.Invoke(value, out tmp)) {
                result = tmp;
                return true;
            }
            else {
                result = null;
                return false;
            }
        };
    }
}
Use it like this:
// the JSON part needs the Newtonsoft.Json package; JsonSchema is its older, now obsolete schema API
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

static void Main(string[] args) {
    // set culture to invariant
    StringConverter.Default.Culture = CultureInfo.InvariantCulture;
    // add custom converter to default, it will match strings starting with CUSTOM: and return MyCustomClass
    StringConverter.Default.AddConverter(c => c.StartsWith("CUSTOM:"), c => new MyCustomClass(c));
    var items = new[] {"1", "4343434343", "3.33", "true", "false", "2014-10-10 22:00:00", "CUSTOM: something"};
    foreach (var item in items) {
        object result;
        if (StringConverter.Default.TryConvert(item, out result)) {
            Console.WriteLine(result);
        }
    }
    // create new non-default converter
    var localConverter = new StringConverter();
    // add custom converter to parse json which matches schema for MySecondCustomClass
    // (the schema must declare "value" under "properties", otherwise any object validates)
    localConverter.AddConverter((string value, out MySecondCustomClass result) =>
        TryParseJson(value, @"{'type': 'object', 'properties': {'value': {'type': 'string'}}}", out result));
    {
        object result;
        // check if that works
        if (localConverter.TryConvert("{value: \"Some value\"}", out result)) {
            Console.WriteLine(((MySecondCustomClass) result).Value);
        }
    }
    Console.ReadKey();
}

static bool TryParseJson<T>(string json, string rawSchema, out T result) where T : new() {
    // we are using Newtonsoft.Json here
    var parsedSchema = JsonSchema.Parse(rawSchema);
    JObject jObject = JObject.Parse(json);
    if (jObject.IsValid(parsedSchema)) {
        result = JsonConvert.DeserializeObject<T>(json);
        return true;
    }
    else {
        result = default(T);
        return false;
    }
}

class MyCustomClass {
    public MyCustomClass(string value) {
        this.Value = value;
    }
    public string Value { get; private set; }
}

public class MySecondCustomClass {
    public string Value { get; set; }
}
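The culture requirement in the list above matters because the same text can parse to different numbers. The sketch below uses a hand-built `NumberFormatInfo` with ',' as the decimal separator as a stand-in for a culture such as de-DE, so it does not depend on the machine's installed culture data; `CultureDemo` is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Parses with an explicit format provider, as the default converters above do with d.Culture.
    public static double? ParseDouble(string text, IFormatProvider provider) =>
        double.TryParse(text, NumberStyles.Number, provider, out double value) ? value : (double?)null;

    // Hand-built format: ',' for decimals and '.' for grouping (a stand-in for cultures such as de-DE).
    public static NumberFormatInfo CommaDecimalFormat()
    {
        var nfi = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        nfi.NumberDecimalSeparator = ",";
        nfi.NumberGroupSeparator = ".";
        return nfi;
    }
}
```

With the comma format, "3,33" parses as 3.33; with the invariant culture it parses as 333, because ',' is accepted as a thousands separator under `NumberStyles.Number`. This is why the answer sets `Culture` explicitly before converting.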
Answer 2 (score: 3)
// requires: using System.ComponentModel;
List<Type> types = new List<Type>(new Type[] {
    typeof(Boolean)
    , typeof(int)
    , typeof(double)
    , typeof(DateTime)
});
string t = "true";
object retu = t; // falls back to the original string if no converter accepts it
foreach (Type type in types)
{
    TypeConverter tc = TypeDescriptor.GetConverter(type);
    if (tc != null)
    {
        try
        {
            retu = tc.ConvertFromString(t); // your return value
            break; // the first type that converts successfully wins
        }
        catch (Exception)
        {
            continue;
        }
    }
}
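A variant of the `TypeConverter` loop that returns the detected type and passes the invariant culture explicitly, so the result does not depend on the machine's regional settings. This is a sketch; `TypeConverterDetection` and `DetectType` are hypothetical names, and the candidate order is the same as in the answer.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

public static class TypeConverterDetection
{
    // Candidate types in priority order; the first successful conversion wins.
    private static readonly Type[] Candidates = { typeof(bool), typeof(int), typeof(double), typeof(DateTime) };

    public static Type DetectType(string text, out object value)
    {
        foreach (Type type in Candidates)
        {
            TypeConverter converter = TypeDescriptor.GetConverter(type);
            if (!converter.CanConvertFrom(typeof(string))) continue;
            try
            {
                // Explicit culture instead of the thread's current culture.
                value = converter.ConvertFromString(null, CultureInfo.InvariantCulture, text);
                return type;
            }
            catch (Exception)
            {
                // The built-in converters report bad input by throwing; some wrap the
                // FormatException in a plain Exception, hence the broad catch.
            }
        }
        value = text;
        return typeof(string);
    }
}
```

Exceptions are costly on a hot path, so for large files a TryParse-based check (as in the other answers) is likely faster; the converter approach mainly buys uniform handling of any type that has a `TypeConverter`.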
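Answer 3 (score: 1)
Wouldn't it be easier to store the values in a generic data type and use .ToInt16(), .ToInt32(), .ToBool(), etc.? If you write an application that expects an int and it gets a boolean, it will fail anyway, so it is better to let the programmer convert explicitly to the expected data type.
The problem with your approach is that you cannot know whether a column that has 0 as its first item will have -100000 as item number 100. This means you cannot reliably convert all the different data types until every row has been TryParsed. That is a very expensive operation!
If anything, I would process the data with precompiled regular expressions and/or custom logic, for example by iterating over all rows once to find the highest and lowest numbers, occurrences of strings, and so on.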
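A sketch of that single-pass idea: compile the patterns once, then walk the column, counting value shapes and tracking the numeric minimum and maximum, so a -100000 at row 100 is seen no matter what the first row holds. The patterns, the type names returned by `Suggest`, and `RegexProfiler` itself are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexProfiler
{
    // Compiled once, reused for every cell. [0-9] rather than \d, because \d also matches non-ASCII digits.
    private static readonly Regex IntegerPattern =
        new Regex(@"^-?[0-9]+\z", RegexOptions.Compiled | RegexOptions.CultureInvariant);
    private static readonly Regex DecimalPattern =
        new Regex(@"^-?[0-9]+\.[0-9]+\z", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public sealed class Profile
    {
        public int Integers, Decimals, Others;
        public decimal? Min, Max; // numeric range over the integer and decimal cells
    }

    // One pass over the column: count value shapes and track the numeric min/max.
    public static Profile Scan(IEnumerable<string> values)
    {
        var p = new Profile();
        foreach (string v in values)
        {
            if (string.IsNullOrEmpty(v)) continue; // missing cell
            bool isInt = IntegerPattern.IsMatch(v);
            if ((!isInt && !DecimalPattern.IsMatch(v)) ||
                !decimal.TryParse(v, NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                                  CultureInfo.InvariantCulture, out decimal n))
            {
                p.Others++; // not numeric, or too long to fit in a decimal
                continue;
            }
            if (isInt) p.Integers++; else p.Decimals++;
            if (p.Min == null || n < p.Min) p.Min = n;
            if (p.Max == null || n > p.Max) p.Max = n;
        }
        return p;
    }

    // Maps a profile to one of the question's SQL types.
    public static string Suggest(Profile p)
    {
        if (p.Others > 0 || p.Min == null) return "NVARCHAR";
        if (p.Decimals > 0) return "DECIMAL";
        if (p.Min >= 0 && p.Max <= 1) return "BIT";
        if (p.Min >= int.MinValue && p.Max <= int.MaxValue) return "INT";
        if (p.Min >= long.MinValue && p.Max <= long.MaxValue) return "BIGINT";
        return "DECIMAL";
    }
}
```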
Answer 4 (score: 1)
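Starting with the narrowest type and working toward the widest may not be the best approach. If I knew nothing about the data, I would start with the most frequently occurring type and work toward the least frequent. If I didn't know the frequencies, I might do some research, if possible, to learn the likely distribution; otherwise I would just make my best guess. Why test for bit or datetime first if you only expect one to occur every 10,000 records?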
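Reordering checks by frequency is only safe when the checks are mutually exclusive; with the TryParse chains above, "1" passes several checks, so changing the order changes the result. A hypothetical sketch that uses non-overlapping patterns, classifies a sample, and then moves the most common categories to the front (`FrequencyOrderedClassifier` and its patterns are illustrative assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class FrequencyOrderedClassifier
{
    // Mutually exclusive patterns: at most one matches any string, so the order of the
    // checks changes how many run, never the result.
    private List<(string Name, Regex Pattern)> _checks = new List<(string Name, Regex Pattern)>
    {
        ("Boolean", new Regex(@"^(?i:true|false)\z", RegexOptions.Compiled)),
        ("DateTime", new Regex(@"^[0-9]{4}/[0-9]{2}/[0-9]{2}( [0-9]{2}:[0-9]{2}(:[0-9]{2})?)?\z", RegexOptions.Compiled)),
        ("Integer", new Regex(@"^-?[0-9]+\z", RegexOptions.Compiled)),
        ("Decimal", new Regex(@"^-?[0-9]+\.[0-9]+\z", RegexOptions.Compiled)),
    };

    // Total number of pattern tests executed so far.
    public int ChecksRun { get; private set; }

    // Current check order, first to last.
    public IReadOnlyList<string> Order => _checks.Select(c => c.Name).ToList();

    public string Classify(string value)
    {
        foreach (var (name, pattern) in _checks)
        {
            ChecksRun++;
            if (pattern.IsMatch(value)) return name;
        }
        return "Text";
    }

    // Classifies a sample, then sorts the checks so the most frequent categories run first.
    // OrderByDescending is stable, so ties keep their previous order.
    public void ReorderBySample(IEnumerable<string> sample)
    {
        Dictionary<string, int> counts = sample.Select(Classify)
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());
        _checks = _checks
            .OrderByDescending(c => counts.TryGetValue(c.Name, out int count) ? count : 0)
            .ToList();
    }
}
```

After sampling a mostly-integer column, classifying "7" costs one pattern test instead of three, and the category returned for any value is the same as before the reorder.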