I have a string like the following:
public class Geometria
{
public int Id { get; set; }
public string Componente { get; set; }
[Required]
[Range(0, float.MaxValue)]
[Display(Name = "Tolerância Inferior")]
public float ToleranciaInferior { get; set; }
[Required]
[Range(0, float.MaxValue)]
[ToleranciaLessOrEqualThan("ToleranciaInferior", ErrorMessage = "Tolerância Superior tem de ser maior que a Tolerância Inferior")]
[Display(Name = "Tolerância Superior")]
public float ToleranciaSuperior { get; set; }
}
public class ToleranciaLessOrEqualThanAttribute : ValidationAttribute, IClientModelValidator
{
private readonly string _comparisonProperty;
public ToleranciaLessOrEqualThanAttribute(string comparisonProperty)
{
_comparisonProperty = comparisonProperty;
}
protected override ValidationResult IsValid(object value, ValidationContext validationContext)
{
ErrorMessage = ErrorMessageString;
var currentValue = (float)value;
var property = validationContext.ObjectType.GetProperty(_comparisonProperty);
if (property == null)
throw new ArgumentException("Property with this name not found");
var comparisonValue = (float)property.GetValue(validationContext.ObjectInstance);
if (currentValue <= comparisonValue)
return new ValidationResult(ErrorMessage);
return ValidationResult.Success;
}
public void AddValidation(ClientModelValidationContext context)
{
if (context == null)
{
throw new ArgumentNullException(nameof(context));
}
var error = FormatErrorMessage(context.ModelMetadata.GetDisplayName());
context.Attributes.Add("data-val", "true");
context.Attributes.Add("data-val-tolerancialessorequalthan", error);
}
}
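What is the simplest way to convert it into a pandas DataFrame with the following format? I want to create a DataFrame with 5 columns, where the header of the first column can be filled in as "entity". The first column contains the entity names.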
Answer 0 (score: 5)
You can try the following:
import pandas as pd
s1 = "entity precision recall f1-score support B-EXPERIENCE 0.578 0.488 0.529 244 I-EXPERIENCE 0.648 0.799 0.716 399 L-EXPERIENCE 0.850 0.697 0.766 244 U-EXPERIENCE 0.000 0.000 0.000 9 B-LANGUAGE 0.000 0.000 0.000 1 I-LANGUAGE 0.000 0.000 0.000 1 L-LANGUAGE 0.000 0.000 0.000 1 U-LANGUAGE 0.788 0.904 0.842 292 B-PROGRAMMING 0.480 0.433 0.455 141 I-PROGRAMMING 0.524 0.328 0.404 67 L-PROGRAMMING 0.261 0.255 0.258 141 U-PROGRAMMING 0.904 0.825 0.862 2010 micro_avg 0.785 0.746 0.765 3550 macro_avg 0.419 0.394 0.403 3550 weighted_avg 0.787 0.746 0.763 3550"
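# split on single spaces; every token (header or value) becomes one Series element
s = pd.Series(s1.split(' '))
# the first 5 tokens are the column headers; the rest are reshaped into rows of 5
df = pd.DataFrame(s[5:].to_numpy().reshape(-1,5), columns=s[:5])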
Output:
entity precision recall f1-score support
0 B-EXPERIENCE 0.578 0.488 0.529 244
1 I-EXPERIENCE 0.648 0.799 0.716 399
2 L-EXPERIENCE 0.850 0.697 0.766 244
3 U-EXPERIENCE 0.000 0.000 0.000 9
4 B-LANGUAGE 0.000 0.000 0.000 1
5 I-LANGUAGE 0.000 0.000 0.000 1
6 L-LANGUAGE 0.000 0.000 0.000 1
7 U-LANGUAGE 0.788 0.904 0.842 292
8 B-PROGRAMMING 0.480 0.433 0.455 141
9 I-PROGRAMMING 0.524 0.328 0.404 67
10 L-PROGRAMMING 0.261 0.255 0.258 141
11 U-PROGRAMMING 0.904 0.825 0.862 2010
12 micro_avg 0.785 0.746 0.765 3550
13 macro_avg 0.419 0.394 0.403 3550
14 weighted_avg 0.787 0.746 0.763 3550
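Details:
split uses a space as the delimiter, so no label may contain a space. That is why the average rows are written as micro_avg, macro_avg and weighted_avg in s1.
The pd.Series constructor wraps the tokens, and the pd.DataFrame constructor is then called with index slicing: s[:5] supplies the column headers and s[5:] supplies the values.
to_numpy turns the values into a numpy array, and reshape(-1, 5) arranges them into 5 columns, where -1 tells numpy to infer the number of rows.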
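To see the split/reshape steps on a smaller input, here is a minimal sketch using a made-up two-row string `sample` in the same layout as s1. It also shows one thing the answer leaves implicit: after split, every cell is still a string. The final `pd.to_numeric` conversion is an addition to the answer, not part of it.

```python
import pandas as pd

# Hypothetical two-row sample in the same layout as s1
sample = "entity precision recall f1-score support B-EXPERIENCE 0.578 0.488 0.529 244 micro_avg 0.785 0.746 0.765 3550"

tokens = pd.Series(sample.split(' '))  # 15 tokens: 5 headers + 2 rows of 5
# -1 lets numpy infer the row count: 10 remaining tokens / 5 columns = 2 rows
values = tokens[5:].to_numpy().reshape(-1, 5)
df = pd.DataFrame(values, columns=tokens[:5])

# Every cell is still a str; convert the metric columns to numbers
metric_cols = ["precision", "recall", "f1-score", "support"]
df[metric_cols] = df[metric_cols].apply(pd.to_numeric)

print(df.shape)  # (2, 5)
print(df.dtypes)
```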
Answer 1 (score: 2)
I would use numpy's reshape:
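import numpy as np
import pandas as pd
data = np.array(string.split())  # `string` is the input text
# rows of 5 tokens; the first row holds the column headers
data = data.reshape(len(data)//5, 5)
# header row -> column names, entity column -> index (with an empty index name)
df = pd.DataFrame(data[1:], columns=data[0]).set_index('entity').rename_axis('')
print(df)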
Gives:
precision recall f1-score support
B-EXPERIENCE 0.578 0.488 0.529 244
I-EXPERIENCE 0.648 0.799 0.716 399
L-EXPERIENCE 0.850 0.697 0.766 244
U-EXPERIENCE 0.000 0.000 0.000 9
B-LANGUAGE 0.000 0.000 0.000 1
I-LANGUAGE 0.000 0.000 0.000 1
L-LANGUAGE 0.000 0.000 0.000 1
U-LANGUAGE 0.788 0.904 0.842 292
B-PROGRAMMING 0.480 0.433 0.455 141
I-PROGRAMMING 0.524 0.328 0.404 67
L-PROGRAMMING 0.261 0.255 0.258 141
U-PROGRAMMING 0.904 0.825 0.862 2010
micro_avg 0.785 0.746 0.765 3550
macro_avg 0.419 0.394 0.403 3550
weighted_avg 0.787 0.746 0.763 3550
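The set_index('entity') call above assumes the header row already starts with entity. The question says that first header may need to be filled in. If the raw text carries only the four metric names, a sketch along these lines could prepend it before reshaping. It makes three assumptions: exactly four header tokens, each row being one label followed by four numbers, and the hypothetical `raw` string shown.

```python
import numpy as np
import pandas as pd

# Hypothetical input whose header row lacks a name for the first column
raw = "precision recall f1-score support B-EXPERIENCE 0.578 0.488 0.529 244 I-EXPERIENCE 0.648 0.799 0.716 399"

tokens = raw.split()
header = ["entity"] + tokens[:4]            # fill in the missing first header
rows = np.array(tokens[4:]).reshape(-1, 5)  # label + 4 metrics per row

df = pd.DataFrame(rows, columns=header).set_index("entity").rename_axis("")
print(df)
```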
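Answer 2 (score: 1)
If you adjust the last three entries of the string to remove the spaces (for example, by replacing them with dashes), the following code works and also scales to more rows: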
import pandas as pd
my_list = string.split(' ')  # split the string along the whitespace
my_dict = {}
num_cols = 5
# build a dict: each header maps to every num_cols-th token after the header row
for i in range(num_cols):
    my_dict[my_list[i]] = my_list[num_cols + i::num_cols]
# Convert the dict to a pandas DataFrame
df = pd.DataFrame(my_dict)
print(df)
entity precision recall f1-score support
0 B-EXPERIENCE 0.578 0.488 0.529 244
1 I-EXPERIENCE 0.648 0.799 0.716 399
2 L-EXPERIENCE 0.850 0.697 0.766 244
3 U-EXPERIENCE 0.000 0.000 0.000 9
4 B-LANGUAGE 0.000 0.000 0.000 1
5 I-LANGUAGE 0.000 0.000 0.000 1
6 L-LANGUAGE 0.000 0.000 0.000 1
7 U-LANGUAGE 0.788 0.904 0.842 292
8 B-PROGRAMMING 0.480 0.433 0.455 141
9 I-PROGRAMMING 0.524 0.328 0.404 67
10 L-PROGRAMMING 0.261 0.255 0.258 141
11 U-PROGRAMMING 0.904 0.825 0.862 2010
12 micro-avg 0.785 0.746 0.765 3550
13 macro-avg 0.419 0.394 0.403 3550
14 weighted-avg 0.787 0.746 0.763 3550
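Instead of editing the last three entries by hand, the spaces inside the average labels can be replaced programmatically before splitting. Here is a minimal sketch using the standard `re` module. It assumes the raw text spells these labels `micro avg`, `macro avg` and `weighted avg`; that raw spelling is an assumption, since the answers only show the edited forms. The input string `raw` is hypothetical.

```python
import re
import pandas as pd

# Hypothetical raw input where the average labels contain a space
raw = ("entity precision recall f1-score support "
       "U-PROGRAMMING 0.904 0.825 0.862 2010 "
       "micro avg 0.785 0.746 0.765 3550 "
       "macro avg 0.419 0.394 0.403 3550 "
       "weighted avg 0.787 0.746 0.763 3550")

# Join "<word> avg" into a single token such as "micro-avg"
cleaned = re.sub(r"\b(micro|macro|weighted) avg\b", r"\1-avg", raw)

tokens = cleaned.split()
num_cols = 5
# Same strided-slice idea as the answer: column i takes every num_cols-th token
data = {tokens[i]: tokens[num_cols + i::num_cols] for i in range(num_cols)}
df = pd.DataFrame(data)
print(df["entity"].tolist())  # ['U-PROGRAMMING', 'micro-avg', 'macro-avg', 'weighted-avg']
```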
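Answer 3 (score: 1)
Another approach is to use yield to split the words into evenly sized lists of 5. A generator resumes from the state it was in at the end of the previous iteration: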
import pandas as pd
words = string.split()
cols = words[:5]  # the header tokens
vals = words[5:]  # the data tokens
# Define a generator that yields evenly sized chunks of the words
def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]
Now we can build the DataFrame:
df = pd.DataFrame(list(divide_chunks(vals, 5)), columns=cols)
Output:
entity precision recall f1-score support
0 B-EXPERIENCE 0.578 0.488 0.529 244
1 I-EXPERIENCE 0.648 0.799 0.716 399
2 L-EXPERIENCE 0.850 0.697 0.766 244
3 U-EXPERIENCE 0.000 0.000 0.000 9
4 B-LANGUAGE 0.000 0.000 0.000 1
5 I-LANGUAGE 0.000 0.000 0.000 1
6 L-LANGUAGE 0.000 0.000 0.000 1
7 U-LANGUAGE 0.788 0.904 0.842 292
8 B-PROGRAMMING 0.480 0.433 0.455 141
9 I-PROGRAMMING 0.524 0.328 0.404 67
10 L-PROGRAMMING 0.261 0.255 0.258 141
11 U-PROGRAMMING 0.904 0.825 0.862 2010
12 micro_avg 0.785 0.746 0.765 3550
13 macro_avg 0.419 0.394 0.403 3550
14 weighted_avg 0.787 0.746 0.763 3550
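You can watch yield resume where it left off by driving the generator by hand with `next()`. This small sketch redefines the answer's `divide_chunks` so it runs on its own. It uses a made-up seven-item list to show that the last chunk is simply shorter when the length is not a multiple of `n`.

```python
def divide_chunks(l, n):
    # Each call to next() resumes the loop right after the last yield
    for i in range(0, len(l), n):
        yield l[i:i + n]

items = ["a", "b", "c", "d", "e", "f", "g"]
gen = divide_chunks(items, 3)

print(next(gen))  # ['a', 'b', 'c']
print(next(gen))  # ['d', 'e', 'f']
print(next(gen))  # ['g']
# One more next(gen) would raise StopIteration: the loop has finished
print(list(divide_chunks(items, 3)))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

In the answer, list(divide_chunks(vals, 5)) collects every chunk, and each chunk becomes one row of the DataFrame.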