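I'm a pandas newbie and have written some code that should append a dictionary to the last column of a row. The last column is named "Holder".
The part of my code that offends the pandas engine is given below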
df.loc[df[innercat] == -1, 'Holder'] += str(odata)
I get the error message
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S75') dtype('S75') dtype('S75')
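When I run the code with "+=" replaced by "=", it runs just fine, although I only get part of the data I want. What am I doing wrong? I have tried removing the str() cast, and it still works as an assignment, not as an append.
Further clarification: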
Math1 Math1_Notes Physics1 Physics1_Notes Chem1 Chem1_Notes Bio1 Bio1_Notes French1 French1_Notes Spanish1 Spanish1_Notes Holder
-1 Gr8 student 0 0 0 0 -1 Foo NaN
0 0 0 0 0 -1 Good student NaN
0 0 -1 So so 0 0 0 NaN
0 -1 Not serious -1 Hooray -1 Voila 0 NaN
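My original dataset contains over 300 columns of data, but I've created an example that captures the spirit of what I'm trying to do. Imagine a college with 300 departments, each offering 1 (or more) courses. The data above is a micro-sample of that data. For each student, next to their name or admission number, a "-1" indicates that they took a particular course. In addition, the next column USUALLY contains that department's notes on the student.
Looking at the first row of the data above, we have a student who took Math and Spanish, and each department added some comments about the student. For each row, I'd like to add a dict that summarises the data for that student: basically a JSON summary of each departmental entry. Assuming a string of the general form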
json_string = {"student name": a, "data": {"notes": b, "Course name": c}}
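I intend to have my code read my csv, form a dict for each departmental entry and append it to the Holder column. Thus for the above student (row 1), there will be 2 dicts, namely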
{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
{"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
and the final contents of Holder for row 1 will be
{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
Once I can successfully append the data, I will perhaps add a comma or '|' between the separate dicts. The line of code I have written is
df.loc[df[innercat] == -1, 'Holder'] = str(odata)
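Whether or not I cast the above line to str(), writing an assignment instead of the append operator seems to overwrite all the previous values and only writes the last value to Holder, something like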
-1 Gr8 student 0 0 0 0 -1 Foo {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
whereas I'd like
-1 Gr8 student 0 0 0 0 -1 Foo {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
For anyone interested in replicating what I did, the main part of my code is shown below
count = 0
substrategy = 0
for cat in col_array:
    count += 1
    for innercat in cat:
        if "Notes" in innercat:
            #b = str(df[innercat])
            continue
        substrategy += 1
        c = count
        a = substrategy
        odata = {}
        odata['did'] = a
        odata['id'] = a
        odata['data'] = {}
        odata['data']['notes'] = b
        odata['data']['substrategy'] = a
        odata['data']['strategy'] = c
        df.loc[df[innercat] == -1, 'Holder'] += str(odata)
Answer 0 (score: 1)
Is that what you want?
In [190]: d1 = {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
In [191]: d2 = {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
In [192]: import json
In [193]: json.dumps(d1)
Out[193]: '{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}'
In [194]: df
Out[194]:
Investments_Cash Holder
0 0 NaN
1 0 NaN
2 -1 NaN
In [196]: df.Holder = ''
In [197]: df.loc[df.Investments_Cash == -1, 'Holder'] += json.dumps(d1)
In [198]: df.loc[df.Investments_Cash == -1, 'Holder'] += ' ' + json.dumps(d2)
In [199]: df
Out[199]:
Investments_Cash Holder
0 0
1 0
2 -1 {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course nam...
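The session above can also be written as a standalone script. This is only a minimal sketch: the DataFrame contents are made up to mirror the example, and it assumes the TypeError in the question comes from adding a string to a Holder column that still holds NaN (a float), which is why Holder is reset to an empty string before anything is appended.

```python
import json

import pandas as pd

d1 = {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
d2 = {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

# Illustrative frame: Holder starts out as NaN, as in the question.
df = pd.DataFrame({
    "Investments_Cash": [0, 0, -1],
    "Holder": [float("nan")] * 3,
})

# Replace the NaN placeholders with empty strings so that += concatenates text.
df["Holder"] = ""

# Boolean mask selecting the rows that should receive the records.
mask = df["Investments_Cash"] == -1

# Append one JSON string per record, separated by a single space.
df.loc[mask, "Holder"] += json.dumps(d1)
df.loc[mask, "Holder"] += " " + json.dumps(d2)

print(df.loc[2, "Holder"])
```

Rows that do not match the mask keep their empty string, so the column stays purely textual.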
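NOTE: working with / parsing the Holder column in the future will be really painful, as it's not standard - you won't be able to parse it back without additional preprocessing (like splitting with complex RegEx'es, etc.)
So I'd strongly recommend converting a list of dicts to JSON - you'll be able to read it back using the json.loads() method: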
In [201]: df.loc[df.Investments_Cash == -1, 'Holder'] = json.dumps([d1, d2])
In [202]: df
Out[202]:
Investments_Cash Holder
0 0
1 0
2 -1 [{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}, {"student name": "Peter", "data": {"notes": "Foo", "Course n...
Parsing it back:
In [204]: lst = json.loads(df.loc[2, 'Holder'])
In [205]: lst
Out[205]:
[{'data': {'Course name': 'Math1', 'notes': 'Gr8 student'},
'student name': 'Peter'},
{'data': {'Course name': 'Spanish1', 'notes': 'Foo'},
'student name': 'Peter'}]
In [206]: lst[0]
Out[206]:
{'data': {'Course name': 'Math1', 'notes': 'Gr8 student'},
'student name': 'Peter'}
In [207]: lst[1]
Out[207]: {'data': {'Course name': 'Spanish1', 'notes': 'Foo'}, 'student name': 'Peter'}
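To connect this back to the layout in the question, here is one possible sketch that builds the JSON list row by row instead of appending inside a loop over columns. Everything in it is an assumption made for illustration: the "Name" column, the two-course sample data, the `courses` list and the `summarise_row` helper are not from the original post, and it assumes each course column is followed by a matching "<course>_Notes" column.

```python
import json

import pandas as pd

# Hypothetical miniature of the question's table; names and values are invented.
df = pd.DataFrame({
    "Name": ["Peter", "Mary"],
    "Math1": [-1, 0],
    "Math1_Notes": ["Gr8 student", ""],
    "Spanish1": [-1, -1],
    "Spanish1_Notes": ["Foo", "Good student"],
})

# Course columns to inspect; each is assumed to have a "<course>_Notes" partner.
courses = ["Math1", "Spanish1"]


def summarise_row(row):
    """Return a JSON list with one record per course the student took (-1)."""
    records = []
    for course in courses:
        if row[course] == -1:
            records.append({
                "student name": str(row["Name"]),
                "data": {
                    "notes": str(row[course + "_Notes"]),
                    "Course name": course,
                },
            })
    return json.dumps(records)


# One JSON string per row, written in a single assignment (no += needed).
df["Holder"] = df.apply(summarise_row, axis=1)

# Each cell parses straight back into a list of dicts.
peter = json.loads(df.loc[0, "Holder"])
mary = json.loads(df.loc[1, "Holder"])
print(peter)
print(mary)
```

Because every cell holds a complete JSON list, there is no separator to choose and no custom parsing to write later.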