Convert a DataFrame to JSON with added column names (skill and suggestions, as in the desired output) and save it to a MongoDB collection
        0       1       2       3       4       5       6       7
java    hadoop  java    hdfs    c       c++     php     python  html
c       c       c++     hdfs    python  hadoop  java    php     html
c++     c++     c       python  hdfs    hadoop  java    php     html
hadoop  hadoop  java    hdfs    c       c++     php     python  html
hdfs    hdfs    hadoop  java    c       c++     python  php     html
python  python  c++     html    c       php     hdfs    hadoop  java
Desired output:
{"_id": ObjectId("5922a781205a763b55e2e90e"), "skill": "java", "suggestions": ["hadoop", "java", "hdfs", "c", "c++", "php", "python", "html"]}
{"_id": ObjectId("5922a781205a763b55e2e91e"), "skill": "c", "suggestions": ["c", "c++", "hdfs", "python", "hadoop", "java", "php", "html"]}
{"_id": ObjectId("5922a781205a763b55e2e92e"), "skill": "c++", "suggestions": ["c++", "c", "python", "hdfs", "hadoop", "java", "php", "html"]}
{"_id": ObjectId("5922a781205a763b55e2e93e"), "skill": "hadoop", "suggestions": ["hadoop", "java", "hdfs", "c", "c++", "php", "python", "html"]}
Answer 0 (score: 1)
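First, you need to reshape the data into the required format: one column for the skill and one list-valued column for its suggestions.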
import pandas as pd

strlist = [['java','hadoop','java','hdfs','c','c++','php','python','html'],
           ['c','c','c++','hdfs','python','hadoop','java','php','html'],
           ['c++','c++','c','python','hdfs','hadoop','java','php','html'],
           ['hadoop','hadoop','java','hdfs','c','c++','php','python','html'],
           ['hdfs','hdfs','hadoop','java','c','c++','python','php','html'],
           ['python','python','c++','html','c','php','hdfs','hadoop','java']]
df = pd.DataFrame(strlist)
# Column 0 is the skill; the remaining columns are its suggestions.
# Build the list column first so the new 'skill' column is not swept into it.
df['suggestions'] = df.iloc[:, 1:].values.tolist()
df['skill'] = df[0]
df = df[['skill','suggestions']]
print(df)
    skill                                      suggestions
0    java  [hadoop, java, hdfs, c, c++, php, python, html]
1       c  [c, c++, hdfs, python, hadoop, java, php, html]
2     c++  [c++, c, python, hdfs, hadoop, java, php, html]
3  hadoop  [hadoop, java, hdfs, c, c++, php, python, html]
4    hdfs  [hdfs, hadoop, java, c, c++, python, php, html]
5  python  [python, c++, html, c, php, hdfs, hadoop, java]
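
For comparison, the same reshaping can be sketched without pandas. This assumes the rows are available as plain Python lists whose first element is the skill; the helper name rows_to_documents is hypothetical and not part of any library.

```python
def rows_to_documents(rows):
    """Turn [skill, suggestion, ...] rows into {'skill', 'suggestions'} dicts."""
    return [{'skill': row[0], 'suggestions': list(row[1:])} for row in rows]


# Two sample rows taken from the data above.
rows = [
    ['java', 'hadoop', 'java', 'hdfs', 'c', 'c++', 'php', 'python', 'html'],
    ['c', 'c', 'c++', 'hdfs', 'python', 'hadoop', 'java', 'php', 'html'],
]
documents = rows_to_documents(rows)
print(documents[0])
```

Each resulting dict has the same two keys as the desired output, so it may be a convenient way to check the target shape before involving a DataFrame at all.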
Then insert the DataFrame into the MongoDB database.
import json
# `collection` is assumed to be an existing pymongo Collection object.
records = list(json.loads(df.T.to_json()).values())
collection.insert_many(records)
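
The insertion step could also be wrapped in a small helper, sketched here under a few assumptions: save_documents is a hypothetical name, and the collection argument is any object exposing pymongo's insert_many method. The guard exists because insert_many raises an error when given an empty batch.

```python
def save_documents(collection, documents):
    """Insert documents with insert_many and return how many were stored."""
    documents = list(documents)
    if not documents:
        # pymongo's insert_many rejects an empty batch, so skip the call.
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# Hypothetical usage; the URI, database and collection names are placeholders:
# from pymongo import MongoClient
# collection = MongoClient('mongodb://localhost:27017')['mydb']['skills']
# save_documents(collection, df.to_dict('records'))
```

Here df.to_dict('records') is used as a shorter alternative to the JSON round trip above; both produce one dict per row with the skill and suggestions keys.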