Looking for ideas on how to solve a problem:
I have a data frame in which one column contains lists of tuples, like this:
import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5')]
    ]
})
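I would like every element in the second position of a tuple to become a column of the data frame, holding the corresponding value from the first position. For the data frame above, that means new columns F1 through F5, filled in only on the rows whose lists mention them.
The lists can contain any number of elements (not just 2), and the number of elements can vary from row to row.
Can anyone suggest an easy way to achieve this?
Thanks.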
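Answer 0 (score: 4)
First, I would flatten the mydf['WeirdField'] column so that we only see the values and column names, without worrying about the lists that contain them. Next, you can use itertools.groupby to get all of the corresponding values and row indices for each "F" column.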
import itertools

# Must first sort the list by F column, or groupby won't work
flatter = sorted([list(x) + [idx] for idx, y in enumerate(mydf['WeirdField'])
                  for x in y], key=lambda x: x[1])

# Find all of the values that will eventually go in each F column
for key, group in itertools.groupby(flatter, lambda x: x[1]):
    list_of_vals = [(val, idx) for val, _, idx in group]

    # Add each value at the appropriate index and F column
    for val, idx in list_of_vals:
        mydf.loc[idx, key] = val
This produces:
In [84]: mydf
Out[84]:
  Field1 Field2              WeirdField   F1   F2   F3   F4   F5
0      A      1  [(xxx, F1), (yyy, F2)]  xxx  yyy  NaN  NaN  NaN
1      B      2  [(asd, F3), (bla, F4)]  NaN  NaN  asd  bla  NaN
2      C      3  [(123, F2), (www, F5)]  NaN  123  NaN  NaN  www
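The same wide layout can also be built without sorting or grouping. The following is a minimal sketch of a different technique, not the answerer's code: it turns each row's list into a dict keyed by the tuple's second element and lets pandas align the keys into columns. The names row_dicts, expanded and result are illustrative only.

```python
import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5')]
    ]
})

# One dict per row, mapping column name (second tuple item) to value (first item).
row_dicts = [{name: value for value, name in pairs} for pairs in mydf['WeirdField']]

# pandas aligns the dict keys into columns; keys missing from a row become NaN.
expanded = pd.DataFrame(row_dicts, index=mydf.index)

# Attach the new columns and leave out the original list column.
result = mydf.drop(columns='WeirdField').join(expanded)
print(result)
```

Unlike the loop above, this sketch leaves mydf unchanged and omits WeirdField from the result. If a column name repeats within a single row's list, the dict comprehension keeps the last value seen.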
Answer 1 (score: 1)
A loop-based approach that builds a new data frame and fills it in cell by cell:
import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2'), ('xyz', 'F6')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5'), ('mno', 'F1')]
    ]
})
print(mydf.head())

# Create a new data frame with just Field1 and Field2
newdf = pd.DataFrame({'Field1': ['A', 'B', 'C'],
                      'Field2': ['1', '2', '3'],
                      })

# Collect the column names (second item of every tuple)
column_names = []
for index, row in mydf.iterrows():
    for value, name in row['WeirdField']:
        column_names.append(name)

# Reduce to unique column names; sorting keeps the column order stable
new_column_names = sorted(set(column_names))

# Add the columns to the new data frame and populate them with None
for i, j in enumerate(new_column_names):
    newdf.insert(i + 2, j, None)

# Now add the elements into the columns
# (DataFrame.set_value was removed in pandas 1.0; .at replaces it)
for index, row in mydf.iterrows():
    for value, name in row['WeirdField']:
        newdf.at[index, name] = value
print(newdf.head())
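Answer 2 (score: 1)
Consider a pivot_table solution after zipping the column values. This works for any number of tuples in WeirdField, assuming no F value repeats within the same row; if one does, the pivot keeps the maximum value: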
import numpy as np
import pandas as pd

data = []

# APPEND TO LIST
for f1, f2, w in zip(mydf['Field1'].values, mydf['Field2'].values, mydf['WeirdField'].values):
    for i in w:
        data.append((f1, f2) + i)

# CAST LIST OF TUPLES TO DATAFRAME
df = pd.DataFrame(data, columns=['Field1', 'Field2', 'Value', 'Indicator'])

# PIVOT DATAFRAME
pvt = df.pivot_table(index=['Field1', 'Field2'], columns=['Indicator'],
                     values='Value', aggfunc='max', fill_value=np.nan).reset_index()
pvt.columns.name = None

#   Field1 Field2   F1   F2   F3   F4   F5
# 0      A      1  xxx  yyy  NaN  NaN  NaN
# 1      B      2  NaN  NaN  asd  bla  NaN
# 2      C      3  NaN  123  NaN  NaN  www
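To make the caveat about repeated F values concrete, here is a small sketch using hypothetical data (dup_df is not from the question) in which row A carries F1 twice. With aggfunc='max' on string values, the pivot should keep only the lexicographically largest of the two, so one value is silently dropped.

```python
import pandas as pd

# Hypothetical data: row "A" lists the indicator F1 twice.
dup_df = pd.DataFrame({
    'Field1': ['A', 'B'],
    'Field2': ['1', '2'],
    'WeirdField': [
        [('xxx', 'F1'), ('zzz', 'F1')],
        [('asd', 'F2')],
    ],
})

# Flatten to one record per (Field1, Field2, Value, Indicator).
records = [
    (f1, f2) + pair
    for f1, f2, pairs in zip(dup_df['Field1'], dup_df['Field2'], dup_df['WeirdField'])
    for pair in pairs
]
long_df = pd.DataFrame(records, columns=['Field1', 'Field2', 'Value', 'Indicator'])

# Pivot with 'max': duplicates within a row collapse to the largest string.
wide = long_df.pivot_table(index=['Field1', 'Field2'], columns='Indicator',
                           values='Value', aggfunc='max').reset_index()
wide.columns.name = None

# Row A, column F1: 'zzz' is kept and 'xxx' is lost.
print(wide)
```

If every duplicate needs to survive, a different aggregation (for example, collecting the values into a list) would be required; that is outside what this answer covers.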