Question

Let's say I run an experiment in which I draw marbles of two colors, and the results look like

'Experiment Draw1,Draw2'    
ie: 'Trail1 Yellow-Green'

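So I load the results into a DataFrame and want to get 3 columns (experiment, first draw, second draw). How can I split the strings efficiently, so that I can then tally the results in a results DataFrame as numbers? For example: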

import pandas as pd

df=pd.DataFrame({'Data': ['Trail1 Yellow-Green','Sample1 Gold-Blue', 'Sample2 Silver-Gold', 'Test2 Gold-Yellow', 'Test Red-Blue'],})

df2 = df['Data'].apply(lambda x: pd.Series(x.split(' ')))
df3 = df2[1].apply(lambda x: pd.DataFrame(x.split('-')))

axis1=['Red','Orange', 'Yellow', 'Green', 'Blue', 'Gold', 'Silver']
axis2=['Red','Orange', 'Yellow', 'Green', 'Blue', 'Gold', 'Silver']

results=pd.DataFrame(index=axis1, columns=axis2)

Would the best way to add the entries to the DataFrame be a for loop with some code like this?

results.ix[df3.loc['Red'], 'Blue'] = 'Y'

#For numerical values

results.ix[df3.loc['Red'], 'Blue'] = 1

Answer 1

You can use the str.extract method:

In [11]: s = df.Data

In [12]: res = s.str.extract("(?P<experiment>.*?) (?P<first>.*?)-(?P<second>.*)")

In [13]: res
Out[13]: 
  experiment   first  second
0     Trail1  Yellow   Green
1    Sample1    Gold    Blue
2    Sample2  Silver    Gold
3      Test2    Gold  Yellow
4       Test     Red    Blue
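
If a regular expression feels heavier than needed, the same three columns can probably be built with two `str.split` calls. This is only a minimal sketch, assuming every row has exactly one space before the colour pair and one hyphen inside it; the names `parts`, `colours` and `res_split` are made up here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Data': ['Trail1 Yellow-Green', 'Sample1 Gold-Blue',
                            'Sample2 Silver-Gold', 'Test2 Gold-Yellow',
                            'Test Red-Blue']})

# Split once on the first space: column 0 is the experiment name,
# column 1 is the "First-Second" colour pair.
parts = df['Data'].str.split(' ', n=1, expand=True)

# Split the colour pair once on the hyphen.
colours = parts[1].str.split('-', n=1, expand=True)

# Assemble the three columns under the same names used by str.extract.
# (res_split is a hypothetical name, not part of the original answer.)
res_split = pd.DataFrame({'experiment': parts[0],
                          'first': colours[0],
                          'second': colours[1]})
print(res_split)
```

Rows that do not match the assumed layout would end up with missing values rather than raising an error, so it may be worth checking for NaN afterwards.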

Then I think you're looking for a pivot_table:

In [14]: res.pivot_table(values='experiment', columns='first', index='second', 
                         aggfunc=len, fill_value=0)
Out[14]: 
first   Gold  Red  Silver  Yellow
second                           
Blue       1    1       0       0
Gold       0    0       1       0
Green      0    0       0       1
Yellow     1    0       0       0

To get the same labels on both the rows and the columns, I think you have to reindex:

In [15]: _.reindex(index=axis1, columns=axis1).fillna(0)
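
Putting the steps together as a plain script might look like the following. It is a minimal sketch assuming a recent pandas release; `pd.crosstab` is swapped in for `pivot_table` for the counting step (a substitution, not what the session above uses), and `counts` and `results` are names chosen here.

```python
import pandas as pd

df = pd.DataFrame({'Data': ['Trail1 Yellow-Green', 'Sample1 Gold-Blue',
                            'Sample2 Silver-Gold', 'Test2 Gold-Yellow',
                            'Test Red-Blue']})
axis1 = ['Red', 'Orange', 'Yellow', 'Green', 'Blue', 'Gold', 'Silver']

# Named groups become the column names of the resulting DataFrame.
res = df['Data'].str.extract(
    r'(?P<experiment>.*?) (?P<first>.*?)-(?P<second>.*)')

# Count how often each (second, first) colour pair occurs.
counts = pd.crosstab(res['second'], res['first'])

# Expand to the full colour list on both axes; pairs that were
# never drawn are filled with 0 instead of NaN.
results = counts.reindex(index=axis1, columns=axis1, fill_value=0)
print(results)
```

Rows are the second draw and columns the first draw, matching the pivot table above, so `results.loc['Blue', 'Red']` would hold the count for Red followed by Blue.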

Pandas: split strings from text and then enter them into a DataFrame