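Collecting information from multiple DataFrames based on a reference DataFrame

Time: 2018-06-01 06:59:13

Tags: python pandas

Hello, I am trying to construct sentences whose words come from the first 3 DataFrames below.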

import pandas as pd

# Three word tables: 'w' holds the word, 'n' its 1-based position
df1 = pd.DataFrame()
df1['w'] = ['i', 'am', 'python', 'is', 'set', 'sail']
df1['n'] = [1, 2, 3, 4, 5, 6]
df2 = pd.DataFrame()
df2['w'] = ['i', 'wish', 'in', 'love', 'has']
df2['n'] = [1, 2, 3, 4, 5]
df3 = pd.DataFrame()
df3['w'] = ['the', 'ship', 'with', 'you', 'my', 'friend']
df3['n'] = [1, 2, 3, 4, 5, 6]

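This DataFrame defines the parts, and for each sentence which table its words come from, their start and stop positions, and where the boundaries fall: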

string = pd.DataFrame()
string['location'] = ['df1', 'df2', 'df3', 'df2', 'df1', 'df3', 'df3', 'df2', 'df1']
string['start'] = [1, 3, 3, 1, 3, 5, 1, 5, 5]
string['stop'] = [2, 4, 4, 1, 4, 6, 2, 5, 6]
string['sentence'] = [1, 1, 1, 2, 2, 2, 3, 3, 3]
string['part'] = [1, 1, 1, 1, 1, 1, 2, 2, 2]

The desired output is

i am in love with you
i wish python is my friend
**boundary**
the ship has set sail
**boundary**

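This is the code I have tried. So far it seems to do most of what I want, but I would like to know how to make it work with multiple tables and get the word order I am after.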

x = df1.set_index('n')['w']
sent = [
    ' '.join(x.loc[i:j]) for i, j in zip(string['start'], string['stop'])
]

sent

The output I get is

['i am',
 'python is',
 'python is',
 'i',
 'python is',
 'set sail',
 'i am',
 'set',
 'set sail']

1 answer:

Answer 0: (score: 0)

Try this:

# Look up each frame by name via globals(), then take rows start..stop (1-based, inclusive)
string['res'] = string.apply(lambda x: ' '.join(globals()[x['location']].iloc[(x['start']-1):(x['stop'])]['w']), axis=1)
print(string['res'].values.tolist())

Output:

['i am', 'in love', 'with you', 'i wish', 'python is', 'my friend', 'the ship', 'has', 'set sail']
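
As a variation on the lookup above, the frames can be kept in an explicit dictionary instead of being resolved through `globals()`, and the slice can be taken by label on the `n` column so it does not rely on row order. This is only a sketch: the names `frames` and `pick_words` are hypothetical and do not appear in the question, while the data is copied from it. With the `start`/`stop` values as posted, the fourth row (`df2`, 1 to 1) selects only `i`; producing `i wish` would require `stop = 2` for that row.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({'w': ['i', 'am', 'python', 'is', 'set', 'sail'],
                    'n': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'w': ['i', 'wish', 'in', 'love', 'has'],
                    'n': [1, 2, 3, 4, 5]})
df3 = pd.DataFrame({'w': ['the', 'ship', 'with', 'you', 'my', 'friend'],
                    'n': [1, 2, 3, 4, 5, 6]})

string = pd.DataFrame({
    'location': ['df1', 'df2', 'df3', 'df2', 'df1', 'df3', 'df3', 'df2', 'df1'],
    'start':    [1, 3, 3, 1, 3, 5, 1, 5, 5],
    'stop':     [2, 4, 4, 1, 4, 6, 2, 5, 6],
    'sentence': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'part':     [1, 1, 1, 1, 1, 1, 2, 2, 2],
})

# Hypothetical mapping: location name -> Series of words indexed by position n
frames = {
    'df1': df1.set_index('n')['w'],
    'df2': df2.set_index('n')['w'],
    'df3': df3.set_index('n')['w'],
}

def pick_words(row):
    """Join the words n = start..stop (inclusive) from the frame named in row['location']."""
    words = frames[row['location']]
    return ' '.join(words.loc[row['start']:row['stop']])

string['res'] = string.apply(pick_words, axis=1)
print(string['res'].tolist())
```

Keeping the mapping explicit means a misspelled location raises a `KeyError` on `frames` rather than silently picking up some unrelated global variable.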

Further result (adding boundaries):

import numpy as np

string['res'] = string.apply(lambda x: ' '.join(globals()[x['location']].iloc[(x['start']-1):(x['stop'])]['w']), axis=1)
# Flag the last row of each part as a boundary
string.loc[~string['part'].duplicated(keep='last'), 'flag'] = 'boundary'

l = list(string['res'].values)
b = list(np.where(string['flag'].values == 'boundary')[0])
# Insert a marker after each flagged row; enumerate(b, 1) offsets for earlier insertions
for i, ind in enumerate(b, 1):
    l.insert(ind + i, 'boundary')
print(l)

Output:

['i am', 'in love', 'with you', 'i wish', 'python is', 'my friend', 'boundary', 'the ship', 'has', 'set sail', 'boundary']
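
To get from the flat list to the layout the question asks for (one line per sentence, a boundary marker after each part), the fragments can be grouped by the `part` and `sentence` columns. The following is a sketch under assumptions, not the answerer's method: the names `frames` and `lines` are hypothetical, and the data is copied from the question as posted, so the second sentence starts with `i` alone because that row has `start = 1` and `stop = 1`.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({'w': ['i', 'am', 'python', 'is', 'set', 'sail'],
                    'n': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'w': ['i', 'wish', 'in', 'love', 'has'],
                    'n': [1, 2, 3, 4, 5]})
df3 = pd.DataFrame({'w': ['the', 'ship', 'with', 'you', 'my', 'friend'],
                    'n': [1, 2, 3, 4, 5, 6]})

string = pd.DataFrame({
    'location': ['df1', 'df2', 'df3', 'df2', 'df1', 'df3', 'df3', 'df2', 'df1'],
    'start':    [1, 3, 3, 1, 3, 5, 1, 5, 5],
    'stop':     [2, 4, 4, 1, 4, 6, 2, 5, 6],
    'sentence': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'part':     [1, 1, 1, 1, 1, 1, 2, 2, 2],
})

# Hypothetical mapping: location name -> Series of words indexed by position n
frames = {
    'df1': df1.set_index('n')['w'],
    'df2': df2.set_index('n')['w'],
    'df3': df3.set_index('n')['w'],
}

# One text fragment per row of the reference table (label slice is inclusive)
string['res'] = [
    ' '.join(frames[loc].loc[a:b])
    for loc, a, b in zip(string['location'], string['start'], string['stop'])
]

# Join fragments into sentences, then close each part with a boundary marker
lines = []
for _, part_rows in string.groupby('part', sort=False):
    for _, rows in part_rows.groupby('sentence', sort=False):
        lines.append(' '.join(rows['res']))
    lines.append('**boundary**')

print('\n'.join(lines))
```

Passing `sort=False` keeps parts and sentences in the order they first appear in the reference table, which matters if the identifiers are not already ascending.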