How to replace each word with 3 values from another DataFrame

Date: 2019-05-06 08:26:23

Tags: python pandas dataframe

The title is confusing, so let me explain. I have two DataFrames, df1 and df2.

df1 looks like this:

id    text
1     Hello world how are you people     
2     Hello people I am fine  people    
3     Good Morning people               
4     Good Evening                      

df2 looks like this:

Word      count         Points         Percentage

hello        2             2              100
world        1             1              100
how          1             1              100
are          1             1              100
you          1             1              100
people       3             1              33.33
I            1             1              100
am           1             1              100
fine         1             1              100
Good         2             -2            -100
Morning      1             -1            -100
Evening      1             -1            -100

df2 contains every word of df1 exactly once and gives each word three values: count, Points and Percentage.

First, I want to replace each word in df1 with its count, Points and Percentage. For example, the first row

Hello world how are you people would become 2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33

because Hello = 2 2 100, world = 1 1 100, and so on.

Expected output:

id    text
1     2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33
2     2 2 100 3 1 33.33 1 1 100 1 1 100 1 1 100 3 1 33.33
3     2 -2 -100 1 -1 -100 3 1 33.33
4     2 -2 -100 1 -1 -100  
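
To make the question reproducible, the two frames can be built roughly as below. This is a minimal sketch: the column names and values are copied from the tables above, while the dtypes are simply whatever pandas infers. The last lines check that every word of df1 has an entry in df2 once case is ignored (the question writes "Hello" in df1 but "hello" in df2).

```python
import pandas as pd

# df1: one sentence per row, as shown in the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you people",
        "Hello people I am fine  people",
        "Good Morning people",
        "Good Evening",
    ],
})

# df2: one row per distinct word with its three values.
df2 = pd.DataFrame(
    [
        ["hello", 2, 2, 100],
        ["world", 1, 1, 100],
        ["how", 1, 1, 100],
        ["are", 1, 1, 100],
        ["you", 1, 1, 100],
        ["people", 3, 1, 33.33],
        ["I", 1, 1, 100],
        ["am", 1, 1, 100],
        ["fine", 1, 1, 100],
        ["Good", 2, -2, -100],
        ["Morning", 1, -1, -100],
        ["Evening", 1, -1, -100],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

# Compare in lowercase because the two frames do not agree on case.
sentence_words = set(" ".join(df1["text"]).lower().split())
known_words = set(df2["Word"].str.lower())
missing = sentence_words - known_words
print(missing)  # words of df1 that have no row in df2
```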

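2 Answers:

Answer 0: (score: 2)

First join the three values of each word into one string with join, then use apply with a generator expression that maps each lowercased word of the text to that string: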

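# Lookup Series: lowercased word -> "count Points Percentage" as one string.
s = (df2.assign(Word=df2['Word'].str.lower())
        .set_index('Word')[["count","Points","Percentage"]]
        .astype(str)
        .apply(' '.join, axis=1))

# Lowercase each sentence, then replace every word with its string from s.
df1['text'] = df1['text'].str.lower().apply(lambda x: ' '.join(s.get(y) for y in x.split()))
print(df1)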
   id                                               text
0   1  2 2 100.0 1 1 100.0 1 1 100.0 1 1 100.0 1 1 10...
1   2  2 2 100.0 3 1 33.33 1 1 100.0 1 1 100.0 1 1 10...
2   3                  2 -2 -100.0 1 -1 -100.0 3 1 33.33
3   4                            2 -2 -100.0 1 -1 -100.0
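
Two details of this approach are worth noting. The output shows 100.0 rather than 100, because the Percentage column is float and is converted with astype(str). Also, s.get(y) returns None for a word that is missing from the lookup, and ' '.join then raises a TypeError. The following is a minimal sketch of one way to handle both, not the answer's own code: it uses a plain dict, formats numbers with "g", and substitutes a placeholder for unknown words. The UNKNOWN marker, the helper names and the extra word "everyone" are assumptions made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2],
    "text": ["Hello world how are you people", "Good Evening everyone"],
})
df2 = pd.DataFrame(
    [
        ["hello", 2, 2, 100], ["world", 1, 1, 100], ["how", 1, 1, 100],
        ["are", 1, 1, 100], ["you", 1, 1, 100], ["people", 3, 1, 33.33],
        ["Good", 2, -2, -100], ["Evening", 1, -1, -100],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

def triple(values):
    # "g" formatting drops a trailing ".0": 100.0 -> "100", 33.33 -> "33.33".
    return " ".join(format(float(v), "g") for v in values)

# Plain dict: lowercased word -> "count Points Percentage".
lookup = {
    word.lower(): triple(values)
    for word, *values in df2.itertuples(index=False, name=None)
}

UNKNOWN = "?"  # hypothetical placeholder for words that df2 does not list

def replace_words(text):
    # split() without arguments also copes with repeated spaces.
    return " ".join(lookup.get(word, UNKNOWN) for word in text.lower().split())

df1["text"] = df1["text"].apply(replace_words)
print(df1["text"].tolist())
```

Whether an unknown word should become a placeholder, be dropped, or raise an error depends on the data; the placeholder is only one option.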

Answer 1: (score: 0)

Maybe something like this.

I hope it helps.

I only did the first sentence.

import pandas as pd

# Only the first sentence of the question and the words it needs.
df1 = pd.DataFrame(columns=["text"], data=["Hello world how are you people"])
df2 = pd.DataFrame(columns=["Word", "count", "Points", "Percentage"],
                   data=[
                       ["hello", 2, 2, 100],
                       ["world", 1, 1, 100],
                       ["how", 1, 1, 100],
                       ["are", 1, 1, 100],
                       ["you", 1, 1, 100],
                       ["people", 3, 1, 33.33],
                   ])

For each word of each sentence in df1, select "count", "Points" and "Percentage" from df2 and append them to a string.

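for i, row in df1.iterrows():
    parts = []

    for word in row["text"].split():
        # Row of df2 for this word (df2 stores the words in lowercase here).
        values_from_df2 = df2.loc[df2["Word"] == word.lower(), ["count", "Points", "Percentage"]].values[0]
        # "g" prints 2.0 as "2" and keeps 33.33 intact (int() would cut it to 33).
        parts.append(' '.join(format(e, "g") for e in values_from_df2))

    # Write back through df1.at: assigning to row would only change a copy.
    df1.at[i, "text"] = ' '.join(parts)

Result:

    text
0   2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33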
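
The row-by-row loop can also be avoided. The sketch below uses a different technique from this answer: it splits each sentence into words, explodes them into one row per word, maps every word to its value string, and joins the pieces back per sentence. It assumes that every word of df1 occurs in df2 (ignoring case) and that the lowercased words in df2 are unique; the two sample sentences are rows 3 and 4 of the question.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [3, 4], "text": ["Good Morning people", "Good Evening"]})
df2 = pd.DataFrame(
    [
        ["people", 3, 1, 33.33],
        ["Good", 2, -2, -100],
        ["Morning", 1, -1, -100],
        ["Evening", 1, -1, -100],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

value_cols = ["count", "Points", "Percentage"]

# Series indexed by lowercased word, holding "count Points Percentage".
triples = (
    df2.set_index(df2["Word"].str.lower())[value_cols]
       .apply(lambda row: " ".join(format(float(v), "g") for v in row), axis=1)
)

# One row per word; the index still points at the original sentence.
words = df1["text"].str.lower().str.split().explode()

# Map each word to its string, then join the strings of each sentence again.
df1["text"] = words.map(triples).groupby(level=0).agg(" ".join)
print(df1["text"].tolist())
```

A word that is missing from df2 would become NaN after map and make the join fail, so real data may need a fillna step before the groupby.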