The title is confusing, so let me explain. I have two DataFrames, df1 and df2.
df1 looks like this:
id text
1 Hello world how are you people
2 Hello people I am fine people
3 Good Morning people
4 Good Evening
df2 looks like this:
Word count Points Percentage
hello 2 2 100
world 1 1 100
how 1 1 100
are 1 1 100
you 1 1 100
people 3 1 33.33
I 1 1 100
am 1 1 100
fine 1 1 100
Good 2 -2 -100
Morning 1 -1 -100
Evening 1 -1 -100
df2 contains every word from df1 exactly once and assigns each word three values: count, Points, and Percentage.
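First, I want to replace every word in df1 with its count, Points, and Percentage values from df2. For example, the first row,
Hello world how are you people
would become 2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33
because Hello = 2 2 100, world = 1 1 100, and so on.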
Expected output:
id text
1 2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33
2 2 2 100 3 1 33.33 1 1 100 1 1 100 1 1 100 3 1 33.33
3 2 -2 -100 1 -1 -100 3 1 33.33
4 2 -2 -100 1 -1 -100
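For anyone who wants to reproduce the problem, the two frames can be rebuilt roughly as follows. This is only a sketch: it assumes pandas is imported as pd, the values are copied from the tables above, and the final check (the names words_in_df1, words_in_df2, and missing are illustrative) confirms that every word of df1 has a case-insensitive match in df2.

```python
import pandas as pd

# Sample frames copied from the tables in the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you people",
        "Hello people I am fine people",
        "Good Morning people",
        "Good Evening",
    ],
})
df2 = pd.DataFrame(
    [
        ["hello", 2, 2, 100],
        ["world", 1, 1, 100],
        ["how", 1, 1, 100],
        ["are", 1, 1, 100],
        ["you", 1, 1, 100],
        ["people", 3, 1, 33.33],
        ["I", 1, 1, 100],
        ["am", 1, 1, 100],
        ["fine", 1, 1, 100],
        ["Good", 2, -2, -100],
        ["Morning", 1, -1, -100],
        ["Evening", 1, -1, -100],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

# Compare the words case-insensitively, since df1 and df2 differ in casing.
words_in_df1 = set(" ".join(df1["text"]).lower().split())
words_in_df2 = set(df2["Word"].str.lower())
missing = words_in_df1 - words_in_df2
print(missing)  # set()
```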
Answer 0 (score: 2)
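First join all the values of each df2 row into one string with join, then use apply with a comprehension that looks up each lowercased word of the text in the resulting mapping: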
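# Build one "count Points Percentage" string per lowercased word.
s = (df2.assign(Word=df2['Word'].str.lower())
        .set_index('Word')[["count","Points","Percentage"]]
        .astype(str)
        .apply(' '.join, axis=1))
# Replace every word of the lowercased text with its string from s.
df1['text'] = df1['text'].str.lower().apply(lambda x: ' '.join(s.get(y) for y in x.split()))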
print (df1)
id text
0 1 2 2 100.0 1 1 100.0 1 1 100.0 1 1 100.0 1 1 10...
1 2 2 2 100.0 3 1 33.33 1 1 100.0 1 1 100.0 1 1 10...
2 3 2 -2 -100.0 1 -1 -100.0 3 1 33.33
3 4 2 -2 -100.0 1 -1 -100.0
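Two details of the approach above may matter in practice: astype(str) renders the float column as 100.0 rather than 100, and a word that is absent from df2 makes the lookup return None, which ' '.join cannot handle. The sketch below is one possible variation, not the answerer's code. It uses a plain dict, formats numbers with "g" to drop the trailing ".0", and leaves unknown words untouched. The sample data, the word "friends", and the names cols, lookup, and encode are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "text": ["Hello world", "Good Morning friends"]})
df2 = pd.DataFrame(
    [
        ["hello", 2, 2, 100.0],
        ["world", 1, 1, 100.0],
        ["Good", 2, -2, -100.0],
        ["Morning", 1, -1, -100.0],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

cols = ["count", "Points", "Percentage"]
# One "count Points Percentage" string per lowercased word;
# format(v, "g") prints 100.0 as "100" and 33.33 as "33.33".
lookup = dict(zip(
    df2["Word"].str.lower(),
    (" ".join(format(v, "g") for v in row) for row in df2[cols].to_numpy().tolist()),
))

def encode(text):
    # Words that are missing from df2 are kept as-is instead of raising.
    return " ".join(lookup.get(w, w) for w in text.lower().split())

df1["text"] = df1["text"].map(encode)
print(df1["text"].tolist())
# ['2 2 100 1 1 100', '2 -2 -100 1 -1 -100 friends']
```

Whether unknown words should be kept, dropped, or reported as an error depends on the data, so the fallback in encode is only one choice.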
Answer 1 (score: 0)
Maybe something like this. I hope it helps.
I only handle the first sentence here.
import pandas as pd

df1 = pd.DataFrame(columns=["text"], data=["Hello world how are you people"])
df2 = pd.DataFrame(columns=["Word", "count", "Points", "Percentage"],
                   data=[
                       ["hello", 2, 2, 100],
                       ["world", 1, 1, 100],
                       ["how", 1, 1, 100],
                       ["are", 1, 1, 100],
                       ["you", 1, 1, 100],
                       ["people", 3, 1, 33.33],
                   ])
For each word of each sentence in df1, select count, Points, and Percentage from df2 and append them to a string.
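for i, row in df1.iterrows():
    parts = []
    for word in row["text"].split(" "):
        # Row of df2 matching the lowercased word (assumes every word is present in df2).
        values_from_df2 = list(df2.loc[df2["Word"] == word.lower()][["count", "Points", "Percentage"]].values[0])
        # "g" formatting prints 2.0 as 2 and keeps 33.33 intact (int() would truncate it to 33).
        parts.append(' '.join(format(e, "g") for e in values_from_df2))
    # Write back through df1.at; assigning to the row yielded by iterrows may only change a copy.
    df1.at[i, "text"] = ' '.join(parts)
Result:
text
0 2 2 100 1 1 100 1 1 100 1 1 100 1 1 100 3 1 33.33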
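As an alternative to looping over rows, the same replacement can be sketched with explode, map, and groupby. This is not taken from either answer. It assumes a pandas version that provides Series.explode and that every word has a match in df2 (an unmatched word would become NaN and break the final join). The names cols, encoded, and words are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [3, 4], "text": ["Good Morning people", "Good Evening"]})
df2 = pd.DataFrame(
    [
        ["Good", 2, -2, -100.0],
        ["Morning", 1, -1, -100.0],
        ["Evening", 1, -1, -100.0],
        ["people", 3, 1, 33.33],
    ],
    columns=["Word", "count", "Points", "Percentage"],
)

cols = ["count", "Points", "Percentage"]
# Series indexed by lowercased word, holding "count Points Percentage" strings.
encoded = pd.Series(
    [" ".join(format(v, "g") for v in row) for row in df2[cols].to_numpy().tolist()],
    index=df2["Word"].str.lower(),
)

# One word per row; the original row label is repeated in the index.
words = df1["text"].str.lower().str.split().explode()
# Look up each word, then glue the pieces back together per original row.
df1["text"] = words.map(encoded).groupby(level=0).agg(" ".join)
print(df1["text"].tolist())
# ['2 -2 -100 1 -1 -100 3 1 33.33', '2 -2 -100 1 -1 -100']
```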