I have 2 dataframes in which each row contains text as a list. This one is called df
Datum File File_type Text
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..
I also have another one, df_lm, which looks like this
List_type Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...
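I want to create new columns in df that count the word matches, e.g. how many words from df_lm.Words[0] occur in df.Text[0]
Note: df has about 500 rows and df_lm has 6, so I need to create 6 new columns in df so that the updated df looks like this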
Datum ...LM_cnstrain LM_litigius Lm_modal_me ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0
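I hope my question is clear. Thanks in advance!
Edit: I have already done something similar by creating a list and looping over it, but because the lists in df_lm are very long, this is not an option.
The code is as follows: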
result_list = []
for file in file_list:
    # text is assumed to hold the contents of file; growth is the word list
    count_growth = 0
    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
    a = {'Growth': count_growth}
    result_list.append(a)
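If the concern with the loop above is lookup speed on long word lists, one option is to convert each list to a set before counting. The following is only a minimal sketch: the helper name count_matches and the sample words are made up for illustration and are not taken from my real data.

```python
def count_matches(tokens, vocabulary):
    """Count how many tokens occur in vocabulary (repeated tokens count each time)."""
    # A set gives average constant-time membership tests; a list is scanned linearly
    vocab = set(vocabulary)
    return sum(1 for token in tokens if token in vocab)


# Hypothetical sample data
growth = ["growth", "grow", "expand"]
text = "sales growth continued as we expand and grow growth"

count_growth = count_matches(text.split(), growth)
print(count_growth)
```

The result is the same as with the nested loop; only the membership test changes.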
Answer 0: (score: 0)
As per my comment, you can try the following:
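The following code has to run in a loop, in which the Text column of the first df is matched against all 6 lists of the second one, and the value of len(c) is used to create the columns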
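desc = df_lm.iloc[0, 1]       # word list in the first row of df_lm
words = df.Text.explode()     # one row per word; needs pandas >= 0.25
matches = words.isin(desc)
result = words[matches]       # matched words, indexed by their row in df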
Let me know if this helps; otherwise I will update or delete the answer
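The loop described above could look roughly like the sketch below. It assumes that df.Text holds lists of words and that pandas 0.25 or later is available for Series.explode. The two small dataframes are hypothetical stand-ins for the real df and df_lm, and the column names are borrowed from the desired output in the question.

```python
import pandas as pd

# Hypothetical miniature versions of df and df_lm
df = pd.DataFrame({
    "File": ["a.txt", "b.txt"],
    "Text": [["can", "abide", "often", "bound"], ["always", "likely", "profit"]],
})
df_lm = pd.DataFrame({
    "List_type": ["LM_cnstrain", "LM_modal_me"],
    "Words": [["abide", "bound", "commit"], ["can", "often", "likely"]],
})

# One row per word; the index still points at the originating row of df
words = df["Text"].explode()

for list_type, word_list in zip(df_lm["List_type"], df_lm["Words"]):
    matches = words.isin(word_list)
    # Add up the boolean matches per original row to get one count per document
    df[list_type] = matches.groupby(level=0).sum().astype(int)

print(df)
```

Each pass through the loop adds one count column, so with the 6 rows of df_lm you end up with 6 new columns.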
Answer 1: (score: 0)
So, I came up with the following solution:
for file in file_list:
    count_lm_constraint = 0
    count_lm_litigious = 0
    count_lm_modal_me = 0
    for word in text.split():
        if word in df_lm.iloc[0, 1]:
            count_lm_constraint = count_lm_constraint + 1
        if word in df_lm.iloc[1, 1]:
            count_lm_litigious = count_lm_litigious + 1
        if word in df_lm.iloc[2, 1]:
            count_lm_modal_me = count_lm_modal_me + 1
    a = {"File": name, "Text": text, 'lm_uncertain': count_lm_uncertain, 'lm_positive': count_lm_positive ....}
    result_list.append(a)
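The hard-coded counters above could probably be replaced by a loop over the rows of df_lm, so that a new word list needs no extra code. This is only a sketch under the assumption that df.Text already contains lists of words; the sample data and the name word_sets are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Text": [["can", "abide", "can"], ["bound", "commit"]]})
df_lm = pd.DataFrame({
    "List_type": ["LM_cnstrain", "LM_modal_me"],
    "Words": [["abide", "bound", "commit"], ["can", "often"]],
})

# Build every lookup set once instead of indexing df_lm for each word
word_sets = {
    name: set(words)
    for name, words in zip(df_lm["List_type"], df_lm["Words"])
}

for name, vocab in word_sets.items():
    # vocab=vocab binds the current set to the lambda for this iteration
    df[name] = df["Text"].apply(
        lambda tokens, vocab=vocab: sum(token in vocab for token in tokens)
    )

print(df)
```

The counts land directly in df, so the intermediate result_list of dictionaries is not needed in this variant.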