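I am trying to filter tweets, but I am stuck. I want a dictionary that contains all English words plus the words I add myself, and I want to use it to filter my tweets. I have a data frame of tweets, and its text column looks like this: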
text
1 | @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan
2 | @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 | Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 | @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 | @x_anp <U+304A><U+3081><U+3067><U+3068><U+3046>!!
6 | Tourism & Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan
7 | @Palwasha_Abbas Kuch samjhayein in ko... Articles likhnay say character thek nhe hotay... ANP is history. @BushraGohar
8 | <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 | <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 | @MianIftikharHus Pti ki govt bilkul bhi ideal nhi ti magar mazrat k sath anp or mma ki pechli govt se pti ki govt kafi behtr ti .hospitals or schools me tabdeeli ayi hy koi mane ya na mane.
11 | <U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP ki koi history nahi hai
The desired data frame should look like this:
text
1 | @a_siab @sardarbabak999 @BushraGohar @jafarshahmp @Palwasha_Abbas @Khadimhussain4 @Khushal_Khattak @SPOX_ANP @mjdawar @AsgharAchakzaii @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan
2 | @KPKUpdates @ImranKhanPTI @jhagra @Shah_FarmanPTI @AsadQaiserPTI @MushtaqGhaniPTI @PervezKhattakCM @ziataj @AtifKhanpti It's much better than anp and mma time
3 | Easy load was started by PTI and not ANP #KaptaanFailedInKP
4 | @Palwasha_Abbas @Gulalai_Ismail This was much needed and people are happy with using it which avoided traffic issues. Your only issue is PTI buried ANP easyload shops forever now it’s obvious you will cry
5 | NA
6 | Tourism & Poor Condition of Swat Roads, Part-2 #Swat #Tourism #Kpk #CareTakerPM #NasarullMuik #MaryamNawaz #PMLN #PTI #ImranKhan #Newsonepk #ANP #Nadia @nadia_a_mirza #Pakistan
7 | NA
8 | <U+062F> <U+06A9><U+0644><U+064A> <U+0631><U+0648><U+063A> <U+062A><U+0631><U+06CC><U+0646><U+0647> <U+0686><U+0627><U+067E><U+06D0><U+0631><U+0647> <U+062F><U+064A> <U+0627><U+0648> <U+062E><U+0627><U+0646><U+062F><U+064A> <U+067E><U+0633><U+06D0> <U+06CC><U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand
9 | <U+0648> <U+0644><U+06CC><U+0648><U+0646><U+06CC> <U+0648><U+0631><U+062A><U+0647> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+0631><U+069A><U+062A><U+06CC><U+0627> <U+062E><U+0628><U+0631><U+06D0> <U+06A9><U+0648><U+064A> @AbdullahMalikJ @a_baittani @SherazMmd @MianIftikharHus @anp_mohmand @MisbahUtmani @takkar1234 @anp_mohmand @AmirHaiderKH @MianIftikharHus @khaistak50 @SangeenKhan ANP is history
10 | NA
11 | NA
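After that, I can easily handle the removal by converting the data frame to a corpus. This is all I want. How can I get such a dictionary? Is a dictionary the right tool, or should I use a classifier or something else? Please explain your answer. What should I do? Thanks for the help!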
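To make the dictionary idea concrete, here is a minimal sketch of what I have in mind. It is written in Python only for illustration, and it makes several assumptions: `ENGLISH_WORDS` is a tiny hypothetical stand-in for a real English word list, `CUSTOM_WORDS` holds the words I would add myself, and the names `english_ratio`, `filter_tweet`, `threshold`, and `keep_unscored` are made up for this example. Each tweet is scored by the share of its word tokens found in the dictionary, after mentions, hashtags, and `<U+XXXX>` escapes are dropped. It is not meant to reproduce the desired table above row for row.

```python
import re

# Hypothetical, tiny stand-in for a full English word list (assumption).
ENGLISH_WORDS = {
    "easy", "load", "was", "started", "by", "and", "not",
    "is", "history", "it's", "much", "better", "than", "time",
}
# Words added by hand on top of the English list (assumption).
CUSTOM_WORDS = {"pti", "anp", "mma"}
DICTIONARY = ENGLISH_WORDS | CUSTOM_WORDS

MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")
ESCAPED_CODEPOINT = re.compile(r"<U\+[0-9A-Fa-f]{4,6}>")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def english_ratio(text):
    """Return the share of word tokens found in DICTIONARY, or None if no tokens remain."""
    cleaned = MENTION_OR_HASHTAG.sub(" ", text)
    cleaned = ESCAPED_CODEPOINT.sub(" ", cleaned)
    # Normalise curly apostrophes so "it’s" matches "it's".
    cleaned = cleaned.lower().replace("\u2019", "'")
    tokens = WORD.findall(cleaned)
    if not tokens:
        return None
    hits = sum(1 for token in tokens if token in DICTIONARY)
    return hits / len(tokens)


def filter_tweet(text, threshold=0.6, keep_unscored=False):
    """Return the tweet if it looks English enough, otherwise None (the NA case)."""
    ratio = english_ratio(text)
    if ratio is None:
        # Tweets with only mentions or escapes cannot be scored; this is a judgement call.
        return text if keep_unscored else None
    return text if ratio >= threshold else None


tweets = [
    "Easy load was started by PTI and not ANP #KaptaanFailedInKP",
    "@Palwasha_Abbas Kuch samjhayein in ko... Articles likhnay say "
    "character thek nhe hotay... ANP is history. @BushraGohar",
    "@x_anp <U+304A><U+3081><U+3067><U+3068><U+3046>!!",
]
filtered = [filter_tweet(tweet) for tweet in tweets]
```

With this toy dictionary, the first tweet is kept and the other two become `None`. The threshold of 0.6 is arbitrary, and mixed-language tweets are exactly where a fixed threshold gets shaky, which is why I am unsure whether a dictionary is enough or whether a classifier would be the better choice.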