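I am analyzing some subtitles; I have managed to clean the text and compute word frequencies. Now I want to remove all the stop words (the ones that ship with the "tm" package).
Here is a sample of the data: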
words2 <- c("a", "be", "am", "you", "lannister", "wolf", "angry", "scandals", "should", "me")
frequency2 <- c(12,10,15, 20, 5, 10,8,3,9,20)
stopwordslst <- c("i","me","my","myself","we","our","ours","ourselves","you","your","yours","yourself","it","its","they","them","thei","theirs","themselves", "what",
"those","am","is","are","be","been","being","have","has","does","did","doing","would","should")
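So I tried writing a for loop: the idea was to build a logical vector and then drop every row where it is TRUE. But I can't find the right way to do it so that the rows where it is FALSE are kept in the same data.frame structure.
Here is my attempt: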
for(i in words){
if(i == stopwordslst[]){
(data1[-i,])
}
}
The expected result is the same data frame, but like this:
words frequency
lannister 5
wolf 10
angry 8
scandals 3
Thanks in advance.
Answer 0 (score: 0)
Iterating over the rows and removing the words that appear in stopwordslst works for me:
df = data.frame(words=words2,frequency=frequency2)
df = df[(sapply(c(1:nrow(df)),FUN = function(x){sum(df$words[x]==stopwordslst)})==0),]
> df
      words frequency
5 lannister         5
6      wolf        10
7     angry         8
8  scandals         3
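For comparison, here is a minimal sketch of how the for loop from the question could be repaired so that it builds a logical vector and subsets once at the end. The names data1, keep and result are assumptions chosen for illustration (data1 matches the name used in the question's attempt), and stringsAsFactors = FALSE is set explicitly so the words stay character strings on older R versions.

```r
words2 <- c("a", "be", "am", "you", "lannister", "wolf", "angry", "scandals", "should", "me")
frequency2 <- c(12, 10, 15, 20, 5, 10, 8, 3, 9, 20)
stopwordslst <- c("i","me","my","myself","we","our","ours","ourselves","you","your","yours","yourself","it","its","they","them","thei","theirs","themselves", "what",
                  "those","am","is","are","be","been","being","have","has","does","did","doing","would","should")

# data1 is an assumed name for the data frame built from the sample vectors
data1 <- data.frame(words = words2, frequency = frequency2, stringsAsFactors = FALSE)

# One flag per row: TRUE means "keep this row"
keep <- logical(nrow(data1))
for (i in seq_len(nrow(data1))) {
  # any() collapses the element-wise comparison into a single TRUE/FALSE,
  # which is what the original `if (i == stopwordslst[])` was missing
  keep[i] <- !any(data1$words[i] == stopwordslst)
}

# Subset once, after the loop, so the data.frame structure is preserved
result <- data1[keep, ]
```

Deleting rows inside the loop would shift the row positions while iterating, which is why the sketch only records flags in the loop and subsets afterwards.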
Answer 1 (score: 0)
As @Sotos mentioned, you can use %in% and ! to get the words to keep, and use the same index to select the frequencies.
df <- data.frame(words = words2[!words2 %in% stopwordslst],
frequency = frequency2[!words2 %in% stopwordslst])
df
# words frequency
#1 a 12
#2 lannister 5
#3 wolf 10
#4 angry 8
#5 scandals 3
Note: 'a' is not in your stopwordslst, so it is included.
Or, a bit cleaner:
idx <- !words2 %in% stopwordslst
df <- data.frame(words = words2[idx],frequency = frequency2[idx])
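If the words and frequencies already live in a data frame, the same %in% test can filter its rows directly. The sketch below is one way to do that under two assumptions: "a" is appended to the stop list by hand so the result matches the expected output in the question, and the names stops and filtered are made up for illustration.

```r
words2 <- c("a", "be", "am", "you", "lannister", "wolf", "angry", "scandals", "should", "me")
frequency2 <- c(12, 10, 15, 20, 5, 10, 8, 3, 9, 20)
stopwordslst <- c("i","me","my","myself","we","our","ours","ourselves","you","your","yours","yourself","it","its","they","them","thei","theirs","themselves", "what",
                  "those","am","is","are","be","been","being","have","has","does","did","doing","would","should")

df <- data.frame(words = words2, frequency = frequency2, stringsAsFactors = FALSE)

# Assumption: "a" is added manually because the sample list lacks it.
# With the tm package installed, tm::stopwords("en") could likely be used instead.
stops <- c(stopwordslst, "a")

# Keep only the rows whose word is not a stop word; the column layout is unchanged
filtered <- df[!df$words %in% stops, ]

# Optional: renumber the rows 1..n instead of keeping the original row names
rownames(filtered) <- NULL
```

Filtering the rows of one data frame avoids indexing the two vectors separately, so the words and their frequencies cannot drift out of sync.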