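How can I split data into training and test sets based on the label? The labels are 1 and 0, and I want to use all the 1s as the training set and all the 0s as the test set. The csv file looks like this: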
1 Pixar classic is one of the best kids' movies of all time.
1 Apesar de representar um imenso avanço tecnológico, a força do filme reside no carisma de seus personagens e no charme de sua história.
1 When Woody perks up in the opening scene, it's not only the toy cowboy who comes alive - we're watching the rebirth of an art form.
0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.
1 Introduced not one but two indelible characters to the pop culture pantheon: cowboy rag-doll Woody (Tom Hanks) and plastic space ranger Buzz Lightyear (Tim Allen). [Blu-ray]
1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story
0 All the effects in the world can't disguise the thin plot.
1 Though some of the animation seems dated compared to later Pixar efforts and not nearly as detailed, what's here is done impeccably well.
Answer 0 (score: 0)
Try this:
mask = df['label']==1
df_train = df[mask]
df_test = df[~mask]
You only need to filter the DataFrame.
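For a self-contained check, here is a minimal sketch that applies the same mask to a few rows from the question. It assumes each line of the file is a label, one space, then the review text. The in-memory sample and the parse_line helper are made up for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# A few rows from the question, assumed to be "<label><space><text>" per line.
raw = io.StringIO(
    "1 Pixar classic is one of the best kids' movies of all time.\n"
    "0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.\n"
    "0 All the effects in the world can't disguise the thin plot.\n"
    "1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story\n"
)

def parse_line(line):
    # Hypothetical helper: split only on the first space so the review text stays intact.
    label, text = line.rstrip("\n").split(" ", 1)
    return int(label), text

df = pd.DataFrame([parse_line(line) for line in raw], columns=["label", "text"])

mask = df["label"] == 1
df_train = df[mask]   # rows labelled 1
df_test = df[~mask]   # the remaining rows, here those labelled 0

print(len(df_train), len(df_test))
```

For a real file, replace the StringIO object with an open file handle. The rest should carry over as long as the one-space layout assumption holds.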
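Answer 1 (score: 0)
Normally you would not want to split the data this way, but the following solution works. I tried it on a very small DataFrame and it seems to do the job.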
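I got the following results:
import pandas as pd
from sklearn.model_selection import train_test_split
# Toy frame: 'S' and 'P' play the role of the two labels.
Df = pd.DataFrame()
Df['label'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
Df['value'] = [1, 2, 3, 4, 5, 6, 7, 8]
Df
# Separate the rows by label.
X = Df[Df.label == 'S']
Y = Df[Df.label == 'P']
# train_test_split returns (train part, test part), so ytrain and ytest are
# the 30% held out from X and from Y respectively, not target arrays.
xtrain, ytrain = train_test_split(X, test_size=0.3, random_state=25, shuffle=True)
xtest, ytest = train_test_split(Y, test_size=0.3, random_state=25, shuffle=True)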
xtrain
  label  value
5     S      6
2     S      3
7     S      8
xtest
  label  value
6     P      7
3     P      4
ytest
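On the caveat at the top of this answer: a training set that holds only one label gives a classifier nothing to tell the classes apart with. If what you actually need is an ordinary train/test split that keeps both labels on each side, a stratified split is the usual alternative. Below is a minimal sketch on the same toy frame, assuming scikit-learn is installed. The names train_df and test_df and the 0.25 test size are illustrative choices, not taken from the answer above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "label": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

# stratify=df["label"] keeps the S/P proportions roughly equal in both parts.
train_df, test_df = train_test_split(
    df, test_size=0.25, random_state=25, stratify=df["label"]
)

print(train_df["label"].value_counts())
print(test_df["label"].value_counts())
```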
Answer 2 (score: 0)
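import pandas as pd
# The column is named 'label' so that it matches the filters and the output shown.
d = {'label': [1, 1, 1, 1, 0, 0, 0, 0], 'text': ["a", "b", "c", "d", "e", "f", "g", "h"]}
df = pd.DataFrame(data=d)
df.head()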
   label text
0      1    a
1      1    b
2      1    c
3      1    d
4      0    e
You can filter on the row values with the following code, which keeps the rows where the label column equals 1.
traindf = df[df["label"] == 1]
traindf
   label text
0      1    a
1      1    b
2      1    c
3      1    d
testdf = df[df["label"] == 0]
testdf
   label text
4      0    e
5      0    f
6      0    g
7      0    h
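If the two parts will be modified or saved afterwards, it can help to take explicit copies and renumber the rows. Here is a minimal sketch under that assumption, reusing the toy frame from this answer. The sanity check at the end is an addition and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "label": [1, 1, 1, 1, 0, 0, 0, 0],
    "text": ["a", "b", "c", "d", "e", "f", "g", "h"],
})

# .copy() makes each part an independent frame; reset_index renumbers rows from 0.
traindf = df[df["label"] == 1].copy().reset_index(drop=True)
testdf = df[df["label"] == 0].copy().reset_index(drop=True)

# Sanity check: every row landed in exactly one of the two parts.
assert len(traindf) + len(testdf) == len(df)

print(testdf)
```

Without reset_index the test rows keep their original index values (4 to 7), which is fine for filtering but can be surprising when indexing by position later.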