Question

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.roshan</groupId>
    <artifactId>registry</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>registry</name>
    <description></description>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
    </properties>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.6.RELEASE</version>
    </parent>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>Dalston.SR1</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-eureka-server</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-config</artifactId>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <finalName>registry</finalName>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

嗨，我有那张桌子。我想用';'拆分字符串表，并将其存储到新列。最后一栏应该是这样的

id   string   
0    31672;0           
1    31965;0
2    0;78464
3      51462
4    31931;0

如果有人知道如何用python做它会很好。

Answer 1

选项1
使用str.split + str.len -

的基本解决方案

df['word_count'] = df['string'].str.split(';').str.len()
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2

选项2
使用str.count -

的聪明（高效，占用空间更少）的解决方案

df['word_count'] = df['string'].str.count(';') + 1
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2

警告 - 即使对于空字符串，这也会将字数归为1（在这种情况下，坚持使用选项1）。

如果您希望每个单词占据一个新列，可以使用tolist快速简单地将分割加载到新数据框中，并使用concat - <将新数据框与原始数据连接起来/ p>

v = pd.DataFrame(df['string'].str.split(';').tolist())\
        .rename(columns=lambda x: x + 1)\
        .add_prefix('string_')

pd.concat([df, v], 1)

     string  word_count string_1 string_2
id                                       
0   31672;0           2    31672        0
1   31965;0           2    31965        0
2   0;78464           2        0    78464
3     51462           1    51462     None
4   31931;0           2    31931        0

拆分一列字符串并用pandas计算单词数

1 个答案: