<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.roshan</groupId>
<artifactId>registry</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>registry</name>
<description></description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
</properties>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.5.6.RELEASE</version>
</parent>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>Dalston.SR1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-eureka-server</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-config</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<finalName>registry</finalName>
</configuration>
</plugin>
</plugins>
</build>
</project>
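Hi, I have the table below. I want to split the string column on ';' and store the result in a new column. The final column should look like this: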
id string
0 31672;0
1 31965;0
2 0;78464
3 51462
4 31931;0
It would be great if someone knows how to do this in Python.
Answer 0 (score: 2)
Option 1
Using str.split + str.len:
df['word_count'] = df['string'].str.split(';').str.len()
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
Option 2
Using str.count:
df['word_count'] = df['string'].str.count(';') + 1
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
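Caveat: this assigns a word count of 1 even to an empty string. Option 1 behaves the same way, because splitting an empty string yields a list containing one empty string, so handle empty strings explicitly if they should count as 0.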
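To make the empty-string behaviour concrete, here is a minimal sketch. It assumes a hypothetical extra row (id 5) holding an empty string, which is not part of the original data, and uses Series.where to force that row's count to 0. The names by_split, by_count and word_count are illustrative only.

```python
import pandas as pd

# Sample data mirroring the question, plus a hypothetical empty-string row (id 5)
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0", ""]},
    index=pd.Index(range(6), name="id"),
)

# Option 1: split on ';' and take the length of each resulting list
by_split = df["string"].str.split(";").str.len()

# Option 2: count the separators and add one
by_count = df["string"].str.count(";") + 1

# Both options report 1 for the empty string, so zero it out explicitly:
# where() keeps values where the condition is True and substitutes 0 elsewhere
word_count = by_split.where(df["string"] != "", 0)
print(word_count.tolist())
```

Whether an empty string should count as 0 or 1 depends on the data; the where step is only needed in the former case.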
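If you want each word to occupy its own column, you can use tolist to quickly load the splits into a new DataFrame, then use concat to join the new DataFrame with the original.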
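# Build a DataFrame from the split pieces, reusing df's index so the rows align
v = pd.DataFrame(df['string'].str.split(';').tolist(), index=df.index)\
    .rename(columns=lambda x: x + 1)\
    .add_prefix('string_')
# axis must be passed by keyword in recent pandas versions
pd.concat([df, v], axis=1)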
string word_count string_1 string_2
id
0 31672;0 2 31672 0
1 31965;0 2 31965 0
2 0;78464 2 0 78464
3 51462 1 51462 None
4 31931;0 2 31931 0
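As an alternative to the tolist approach, str.split accepts expand=True, which returns the pieces as a DataFrame directly. The sketch below is self-contained and rebuilds the sample data from the question. The string_N column names follow the output above, while parts and result are illustrative names.

```python
import pandas as pd

# Sample data from the question, indexed by id
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# expand=True yields one column per piece, already aligned on df's index
parts = df["string"].str.split(";", expand=True)

# Rename the default 0, 1, ... columns to string_1, string_2, ...
parts.columns = [f"string_{i + 1}" for i in parts.columns]

result = pd.concat([df, parts], axis=1)
print(result)
```

Rows with fewer pieces than the widest row get missing values in the trailing columns, as with the tolist version.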