Get the first half of a string from a pandas DataFrame column

Time: 2021-05-23 21:21:40

Tags: python pandas string

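I want to get the first half of each string in a pandas DataFrame column, where the string length varies from row to row. I searched around and found questions like this, but the solutions all focus on delimiters and regular expressions. I don't have a delimiter; I just want the first half of the string, however long it is.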

I can compute the length I want for each row:

import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

eggs["half_length"] = eggs.text.str.len() // 2

Then I would like to do something like eggs["truncated_text"] = eggs["text"].str[:eggs.half_length]. Or is defining this column the wrong approach in the first place? Can anyone help?

1 answer:

Answer 0: (score: 1)

You can apply a function to the text column, so that each string is sliced at half of its own length:

import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

eggs['truncated_text'] = eggs['text'].apply(lambda text: text[:len(text) // 2])

Output

|   id | text                | truncated_text   |
|-----:|:--------------------|:-----------------|
|    0 | eggs and spam       | eggs a           |
|    1 | green eggs and spam | green egg        |
|    2 | eggs and spam2      | eggs an          |
|    3 | green eggs          | green            |
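
If you would rather keep the half_length column from the question, one possible sketch is to pair each string with its own length in a list comprehension. The .str[...] accessor applies one slice to the whole column, so a per-row stop value has to be handled row by row. This is only an illustration of that idea, not necessarily the fastest option:

```python
import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

# Per-row half length, as defined in the question
eggs["half_length"] = eggs["text"].str.len() // 2

# Slice each string with its own stop index by walking both columns together
eggs["truncated_text"] = [
    text[:n] for text, n in zip(eggs["text"], eggs["half_length"])
]
```

This should give the same truncated_text values as the apply version above, as long as the text column contains only strings (missing values would need separate handling).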