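I want to get the first half of each string in a Pandas DataFrame column, where the string length varies from row to row. I searched around and found questions like this, but the solutions all focus on delimiters and regular expressions. I don't have a delimiter; I just want the first half of the string, however long it is.
I can compute the length I want for each string: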
import pandas as pd
eggs = pd.DataFrame({"id": [0, 1, 2, 3],
"text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})
eggs["half_length"] = eggs.text.str.len() // 2
Then I'd like to do something like eggs["truncated_text"] = eggs["text"].str[:eggs.half_length], but that doesn't work. Or is defining this column first the wrong approach? Can anyone help?
Answer 0 (score: 1)
You can apply a function to the text column, so that each string is sliced by its own length:
import pandas as pd
eggs = pd.DataFrame({"id": [0, 1, 2, 3],
"text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})
eggs['truncated_text'] = eggs['text'].apply(lambda text: text[:len(text) // 2])
Output:
| id | text | truncated_text |
|-----:|:--------------------|:-----------------|
| 0 | eggs and spam | eggs a |
| 1 | green eggs and spam | green egg |
| 2 | eggs and spam2 | eggs an |
| 3 | green eggs | green |
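
If you do want to keep the half_length column from the question, here is a minimal sketch that reuses it. As far as I know, `.str[...]` slicing takes scalar bounds that are applied to every row, so a per-row cut-off has to be paired with each string explicitly; the list comprehension with `zip` below is one way to do that, not the only one.

```python
import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

# Per-row cut-off, as in the question
eggs["half_length"] = eggs["text"].str.len() // 2

# Pair each string with its own cut-off and slice it in plain Python
eggs["truncated_text"] = [text[:n] for text, n in zip(eggs["text"], eggs["half_length"])]

print(eggs[["text", "truncated_text"]])
```

This gives the same truncated_text values as the apply version above. Both approaches assume every entry in text is a string; missing values would need to be handled separately.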