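I have a pandas DataFrame in which one column contains strings. I want to split that column into a variable number of columns, depending on how many words each string has.
Suppose I have this DataFrame df: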
Index Text
0 He codes
1 He codes well in python
2 Python is great language
3 Pandas package is very handy
Now I want to split the Text column into multiple columns, each holding 2 words:
Index 0 1 2
0 He codes NaN NaN
1 He codes well in python
2 Python is great language NaN
3 Pandas package is very handy
How can I do this in Python? Please help. Thanks in advance.
Answer 0 (score: 4)
Given a DataFrame df with a Text column, we need to split each sentence into two-word chunks:
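import pandas as pd

def splitter(s):
    # Split on whitespace, then rejoin the words in consecutive pairs.
    spl = s.split()
    return [" ".join(spl[i:i+2]) for i in range(0, len(spl), 2)]

# Rows with fewer chunks than the longest row are padded with None.
df_new = pd.DataFrame(df["Text"].apply(splitter).to_list())
#                 0               1       2
# 0        He codes            None    None
# 1        He codes         well in  python
# 2       Python is  great language    None
# 3  Pandas package         is very   handy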
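For anyone who wants to run this approach end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and generalizes the chunk size. The helper name split_into_chunks and its parameter n are illustrative and do not appear in the original answer.

```python
import pandas as pd

def split_into_chunks(text, n=2):
    """Return the words of `text` rejoined in groups of `n` (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]

# Sample data copied from the question.
df = pd.DataFrame({"Text": [
    "He codes",
    "He codes well in python",
    "Python is great language",
    "Pandas package is very handy",
]})

# Each row becomes a list of chunks; the DataFrame constructor pads
# shorter lists with missing values.
chunks = df["Text"].apply(split_into_chunks)
df_new = pd.DataFrame(chunks.tolist(), index=df.index)
print(df_new)
```

Passing index=df.index keeps the original row labels, which matters if df does not use the default 0..n-1 index.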
Answer 1 (score: 2)
IIUC, we can use str.split, then groupby with cumcount and floor division, and finally unstack:
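s = (
    df["Text"]
    .str.split(r"\s+", expand=True)  # one word per column
    .stack()                         # one word per row; padding is dropped
    .to_frame("words")
    .reset_index(level=1, drop=True)
)
# Number the words within each original row, then floor-divide to pair them up.
s["count"] = s.groupby(level=0).cumcount() // 2
final = s.rename_axis("idx").groupby(["idx", "count"])["words"].agg(" ".join).unstack(1)
print(final)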
count 0 1 2
idx
0 He codes NaN NaN
1 He codes well in python
2 Python is great language NaN
3 Pandas package is very handy
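A variant of the same idea, sketched here with Series.explode in place of split(expand=True) plus stack. This is not the answer's exact code. The column names index, words and chunk are illustrative, and the sketch assumes every row of Text is a non-empty string.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Text": [
    "He codes",
    "He codes well in python",
    "Python is great language",
    "Pandas package is very handy",
]})

# One row per word; reset_index keeps the original row label in an "index" column.
words = df["Text"].str.split().explode().rename("words").reset_index()

# Position of each word within its original row, floor-divided into pairs.
words["chunk"] = words.groupby("index").cumcount() // 2

# Join each pair back into a string and pivot the chunk number into columns.
final = words.groupby(["index", "chunk"])["words"].agg(" ".join).unstack("chunk")
print(final)
```

Changing the divisor 2 to another positive integer should give chunks of that many words.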