Suppose I have the following DataFrame:
import pandas as pd
data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]
data_frame = pd.DataFrame(data, columns=['Student.first.name.word', 'Student.Current.Age.word', 'Student.Current.Profession.word'])
Student.first.name.word Student.Current.Age.word Student.Current.Profession.word
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
How can I strip out the column-header words "Student" and "word" that all the columns share, so that I get the following DataFrame?
first.name Current.Age Current.Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
Answer 0 (score: 3)
You can remove those words and the dots from the column names with a regex, then assign the result back:
data_frame.columns = data_frame.columns.str.replace(r"(Student|word|\.)", "", regex=True)  # regex=True is required on pandas >= 2.0
which gives, for the question's original column names:
>>> data_frame
name Age Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
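For reference, here is a self-contained run of the same regex on the column names currently shown in the question. It assumes pandas >= 2.0, where `str.replace` treats the pattern literally unless `regex=True` is passed. Because every dot is removed, the two middle words get merged:

```python
import pandas as pd

data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]
data_frame = pd.DataFrame(data, columns=['Student.first.name.word',
                                         'Student.Current.Age.word',
                                         'Student.Current.Profession.word'])

# Remove "Student", "word" and every literal dot in a single regex pass
data_frame.columns = data_frame.columns.str.replace(r"(Student|word|\.)", "", regex=True)

print(list(data_frame.columns))
# ['firstname', 'CurrentAge', 'CurrentProfession']
```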
Update: to keep the inner dots of the updated column names (e.g. first.name), you can split, slice, and join:
data_frame.columns = data_frame.columns.str.split(r"\.").str[1:-1].str.join(".")
That is, split on the literal dot, drop the first and last pieces, and join the remaining pieces back together with a dot, which gives:
first.name Current.Age Current.Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
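A self-contained version of the split, slice, and join approach, rebuilding the question's DataFrame so it can be run as-is. It assumes the shared prefix and suffix are each exactly one dot-separated word:

```python
import pandas as pd

data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]
data_frame = pd.DataFrame(data, columns=['Student.first.name.word',
                                         'Student.Current.Age.word',
                                         'Student.Current.Profession.word'])

# A one-character pattern such as "." is split on literally, not as a regex.
# .str[1:-1] drops the first and last pieces of each list; .str.join(".") glues the rest.
data_frame.columns = data_frame.columns.str.split(".").str[1:-1].str.join(".")

print(list(data_frame.columns))
# ['first.name', 'Current.Age', 'Current.Profession']
```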
Answer 1 (score: 2)
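This is an extension of my answer on removing a common prefix. The benefit of this approach is that it finds the prefix and suffix generically, so no patterns need to be hard-coded.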
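import os
import re
cols = data_frame.columns
common_prefix = os.path.commonprefix(cols.tolist())  # e.g. "Student."
common_suffix = os.path.commonprefix([col[::-1] for col in cols])[::-1]  # e.g. ".word"
# re.escape makes the dots literal; ^ and $ anchor each match to the ends of the name
data_frame.columns = cols.str.replace(f"^{re.escape(common_prefix)}|{re.escape(common_suffix)}$", "", regex=True)
For the question's original column names this gives: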
name Age Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
Update: the same solution also works for the updated question:
first.name Current.Age Current.Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
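Note that `os.path.commonprefix` compares character by character, so names such as `Student.Cat.word` and `Student.Car.word` would share the prefix `Student.Ca`. A token-level variant of the same prefix/suffix idea, sketched with hypothetical helper names and assuming "." is the only separator, compares whole dot-separated words instead:

```python
def common_prefix_len(seqs):
    """Hypothetical helper: number of leading items shared by every sequence."""
    n = 0
    for items in zip(*seqs):  # zip stops at the shortest sequence
        if len(set(items)) != 1:
            break
        n += 1
    return n


def strip_common_tokens(columns, sep="."):
    """Hypothetical helper: drop the leading/trailing words shared by every name."""
    tokens = [col.split(sep) for col in columns]
    n_prefix = common_prefix_len(tokens)
    # A common suffix is a common prefix of the reversed word lists
    n_suffix = common_prefix_len([t[::-1] for t in tokens])
    return [sep.join(t[n_prefix:len(t) - n_suffix]) for t in tokens]


cols = ['Student.first.name.word', 'Student.Current.Age.word', 'Student.Current.Profession.word']
print(strip_common_tokens(cols))
# ['first.name', 'Current.Age', 'Current.Profession']

# The character-level version would leave "t" and "r" here
print(strip_common_tokens(['Student.Cat.word', 'Student.Car.word']))
# ['Cat', 'Car']
```

The result is a plain list, so it can be assigned directly with `data_frame.columns = strip_common_tokens(data_frame.columns)`.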
Answer 2 (score: 1)
To remove every word the column names have in common, rather than just hard-coded ones, you can try:
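import re
from functools import reduce

df = data_frame
# split each column name on "." and keep only the words shared by every name
common_words = [col.split(".") for col in df.columns.tolist()]
common_words = reduce(lambda x, y: set(x).intersection(y), common_words)
# whole-word alternation, e.g. \b(?:Student|word)\b
pat = r"\b(?:{})\b".format("|".join(map(re.escape, common_words)))
# removing the words leaves ".first.name."; [1:-1] strips the leftover leading/trailing dots
df.columns = df.columns.str.replace(pat, "", regex=True).str[1:-1]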
Output:
print(df)
first.name Current.Age Current.Profession
0 Mallika 23 Student
1 Yash 25 Tutor
2 Abc 14 Clerk
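The `.str[1:-1]` slice above assumes exactly one shared word at each end of every name. A regex-free variant of the same idea, sketched here as an alternative rather than the answerer's code, filters the split words directly so no leftover dots need slicing off:

```python
from functools import reduce

import pandas as pd

data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]
data_frame = pd.DataFrame(data, columns=['Student.first.name.word',
                                         'Student.Current.Age.word',
                                         'Student.Current.Profession.word'])

# Split every column name into its dot-separated words
tokens = [col.split(".") for col in data_frame.columns]
# Words present in every name (here "Student" and "word"; "Current" is in only two)
common = reduce(lambda acc, words: acc & set(words), tokens, set(tokens[0]))
# Rebuild each name from the words that are not shared by all names
data_frame.columns = [".".join(w for w in words if w not in common) for words in tokens]

print(list(data_frame.columns))
# ['first.name', 'Current.Age', 'Current.Profession']
```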