Question

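I converted a pd.Series to a DataFrame. After the conversion, one of the DataFrame's columns has no name at all, and the other has "0" as its name. I need to give the columns names.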

I tried df.columns = ["A", "B"] and also rename, but neither helped.


The expected result would be:

import pandas as pd
import nltk
from nltk.corpus import stopwords       #for removing stopwords
import re                               #for removing numbers, special characters
#Import CSV into dataframe
filepath = "C:/a/Python/Clustering/LabeledRawDatav2.csv"
df = pd.read_csv(filepath,encoding='windows-1252')
print(df.head(2))

freq = pd.DataFrame(columns=["Word","Count"])

freq = pd.Series(' '.join(df["Notes"]).split()).value_counts()[:]
freq = pd.Series.to_frame(freq)

freq.rename(columns = {"0":"Freq"},inplace=True)

print(freq)

The actual result is:

Word                  freq
-                     206
the                    65
for                    62
1                      62
DAYS                   56

Answer 1

I usually do it like this:

freq = df["Notes"].str.split(expand = True).stack().value_counts().rename_axis('word').reset_index(name = 'count')

This avoids the problem of the column named 0.

Credit goes to the original author, jezrael: I took this from one of his answers, but I can't seem to find the original link!
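To make the chain above easier to follow, here is a minimal sketch that runs it on a tiny, made-up DataFrame. The sample data is an assumption for illustration only; the "Notes" column name is taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question
df = pd.DataFrame({"Notes": ["the cat", "the dog sat"]})

freq = (
    df["Notes"]
    .str.split(expand=True)        # one column per word position
    .stack()                       # flatten all words into a single Series
    .value_counts()                # words become the index, counts the values
    .rename_axis("word")           # name the index
    .reset_index(name="count")     # index -> "word" column, values -> "count"
)

print(freq)
```

Because the names are supplied explicitly in rename_axis and reset_index, the resulting columns should not depend on whatever default labels pandas assigns along the way.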

Answer 2

Initially, you have an unnamed Series built by value_counts(), which you then convert to a DataFrame with to_frame.

This means the DataFrame has the words (-, the, for, ...) as its index, and a single column named 0: that is the integer 0, not the string "0".
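The integer-versus-string distinction can be checked with a small sketch. The counts below are made up for illustration, and the Series is deliberately left unnamed so that to_frame falls back to its default column label.

```python
import pandas as pd

# Hypothetical word counts; the Series has no name on purpose
counts = pd.Series([2, 1, 1], index=["the", "cat", "dog"])

frame = counts.to_frame()
print(list(frame.columns))   # the single column label is the integer 0

# The string "0" matches no column, so rename silently changes nothing
unchanged = frame.rename(columns={"0": "Freq"})

# The integer 0 matches the column label
renamed = frame.rename(columns={0: "Freq"})
print(list(renamed.columns))
```

This is also why assigning df.columns = ["A", "B"] fails here: the words live in the index, so there is only one actual column to rename.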

What you want is:

# give a name to the original Series: freq
freq = pd.Series(' '.join(df["Notes"]).split(), name='freq').value_counts()

# give a name to the index and convert to a dataframe
freq = freq.rename_axis('Word').to_frame().reset_index()
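One caveat, offered as a sketch rather than a correction: since pandas 2.0, value_counts() names its result "count" and moves the original Series name onto the index, so the column name produced by the code above can differ between pandas versions. Passing the name explicitly at reset_index sidesteps this. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df["Notes"] from the question
notes = pd.Series(["the cat", "the dog"])

freq = (
    pd.Series(" ".join(notes).split())
    .value_counts()
    .rename_axis("Word")          # name the index explicitly
    .reset_index(name="freq")     # name the values column explicitly
)

print(freq)
```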

Giving a name to a DataFrame column that has no name
