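Hello, I want to clean my data by removing rows with missing information and converting all letters to lowercase. For the lowercase conversion, however, I get this warning: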
E:\Program Files Extra\Python27\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)
C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:18: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  frame3["name"] = frame3["name"].str.lower()
C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:19: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  frame3["ethnicity"] = frame3["ethnicity"].str.lower()
import pandas as pd
from pandas import DataFrame
# Get csv file into data frame
data = pd.read_csv("C:\Users\KubiK\Desktop\OddNames_sampleData.csv")
frame = DataFrame(data)
frame.columns = ["name", "ethnicity"]
name = frame.name
ethnicity = frame.ethnicity
# Remove missing ethnicity data cases
index_missEthnic = frame.ethnicity.isnull()
index_missName = frame.name.isnull()
frame2 = frame[index_missEthnic != True]
frame3 = frame2[index_missName != True]
# Make all letters into lowercase
frame3["name"] = frame3["name"].str.lower()
frame3["ethnicity"] = frame3["ethnicity"].str.lower()
# Test outputs
print frame3
The warning does not seem to be fatal (at least for my small sample data), but how should I deal with it?
Sample data
Name Ethnicity
Thos C. Martin Russian
Charlotte Wing English
Frederick A T Byrne Canadian
J George Christe French
Mary R O'brien English
Marie A Savoie-dit Dugas English
J-b'te Letourneau Scotish
Jane Mc-earthar French
Amabil?? Bonneau English
Emma Lef??c French
C., Akeefe African
D, James Matheson English
Marie An: Thomas English
Susan Rrumb;u English
English
Kaio Chan
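Answer 0 (score: 3)
I'm not sure why you need so many booleans...
Also note that .isnull() will not catch empty strings.
And filtering out empty strings before applying .lower() does not seem to be necessary either.
But if you do need it, this works for me: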
frame = pd.DataFrame({'name':['Abc Def', 'EFG GH', ''], 'ethnicity':['Ethnicity1','', 'Ethnicity2']})
print frame
    ethnicity     name
0  Ethnicity1  Abc Def
1               EFG GH
2  Ethnicity2
name_null = frame.name.str.len() == 0
frame.loc[~name_null, 'name'] = frame.loc[~name_null, 'name'].str.lower()
print frame
    ethnicity     name
0  Ethnicity1  abc def
1               efg gh
2  Ethnicity2
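To make the two points above concrete, here is a minimal sketch. The Series `names` and the variable names are made up for illustration and are not taken from the question's data. It shows that `.isnull()` flags only real missing values (NaN), that a length check catches empty strings, and that `.str.lower()` can be applied without filtering first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one normal name, one empty string, one real missing value
names = pd.Series(["Abc Def", "", np.nan])

# .isnull() flags only the NaN entry, not the empty string
missing_mask = names.isnull()

# A length check flags only the empty string (the NaN compares as False)
empty_mask = names.str.len() == 0

# .str.lower() leaves NaN and empty strings as they are
lowered = names.str.lower()

print(missing_mask.tolist())
print(empty_mask.tolist())
print(lowered.tolist())
```

If empty strings should count as missing too, the two masks can be combined with `|` before filtering.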
Answer 1 (score: 2)
When creating frame2 and frame3, try using .loc, like this:
frame2 = frame.loc[~index_missEthnic, :]
frame3 = frame2.loc[~index_missName, :]
I think this should then resolve the warning you are seeing:
frame3.loc[:, "name"] = frame3.loc[:, "name"].str.lower()
frame3.loc[:, "ethnicity"] = frame3.loc[:, "ethnicity"].str.lower()
You could also try the following, although it does not directly answer your question:
frame3.loc[:, "name"] = [t.lower() if isinstance(t, str) else t for t in frame3.name]
frame3.loc[:, "ethnicity"] = [t.lower() if isinstance(t, str) else t for t in frame3.ethnicity]
This converts every string in the column to lowercase and leaves any other value unchanged.
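As a further option (not something proposed in the question or the answers above), one common way to avoid SettingWithCopyWarning is to build the filtered frame as an explicit, independent copy before assigning to it. The sketch below assumes a small made-up frame with the same two column names; `cleaned` is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Made-up data using the question's column names
frame = pd.DataFrame({
    "name": ["Thos C. Martin", "Charlotte Wing", np.nan, "Kaio Chan"],
    "ethnicity": ["Russian", "English", "English", np.nan],
})

# Drop rows missing either field in one step, then take an explicit copy
# so that later assignments clearly target an independent DataFrame
cleaned = frame.dropna(subset=["name", "ethnicity"]).copy()

# Lowercase both text columns on the copy
for column in ["name", "ethnicity"]:
    cleaned[column] = cleaned[column].str.lower()

print(cleaned)
```

Filtering once, rather than chaining two boolean masks computed on the original frame, should also avoid the "Boolean Series key will be reindexed" warning, since no mask is applied to a frame with a different index.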