I had already handled the NaN values in the column before merging.
sales_df['built'].unique()
array([1955., 1951., 1933., 1965., 1987., 2001., 1995., 1963., 1960.,
       2003., 1942., 1977., 1900., 1979., 1994., 1916., 1921., 1969.,
       1947., 1968., 1985., 1941., 1915., 1909., 1948., 2005., 1929.,
       1981., 1930., 1904., 1996., 2000., 1984., 2014., 1922., 1959.,
       1966., 1953., 1950., 1927., 2008., 1991., 1954., 1925., 1989.,
       1973., 1972., 1986., 1956., 2002., 1992., 1964., 1952., 1961.,
       2006., 1988., 1939., 1946., 1967., 1975., 1910., 1983., 1978.,
       1905., 1971., 2010., 1924., 1990., 1914., 1926., 2004., 1962.,
       1923., 2007., 1976., 1949., 1999., 1980., 1901., 1993., 1920.,
       1997., 1943., 1957., 1940., 1918., 1928., 1974., 1911., 1936.,
       1937., 1982., 1908., 1931., 1998., 2013., 1907., 1958., 2012.,
       1912., 2011., 1917., 1932., 1944., 1902., 2009., 1903., 1970.,
       2015., 1934., 1938., 1913., 1919., 1906., 1945., 1935.])
Then I applied KBinsDiscretizer like this:
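# use KBinsDiscretizer
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

def kbin(variables, encoding):
    # Work on a copy of the selected columns.
    bin_df = sales_df[variables].copy()
    discretizer = KBinsDiscretizer(n_bins=8,
                                   encode=encoding,
                                   strategy='quantile')
    # The new DataFrame gets a fresh 0..n-1 index; the result is written back into sales_df.
    sales_df[variables] = pd.DataFrame(discretizer.fit_transform(bin_df), columns=bin_df.columns)

ordinal_bin = ['built', 'renovation', 'years_from_last_renovation']
# kbin modifies sales_df in place and has no return statement, so ordinal_binned is None.
ordinal_binned = kbin(ordinal_bin, 'ordinal')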
Afterwards, the column contains NaN values again:
sales_df['built'].unique()
array([2.,1.,3.,5.,6.,0.,4.,7.,nan])
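Answer 0 (score: 0)
It turned out that I had forgotten to reset the index of the original DataFrame. When I built a new DataFrame from the KBinsDiscretizer output, its row index (0 to n-1) did not match the index of sales_df, so the assignment produced NaN for every row label that was missing from the new DataFrame.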
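The alignment behaviour described above can be reproduced without scikit-learn. The following is a minimal sketch on made-up data: the four-row frame, its gapped index, and the `values` array (a stand-in for what a transformer returns) are all assumptions for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has gaps, e.g. left behind by a merge or dropna.
df = pd.DataFrame({"built": [1955.0, 1951.0, 1933.0, 1965.0]}, index=[0, 2, 5, 7])

# Stand-in for the plain NumPy array a transformer returns (no index attached).
values = np.array([[2.0], [1.0], [0.0], [3.0]])

# Wrapping the array in a DataFrame gives it a fresh index 0, 1, 2, 3.
wrapped = pd.DataFrame(values, columns=["built"])

# Assignment aligns on index labels, not on row position.
misaligned = df.copy()
misaligned[["built"]] = wrapped
# Labels 5 and 7 do not exist in `wrapped`, so those rows become NaN.
# Label 2 exists in both, so it receives the value of wrapped's THIRD row.

# After resetting the index, labels and positions coincide again.
fixed = df.reset_index(drop=True)
fixed[["built"]] = wrapped
```

Note that the mismatch does more than create NaN: any row whose label happens to exist in both frames silently receives the value of a different row.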
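The fix is simple: reset the index before running KBinsDiscretizer. Note that reset_index returns a new DataFrame, so the result has to be assigned back:
sales_df = sales_df.reset_index(drop=True)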
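If the original index needs to be kept, another option is to give the new DataFrame the same index as the source. This is a sketch of that variant rather than the code from the question: `kbin_keep_index`, its parameters, and the sample `sales` frame are hypothetical, and it only covers `encode="ordinal"`, where the output has one column per input column.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

def kbin_keep_index(df, columns, n_bins=8):
    """Return a copy of df with `columns` replaced by ordinal bin codes."""
    discretizer = KBinsDiscretizer(n_bins=n_bins,
                                   encode="ordinal",
                                   strategy="quantile")
    binned = discretizer.fit_transform(df[columns])
    out = df.copy()
    # Reuse df's index so the assignment lines up row by row.
    out[columns] = pd.DataFrame(binned, columns=columns, index=df.index)
    return out

# Made-up sample: 16 years on an index with gaps (0, 2, 4, ...).
sales = pd.DataFrame(
    {"built": [float(year) for year in range(1900, 1916)]},
    index=range(0, 32, 2),
)
result = kbin_keep_index(sales, ["built"], n_bins=4)
```

Because the function returns a new frame instead of mutating a global one, the caller decides whether to overwrite the original, for example with `sales = kbin_keep_index(sales, ["built"])`.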