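I'm running into a problem with the pandas.DataFrame constructor and its dtype argument. I want to preserve string values, but the following snippet always converts to a numeric type and then yields NaNs.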
from __future__ import unicode_literals
from __future__ import print_function
import numpy as np
import pandas as pd

def main():
    columns = ['great', 'good', 'average', 'bad', 'horrible']
    # minimal example, dates are coming (as strings) from some
    # non-file source.
    example_data = {
        'alice': ['', '', '', '2016-05-24', ''],
        'bob': ['', '2015-01-02', '', '', '2012-09-15'],
        'eve': ['2011-12-31', '', '1998-08-13', '', ''],
    }
    # first pass, yields dataframe full of NaNs
    df = pd.DataFrame(data=example_data, index=example_data.keys(),
                      columns=columns, dtype=str)  # or string, 'str', 'string', 'object'
    print(df.dtypes)
    print(df)
    print()
    # based on https://github.com/pydata/pandas/blob/master/pandas/core/frame.py
    # and https://github.com/pydata/pandas/blob/37f95cef85834207db0930e863341efb285e38a2/pandas/types/common.py
    # we're ultimately feeding dtype to numpy's dtype, so let's just use that:
    # (using np.dtype('S10') and converting to str doesn't work either)
    df = pd.DataFrame(data=example_data, index=example_data.keys(),
                      columns=columns, dtype=np.dtype('U'))
    print(df.dtypes)
    print(df)  # still full of NaNs... =(

if __name__ == '__main__':
    main()
Which values of dtype will preserve the strings in the DataFrame?
For reference:
$ python --version
2.7.12
$ pip2 list | grep pandas
pandas (0.18.1)
$ pip2 list | grep numpy
numpy (1.11.1)
Answer 0 (score: 1)
For the specific case in the OP, you can use the DataFrame.from_dict() constructor (see also the Alternate Constructors section of the DataFrame documentation).
from __future__ import unicode_literals
from __future__ import print_function
import pandas as pd
columns = ['great', 'good', 'average', 'bad', 'horrible']
example_data = {
'alice': ['', '', '', '2016-05-24', ''],
'bob': ['', '2015-01-02', '', '', '2012-09-15'],
'eve': ['2011-12-31', '', '1998-08-13', '', ''],
}
df = pd.DataFrame.from_dict(example_data, orient='index')
df.columns = columns
print(df.dtypes)
# great object
# good object
# average object
# bad object
# horrible object
# dtype: object
print(df)
#             great        good     average         bad    horrible
# bob                2015-01-02                          2012-09-15
# eve    2011-12-31              1998-08-13
# alice                                      2016-05-24
You can even specify dtype=str in DataFrame.from_dict(), although it isn't necessary in this example.
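As a quick check of that claim, here is a minimal sketch that passes no dtype at all and confirms that every cell is still a Python string. It assumes pandas 0.23 or later, where from_dict() accepts a columns argument together with orient='index'; on the 0.18.1 used in the question, assigning df.columns afterwards (as above) does the same job. The name all_strings is only for illustration.

```python
import pandas as pd

columns = ['great', 'good', 'average', 'bad', 'horrible']
example_data = {
    'alice': ['', '', '', '2016-05-24', ''],
    'bob': ['', '2015-01-02', '', '', '2012-09-15'],
    'eve': ['2011-12-31', '', '1998-08-13', '', ''],
}

# Each dict key becomes a row label; `columns` names the positions in each list.
# Assumption: pandas >= 0.23 for the `columns` keyword of from_dict().
df = pd.DataFrame.from_dict(example_data, orient='index', columns=columns)

# No dtype was requested, yet every cell is still a plain string.
all_strings = all(isinstance(v, str) for v in df.values.ravel())
print(all_strings)
print(df.loc['alice', 'bad'])
```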
Edit: The DataFrame constructor interprets a dictionary as a collection of columns:
print(pd.DataFrame(example_data))
#         alice         bob         eve
# 0                          2011-12-31
# 1              2015-01-02
# 2                          1998-08-13
# 3  2016-05-24
# 4              2012-09-15
(I dropped data=, because data is the first argument in the function signature.) Your code mixes up rows and columns:
print(pd.DataFrame(example_data, index=example_data.keys(), columns=columns))
# great good average bad horrible
# alice NaN NaN NaN NaN NaN
# bob NaN NaN NaN NaN NaN
# eve NaN NaN NaN NaN NaN
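(Though I'm not entirely sure how it ends up giving you a DataFrame of NaNs.) Passing the dictionary keys as columns and the labels as the index works: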
print(pd.DataFrame(example_data, columns=example_data.keys(), index=columns))
#                alice         bob         eve
# great                             2011-12-31
# good                  2015-01-02
# average                           1998-08-13
# bad       2016-05-24
# horrible              2012-09-15
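As for where the NaNs in the mixed-up call come from: as far as I can tell, when data is a dict, the columns argument acts as a selection over the dict's keys, and any requested label that is not a key comes back as an empty (NaN) column. None of 'great' through 'horrible' is a key of example_data, so nothing from the dict survives. A minimal sketch with made-up data (the names small, missing and present are only for illustration):

```python
import pandas as pd

small = {'alice': ['a', 'b'], 'bob': ['c', 'd']}

# 'x' and 'y' are not keys of the dict, so both columns come back empty.
missing = pd.DataFrame(small, index=[0, 1], columns=['x', 'y'])
print(missing.isnull().all().all())

# 'bob' is a key, so its values are kept; 'alice' is not requested and is dropped.
present = pd.DataFrame(small, index=[0, 1], columns=['bob'])
print(present['bob'].tolist())
```

This would also explain why dtype makes no difference in the question: the strings never reach the frame in the first place.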
There is actually no need to specify the column names, since they are already taken from the dictionary:
print(pd.DataFrame(example_data, index=columns))
#                alice         bob         eve
# great                             2011-12-31
# good                  2015-01-02
# average                           1998-08-13
# bad       2016-05-24
# horrible              2012-09-15
What you actually want is the transpose, so you can simply take the transpose of that:
print(pd.DataFrame(data=example_data, index=columns).T)
#             great        good     average         bad    horrible
# alice                                      2016-05-24
# bob                2015-01-02                          2012-09-15
# eve    2011-12-31              1998-08-13
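To double-check that the transpose and the from_dict(orient='index') route agree, a small sketch such as the following compares them label by label and cell by cell. The names via_transpose, via_from_dict, same_labels and same_values are illustrative only.

```python
import pandas as pd

columns = ['great', 'good', 'average', 'bad', 'horrible']
example_data = {
    'alice': ['', '', '', '2016-05-24', ''],
    'bob': ['', '2015-01-02', '', '', '2012-09-15'],
    'eve': ['2011-12-31', '', '1998-08-13', '', ''],
}

# Route 1: build with the dict keys as columns, then transpose.
via_transpose = pd.DataFrame(example_data, index=columns).T

# Route 2: let from_dict treat each dict key as a row, then name the columns.
via_from_dict = pd.DataFrame.from_dict(example_data, orient='index')
via_from_dict.columns = columns

same_labels = (list(via_transpose.index) == list(via_from_dict.index)
               and list(via_transpose.columns) == list(via_from_dict.columns))
same_values = via_transpose.values.tolist() == via_from_dict.values.tolist()
print(same_labels, same_values)
```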
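Answer 1 (score: 0)
This isn't a proper answer, but while you wait for answers from others: I noticed that everything works fine when using the read_csv function.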
So if you put your data in a .csv file named myData.csv, like this:
great,good,average,bad,horrible
alice,,,,2016-05-24,
bob,,2015-01-02,,,2012-09-15
eve,2011-12-31,,1998-08-13,,
and do
df = pd.read_csv('blablah/myData.csv')
it will keep the strings as they are!
If needed, you can put empty values in the CSV file as spaces or as any other character/marker.
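One caveat: by default read_csv parses empty fields as NaN, so only the non-empty cells come back as strings. A minimal sketch, assuming you want empty cells to stay as empty strings, passes dtype=str and keep_default_na=False. The CSV text is fed through io.StringIO here only to keep the example self-contained (Python 3 assumed); with a real file you would pass the path instead. Because the header has one field fewer than the data rows, pandas uses the first column as the index.

```python
import io
import pandas as pd

csv_text = (
    "great,good,average,bad,horrible\n"
    "alice,,,,2016-05-24,\n"
    "bob,,2015-01-02,,,2012-09-15\n"
    "eve,2011-12-31,,1998-08-13,,\n"
)

# dtype=str reads every column as text; keep_default_na=False stops empty
# fields from being turned into NaN, so they stay as ''.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(df.loc['alice', 'bad'])
print(repr(df.loc['alice', 'great']))
```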