Problem description
Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside set to numpy.nan?
In particular, I am converting an in-house data structure to a pandas DataFrame. In our structure, we have integer-type columns that still contain NaNs (but the dtype of the column is int). Turning this into a DataFrame seems to recast everything as float, but we'd really like the columns to stay int.
Thoughts?
Things tried:
I tried using the from_records() function under pandas.DataFrame with coerce_float=False, and this did not help. I also tried using numpy masked arrays with a NaN fill_value, which also did not work. All of these caused the column data type to become float.
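For reference, the behavior described above can be reproduced with a minimal sketch like the following. The variable names are illustrative only; the point is that NaN is a floating-point value, so a plain integer array cannot store it and pandas falls back to float64.

```python
import numpy as np
import pandas as pd

# A plain NumPy integer array has no way to represent NaN.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[1] = np.nan
    assignment_failed = False
except ValueError:
    # NumPy refuses to convert float NaN to an integer.
    assignment_failed = True

# With default dtype inference, pandas upcasts integers mixed with NaN to float64.
s = pd.Series([1, np.nan, 3])
print(assignment_failed)
print(s.dtype)
```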
Recommended answer
This capability has been added to pandas (beginning with version 0.24): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support
At this point, it requires the use of the extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).
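A minimal sketch of what this looks like in practice, assuming pandas 0.24 or later is installed. The column name "a" and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Request the nullable extension dtype explicitly with the capitalized string "Int64".
df = pd.DataFrame({"a": pd.Series([1, None, 3], dtype="Int64")})
print(df["a"].dtype)            # the column keeps an integer dtype
print(df["a"].isna().tolist())  # the missing entry is still tracked

# An existing float column whose non-missing values are whole numbers
# can be converted the same way.
floats = pd.Series([1.0, np.nan, 3.0])
converted = floats.astype("Int64")
print(converted.dtype)
```

Note that the lowercase "int64" would still fail or upcast here; only the capitalized extension dtype carries a separate mask for missing values.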