Problem description
I have a DataFrame. Two relevant columns are the following: one is a column of int and the other is a column of str.
I understand that if I insert NaN into the int column, pandas will convert all the ints to float, because int has no NaN value.
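The upcast described above can be reproduced in isolation. This is a minimal sketch, with illustrative variable names, that uses reindex to introduce a missing entry into an integer Series:

```python
import pandas as pd

# An integer Series with no missing values keeps its integer dtype.
ints = pd.Series([0, 1], dtype="int64")
print(ints.dtype)

# Reindexing onto a label that does not exist yet introduces a missing value.
# NumPy integers cannot represent NaN, so pandas upcasts the Series to float64.
with_gap = ints.reindex([0, 1, 2])
print(with_gap.dtype)
print(with_gap)
```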
However, when I insert None into the str column, pandas converts all my ints to float as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1?
Here's a simple working example (Python 2):
    import pandas as pd

    df = pd.DataFrame()
    df["int"] = pd.Series([], dtype=int)
    df["str"] = pd.Series([], dtype=str)

    df.loc[0] = [0, "zero"]
    print df
    print

    df.loc[1] = [1, None]
    print df
The output is:
       int   str
    0    0  zero

       int   str
    0  0.0  zero
    1  1.0   NaN
Is there any way to make the output the following:
       int   str
    0    0  zero

       int   str
    0    0  zero
    1    1   NaN
without recasting the first column to int?
I prefer using int instead of float because the actual data in that column are integers. If there's no workaround, I'll just use float, though.
I'd rather not recast because, in my actual code, I don't store the actual dtype.
I also need the data inserted row by row.
Recommended answer
If you set dtype=object, your Series will be able to contain arbitrary data types:
    # df is reused from the question
    df["int"] = pd.Series([], dtype=object)
    df["str"] = pd.Series([], dtype=str)

    df.loc[0] = [0, "zero"]
    print(df)
    print()

    df.loc[1] = [1, None]
    print(df)

Output:

       int   str
    0    0  zero
    1  NaN   NaN

       int   str
    0    0  zero
    1    1  None
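To see why dtype=object avoids the conversion, it may help to look at a single Series on its own. The sketch below is illustrative only (the variable name is made up, and it does not cover row-by-row insertion): an object Series stores references to the original Python objects, so integers and None can sit side by side without being converted.

```python
import pandas as pd

# With dtype=object, each element is stored as the original Python object,
# so the integers are not converted to float when a missing value is present.
as_object = pd.Series([0, 1, None], dtype=object)

print(as_object.dtype)
print([type(v).__name__ for v in as_object])

# Missing-value detection still works on object columns.
print(as_object.isna().tolist())
```

The trade-off is that an object column holds generic Python objects rather than a packed numeric array, so numeric operations on it are generally slower than on an int or float column.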