Converting strings to floats in all pandas columns where this is possible

Question

I created a pandas DataFrame from a list of lists:

import numpy as np
import pandas as pd
df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("abc"))
print(df)
   a  b    c
0  a  1    2
1  b  3  NaN

Is there a way to convert every column of the DataFrame that can be converted to float, i.e. b and c? The following works if you know which columns to convert:

  df[["b", "c"]] = df[["b", "c"]].astype("float")

But what do you do if you don't know in advance which columns contain numbers? When I tried

  df = df.astype("float", errors = "ignore")

all columns were still strings/objects. Similarly,

df[["b", "c"]] = df[["b", "c"]].apply(pd.to_numeric)

converts both columns (though b ends up as int and c as float, because of the NaN value), but

df = df.apply(pd.to_numeric)

obviously raises an error (column a cannot be parsed as numbers), and I don't see a way to suppress it.
Is it possible to perform this string-to-float conversion without looping through each column and trying .astype("float", errors="ignore") on each one?

Answer

I think you need the parameter errors='ignore' in to_numeric:

df = df.apply(pd.to_numeric, errors='ignore')
print(df.dtypes)
a     object
b      int64
c    float64
dtype: object
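In pandas 2.2 and later, errors='ignore' in pd.to_numeric is deprecated and emits a FutureWarning. A minimal sketch of the same column-wise conversion without it, assuming you simply want to keep a column unchanged whenever pd.to_numeric raises on it; the helper name to_numeric_or_keep is hypothetical, not a pandas function:

```python
import numpy as np
import pandas as pd


def to_numeric_or_keep(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse a column as numbers, or return it unchanged."""
    try:
        return pd.to_numeric(s)
    except (ValueError, TypeError):
        # At least one value could not be parsed, so keep the original column
        return s


df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("abc"))

# DataFrame.apply calls the helper once per column (axis=0 is the default)
df = df.apply(to_numeric_or_keep)
print(df.dtypes)
```

On the example frame this should leave a as a string column, make b an integer column and c a float column (because of the NaN). Treating ValueError and TypeError as "not numeric" is an assumption of this sketch.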

It works nicely as long as no column mixes values, i.e. numbers with strings; a mixed column is left unchanged:

df_list = [["a", "t", "2"], ["b", "3", np.nan]]  # "t" makes column b mixed
df = pd.DataFrame(df_list, columns=list("abc"))
df = df.apply(pd.to_numeric, errors='ignore')
print(df)
   a  b    c
0  a  t  2.0
1  b  3  NaN
print(df.dtypes)
a     object
b     object
c    float64
dtype: object
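To see which entries stop a column from being converted, one option is to parse it with errors='coerce' (which turns unparseable values into NaN) and keep the entries that were present before parsing but missing afterwards. A minimal sketch; the helper name unparseable_values is hypothetical:

```python
import numpy as np
import pandas as pd


def unparseable_values(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return the entries of s that pd.to_numeric cannot parse."""
    parsed = pd.to_numeric(s, errors="coerce")
    # Present before parsing but NaN afterwards means the value failed to parse
    return s[parsed.isna() & s.notna()]


df_list = [["a", "t", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("abc"))

report = {col: unparseable_values(df[col]).tolist() for col in df.columns}
print(report)  # {'a': ['a', 'b'], 'b': ['t'], 'c': []}
```

Column b is reported only because of "t", while c has no blocking entries, which is why only c was converted in the mixed example.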

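You can also downcast, which turns the integer column into floats as well (shown here for the original df, i.e. without the "t" in column b):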

df = df.apply(pd.to_numeric, errors='ignore', downcast='float')
print(df.dtypes)
a     object
b    float32
c    float32
dtype: object
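The question asks for float, and downcast='float' picks the smallest float dtype (float32). If float64 is wanted for every convertible column instead, a sketch under that assumption is to parse first and cast afterwards, using try/except instead of errors='ignore'; the helper name to_float_or_keep is hypothetical:

```python
import numpy as np
import pandas as pd


def to_float_or_keep(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return s as float64 if every value parses, else s unchanged."""
    try:
        # Parse first, then cast, so integer columns become float64 rather than float32
        return pd.to_numeric(s).astype("float64")
    except (ValueError, TypeError):
        # Column contains non-numeric values: leave it as it was
        return s


df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("abc"))
df = df.apply(to_float_or_keep)
print(df.dtypes)
```

float64 represents integers exactly up to 2**53, whereas float32 only does so up to 2**24, which is the trade-off behind choosing an explicit cast over downcast here.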

Equivalently, with a lambda:

df = df.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='float'))
print(df.dtypes)
a     object
b    float32
c    float32
dtype: object