python pandas - generic way to handle comma-to-float conversion of strings with astype


Is there a generic way to tell pandas to use a comma (",") as the decimal separator when converting types, for example from string to float?

import pandas as pd
from datetime import datetime

data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [
        datetime(2020, 8, 1, 0, 3, 4).isoformat(),
        datetime(2020, 8, 2, 2, 4, 5).isoformat(),
        datetime(2020, 8, 3, 6, 8, 4).isoformat(),
    ],
}

conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": float,
    "col_float2": float,
    # newer pandas versions reject the unit-less "datetime64"
    "col_date": "datetime64[ns]",
}

df = pd.DataFrame(data=data)

print(df.dtypes)
df = df.astype(conversion_dict,errors="ignore")
print(df.dtypes)
print(df)

The example above leaves "col_float" and "col_float2" as object columns, or raises an error when errors is set to "raise".
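To see the behavior in isolation, here is a minimal sketch on a single Series (the values are taken from col_float above). With errors="ignore" the column simply comes back unconverted; with the default errors="raise" the underlying ValueError propagates.

```python
import pandas as pd

# values taken from col_float in the question
s = pd.Series(["1,2", "3,2342", "97837,8277"])

# errors="ignore": the column comes back unchanged, still not a float dtype
ignored = s.astype(float, errors="ignore")
print(ignored.dtype)

# errors="raise" (the default): the ValueError from float parsing propagates
raised = False
try:
    s.astype(float)
except ValueError as exc:
    raised = True
    print(exc)  # the message names the offending value, e.g. '1,2'
```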

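I would like to use the astype() method directly, without manually replacing the commas with dots. The data source usually returns floats with a comma as the decimal separator, because its locale is set to German.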

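Is there a generic way to tell pandas that floats with a comma (or any other numeric data type with a decimal separator) are valid and should be converted automatically?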

PS: I can't use read_csv, because the data comes from a database, so I can't specify the decimal separator directly as I could there.

Thanks.

Solutions

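You can solve this with the locale library, using apply() together with locale.atof. Just substitute the appropriate locale; here I used de_DE, because German uses "," as the decimal separator.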
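Which locale name is accepted depends on the platform: many Linux systems only ship "de_DE.UTF-8", while Windows uses names such as "deu_deu" or "German_Germany.1252". A minimal sketch, assuming you only need number parsing, that tries a few candidate names for LC_NUMERIC (the category locale.atof reads its separators from) and reports which one worked. The candidate list and the name set_german_numeric_locale are illustrative, not part of the answer. Keep in mind that the locale is process-wide, so this affects the whole program.

```python
import locale

# Illustrative candidate names for a German locale; availability depends on the OS.
# glibc-based Linux usually needs an installed "de_DE.UTF-8"; Windows accepts
# names such as "deu_deu" or "German_Germany.1252".
CANDIDATES = ["de_DE.UTF-8", "de_DE.utf8", "de_DE", "deu_deu", "German_Germany.1252"]


def set_german_numeric_locale(candidates=CANDIDATES):
    """Hypothetical helper: set LC_NUMERIC to the first candidate the OS accepts.

    Returns the candidate name that worked, or None if none is available.
    Only LC_NUMERIC is changed, which is what locale.atof depends on.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue  # not installed / not recognized on this platform
    return None


chosen = set_german_numeric_locale()
if chosen is None:
    print("no German locale available; fall back to str.replace")
else:
    print(chosen, locale.atof("3234,2342"))  # "," is now the decimal point
```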

import locale
from datetime import datetime

import pandas as pd

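# the locale name is platform-dependent, e.g. "de_DE.UTF-8" on many Linux systems, "deu_deu" on Windows
locale.setlocale(locale.LC_ALL, locale="de_DE")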


data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [
        datetime(2020, 8, 1, 0, 3, 4).isoformat(),
        datetime(2020, 8, 2, 2, 4, 5).isoformat(),
        datetime(2020, 8, 3, 6, 8, 4).isoformat(),
    ],
}

# keep the comma columns as strings here; they are parsed with locale.atof below
conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": str,
    "col_float2": str,
    "col_date": "datetime64[ns]",
}

df = pd.DataFrame(data=data)

print(df.dtypes)
df = df.astype(conversion_dict,errors="ignore")
df["col_float"] = df["col_float"].apply(locale.atof)
df["col_float2"] = df["col_float2"].apply(locale.atof)
print(df.dtypes)
print(df)

Output:

col_str       object
col_int       object
col_float     object
col_float2    object
col_date      object
dtype: object
col_str               object
col_int                int64
col_float            float64
col_float2           float64
col_date      datetime64[ns]
dtype: object
  col_str  col_int   col_float  col_float2            col_date
0       a        1      1.2000     13.2000 2020-08-01 00:03:04
1       b        2      3.2342   3234.2342 2020-08-02 02:04:05
2       c        3  97837.8277    263.8277 2020-08-03 06:08:04
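Besides the comma, locale.atof under de_DE also strips the "." thousands separator (it delocalizes the string before calling float). If you want that behavior without changing the process-wide locale, a minimal pandas-only sketch follows. The name parse_german_float is hypothetical, and the sketch assumes every value in the column uses German formatting, because a dot-formatted value such as "1.5" would be read as 15.

```python
import pandas as pd


def parse_german_float(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse German-formatted strings like "1.234,5" to float64.

    Same idea as locale.atof under de_DE: drop the "." thousands
    separator, then turn the "," decimal separator into ".".
    """
    return (
        s.astype(str)
        .str.replace(".", "", regex=False)   # remove thousands separators
        .str.replace(",", ".", regex=False)  # comma -> dot decimal point
        .astype(float)
    )


parsed = parse_german_float(pd.Series(["1,2", "3234,2342", "1.234,5"]))
print(parsed.tolist())  # [1.2, 3234.2342, 1234.5]
```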

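I solved the problem with the following workaround. It may still break in some cases, but I haven't found a way to tell pandas' astype() that commas are fine. If anyone has a pandas-only solution, please let me know: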

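from datetime import datetime

import pandas as pd

data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [
        datetime(2020, 8, 1, 0, 3, 4).isoformat(),
        datetime(2020, 8, 2, 2, 4, 5).isoformat(),
        datetime(2020, 8, 3, 6, 8, 4).isoformat(),
    ],
}

conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": float,
    "col_float2": float,
    "col_date": "datetime64[ns]",
}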

df = pd.DataFrame(data=data)
throw_error = True

try:
    df = df.astype(conversion_dict,errors="raise")
except ValueError as e:
    error_message = str(e).strip().upper()
    error_search = "COULD NOT CONVERT STRING TO FLOAT:"
    # compare error messages to catch only the string-to-float error, because pandas raises plain ValueErrors,
    # which are not specific to the data type. This is rather hacky, since error messages could change.
    if error_message.startswith(error_search):
        # convert everything else and ignore errors for the float columns
        df = df.astype(conversion_dict,errors="ignore")
        # go over the conversion dict
        for key,value in conversion_dict.items():
            # print(str(key) + ":" + str(value) + ":" + str(df[key].dtype))
            # only apply to convert-to-float-columns which are not already in the correct pandas type float64
            # if you don't skip columns that are already correctly typed, .str.replace() raises an error
            if (value == float or value == "float") and df[key].dtype != "float64":
                # df[key].apply(locale.atof), or anything else locale-related, is platform-dependent and
                # therefore a bad idea in my opinion
                # locale settings for atof:
                # WINDOWS: locale.setlocale(locale.LC_ALL,'deu_deu')
                # UNIX: locale.setlocale(locale.LC_ALL,'de_DE')
                df[key] = pd.to_numeric(df[key].str.replace(",", ".", regex=False))
    else:
        if throw_error:
            # or do whatever best suits your use case; a bare raise keeps the original traceback
            raise
        else:
            df = df.astype(conversion_dict,errors="ignore")

print(df.dtypes)
print(df)
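A variant of the same idea that avoids matching on error-message text: normalize the decimal separator only in the columns whose target type in conversion_dict is a float, then make a single astype() call so that any other conversion problem still raises normally. This is a sketch; the helper name astype_with_decimal_comma is hypothetical, and it assumes the float columns contain plain numbers without thousands separators (a value like "1.234,5" would need the "." removed first).

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype


def astype_with_decimal_comma(df, conversion_dict, decimal=","):
    """Hypothetical helper: df.astype(conversion_dict), but string columns
    whose target type is a float get their decimal separator rewritten first."""
    out = df.copy()
    for col, target in conversion_dict.items():
        # only touch columns that should become floats and are not numeric yet
        if is_float_dtype(target) and not is_numeric_dtype(out[col]):
            out[col] = out[col].str.replace(decimal, ".", regex=False)
    # everything else is converted as usual; errors still raise
    return out.astype(conversion_dict)


data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
}
conversion_dict = {"col_str": str, "col_int": int, "col_float": float, "col_float2": float}

result = astype_with_decimal_comma(pd.DataFrame(data), conversion_dict)
print(result.dtypes)
print(result)
```

Checking the target type up front replaces the try/except around the error message, at the cost of assuming that every string in a float column uses the given decimal separator.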
