Subtraction in IEEE 754 format

How to perform subtraction in IEEE 754 format

I am trying to perform subtraction using the binary representation of fp32 values.

To work out how to do the subtraction, I looked at an existing question on the topic.

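I know that subtraction in IEEE 754 is the same as addition up to the step where the mantissas are added or subtracted. If I understand correctly, the steps of the addition/subtraction algorithm are: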

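1. Find the operand with the larger magnitude.
2. Subtract the smaller exponent from the larger exponent to get the alignment shift, and take the larger exponent as the exponent of the result.
3. Shift the mantissa of the smaller operand to the right until the exponents are aligned.
4. If the operands have the same sign (+, + or -, -), add the mantissas. If the signs differ (+, - or -, +), subtract the shifted mantissa from the mantissa of the larger-magnitude operand (I think this is where I am going wrong...).
5. For addition: if the mantissa sum overflows, add one to the exponent and shift the result mantissa right by one bit. For subtraction: I don't understand what I need to do here.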
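For the subtraction case, the usual counterpart of the overflow step is renormalization: after the aligned mantissas are subtracted, the leading 1 may no longer sit in bit 23, so the mantissa is shifted left until it does and the exponent is decreased by the number of shifts. The following is a minimal sketch of that single step, not a complete adder. `normalize` is a hypothetical helper name; it assumes a 24-bit significand with the hidden bit included, keeps no guard bits, and ignores subnormal results (the exponent is simply allowed to run below 1).

```python
def normalize(exp, mnts):
    """Renormalize the result of adding/subtracting two aligned significands.

    exp  : biased exponent of the larger operand (integer)
    mnts : integer result of the mantissa addition or subtraction
    Returns (exponent, 24-bit significand with the leading 1 in bit 23).
    """
    if mnts == 0:
        # Complete cancellation: the result is zero.
        return 0, 0
    if mnts >> 24:
        # Carry out of bit 23 (addition case): shift right once, exponent + 1.
        return exp + 1, mnts >> 1
    # Cancellation (subtraction case): shift left until bit 23 is set,
    # decrementing the exponent once per shift.
    while not (mnts >> 23) & 1:
        mnts <<= 1
        exp -= 1
    return exp, mnts
```

For instance, `normalize(133, 1 << 13)` needs 10 left shifts and returns `(123, 1 << 23)`, while `normalize(133, 1 << 24)` takes the carry branch and returns `(134, 1 << 23)`.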

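My doubt is this: if the inputs are, for example, 100.3654 and -100.254, the exponent of the result must be smaller than the exponents of the inputs, because 100.3654 - 100.254 = 0.1114 and the biased exponent of 0.1114 is 123, whereas the exponent of both 100.3654 and -100.254 is 133. So when I apply the step that subtracts the exponents to get the shift, the resulting shift is zero... I have implemented subtraction in IEEE 754 that works correctly, but only when the operands are not close to each other, for example when the first operand is 100 and the second operand is -1.95. In that case the exponent of the result is also 133.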
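The exponent values quoted above can be checked by reading the exponent field of each value directly. This is a small sketch using Python's standard `struct` module; `fp32_exponent` is a name introduced here purely for illustration.

```python
import struct

def fp32_exponent(x):
    """Return the biased 8-bit exponent field of x after rounding to fp32."""
    # Pack as a big-endian float32, then reinterpret the 4 bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

print(fp32_exponent(100.3654))  # 133
print(fp32_exponent(-100.254))  # 133
print(fp32_exponent(0.1114))    # 123
print(fp32_exponent(98.05))     # 133 (roughly 100 - 1.95)
```

Both inputs lie in [64, 128), so they share exponent 133, while 0.1114 lies in [0.0625, 0.125), which corresponds to exponent 123: ten binary places lower.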

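This is the code (Python) I use to check the algorithm. I have also implemented it in Verilog, and it works fine when the operands have the same sign. No rounding mode is implemented because I don't need one.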
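The code relies on two helpers, `ieee754_bin` and `bin_fp32`, which are not shown. The following is a minimal sketch of what they could look like, inferred only from how they are called: it assumes `ieee754_bin` returns a dict of bit strings under the keys `'sign'`, `'exp'` and `'mnts'`, and that `bin_fp32` turns a 32-character bit string back into a float. The actual implementations may differ.

```python
import struct

def ieee754_bin(x):
    """Split x (rounded to fp32) into sign, exponent and mantissa bit strings.

    Assumed interface, reconstructed from the call sites only.
    """
    raw = struct.unpack(">I", struct.pack(">f", x))[0]
    bits = "{:032b}".format(raw)
    return {"sign": bits[0], "exp": bits[1:9], "mnts": bits[9:]}

def bin_fp32(bits):
    """Convert a 32-character string of '0'/'1' into the fp32 value it encodes."""
    return struct.unpack(">f", struct.pack(">I", int(bits, 2)))[0]
```

For example, `ieee754_bin(1.0)` gives sign `'0'`, exponent `'01111111'` and an all-zero 23-bit mantissa, and `bin_fp32` applied to the concatenation of those three fields returns `1.0`.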

import numpy as np

def ieee754_addition(a,b):

    # Get binary representation for input a
    a_bin  = ieee754_bin(a)
    a_exp  = int(a_bin['exp'],2)
    a_mnts = int("1"+a_bin['mnts'],2)
    
    # Get binary representation for input b
    b_bin  = ieee754_bin(b)
    b_exp  = int(b_bin['exp'],2)
    b_mnts = int("1"+b_bin['mnts'],2)
    
    # We suppose that operand a is greater than operand b
    exp = a_exp
    mnts = a_mnts
    exp_m = b_exp
    mnts_m = b_mnts
    
    # If operand b is greater than operand a (compared by exponent only)
    if b_exp >= a_exp:
        mnts = b_mnts
        exp = b_exp
        exp_m = a_exp
        mnts_m = a_mnts
    
    # How many shifts are needed to normalize the smaller mantissa
    shift = int(exp - exp_m)
    
    # Shift the mantissa
    mnts_m_shift = mnts_m >> shift


    # If the signs differ, subtract the mantissas
    if ((a_bin['sign'] == '1' and b_bin['sign'] == '0') or (a_bin['sign'] == '0' and b_bin['sign'] == '1')):
        if mnts > mnts_m_shift:
            mnts = (mnts - mnts_m_shift)
        else:
            mnts = (mnts_m_shift - mnts)
    # If the signs are equal
    else:
        # Adding the mantissas
        mnts = (mnts + mnts_m_shift)
    
    # Get the sign of the greater operand
    if (abs(a) > abs(b)) :
        sign = a_bin['sign']
    else:
        sign = b_bin['sign']
    
    # If signs are equal
    if (a_bin['sign'] == b_bin['sign']):
        sign = a_bin['sign']
    
    msign = 1
    if sign == '1':
        msign = -1
    
    # Carry-out bit (bit 24): 1 if the mantissa addition overflowed, else 0
    nrm_bit = int("{:025b}".format(mnts)[0],2)
    
    # Shift the mantissa right by one bit if it overflowed
    mnts_norm = mnts >> nrm_bit
    mnts_bin = "{:024b}".format(mnts_norm)
    mnts_ = mnts_bin[1:24]
    
    # Add one to the exponent if the mantissa result overflowed
    exp_bin = "{:08b}".format(exp + nrm_bit)
    
    # Concatenate exponent and mantissa result
    result = "0" + exp_bin + mnts_
    
    # Return the result in fp32 format
    return msign*bin_fp32(result)

# Fixed seed so the random values are reproducible
np.random.seed(5)

# Random values
samples = 65536
a = np.random.random(samples) * 100
b = np.random.random(samples) * 100

# Performing the test
errors = np.zeros(len(a),dtype=np.float32)
results_teor = np.zeros(len(a),dtype=np.float32)
results_prct = np.zeros(len(a),dtype=np.float32)
for i in range(len(a)):
    result  = ieee754_addition(a[i],b[i])
    teor = (a[i] + b[i]).astype(np.float32)
    errors[i] = abs(result - teor)
    results_teor[i] = teor
    results_prct[i] = result

print(np.max(errors))
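Note that the test above draws both operands from [0, 100), so it never exercises the mixed-sign path. For comparison, here is a minimal self-contained sketch of the whole add/subtract flow, including renormalization after cancellation. It is not a reference implementation: the names `fp32_fields` and `fp32_add_trunc` are hypothetical, the three guard bits (`GUARD`) are a choice made here, the result is truncated rather than rounded, and inputs are assumed to be normal, nonzero and finite, with no exponent overflow or underflow.

```python
import struct

GUARD = 3  # extra low-order bits kept during alignment (an assumption)

def fp32_fields(x):
    """Return (sign, biased exponent, 23-bit fraction) of x as fp32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp32_add_trunc(a, b):
    """Add two normal fp32 values, truncating instead of rounding."""
    sa, ea, fa = fp32_fields(a)
    sb, eb, fb = fp32_fields(b)
    # Restore the hidden bit and append guard bits.
    ma = (fa | 0x800000) << GUARD
    mb = (fb | 0x800000) << GUARD
    # Make (sa, ea, ma) the operand with the larger magnitude.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    # Align the smaller operand to the larger exponent.
    mb >>= ea - eb
    # Same sign: add magnitudes. Different signs: subtract the smaller one.
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0.0
    # shift = +1 on carry, 0 if already normalized, negative after cancellation.
    shift = m.bit_length() - (24 + GUARD)
    e = ea + shift
    # Move the leading 1 to bit 23 and drop the guard bits in one step.
    k = shift + GUARD
    m = m >> k if k >= 0 else m << -k
    bits = (sa << 31) | (e << 23) | (m & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp32_add_trunc(1.5, -1.25))  # 0.25
print(fp32_add_trunc(3.0, 1.0))    # 4.0
print(fp32_add_trunc(1.0, -3.0))   # -2.0
```

With the inputs from this question, `fp32_add_trunc(100.3654, -100.254)` produces a value whose exponent field is 123, because the cancellation branch lowers the exponent by the number of leading zeros that were shifted out.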