argparse TypeError in pandas DataFrame, improper execution


I am trying to build a small utility for the following purpose: I supply a data frame and then choose one of three options.

import argparse
import platform
import os
import pandas as pd
import numpy as np

def version_print():
    print('Expression file is valid...')
    print("Using Python version:",platform.python_version())
    print("Using Pandas version:",pd.__version__)
    print("Using Numpy version:",np.__version__)

def normalize(df,col):
    '''normalize the log table with desired column,Enter the column value in "col".'''
    return df.sub(df[col],axis=0)

parser = argparse.ArgumentParser(description='''Manipulate tables ''',usage='python3 %(prog)s -e *.tsv --options -op *.tsv',epilog='''Short prog. desc:\
    Pass the expression matrix to filter,log2(val) etc.,''')

parser.add_argument("-e","--expr",metavar='',required=True,help="tab-delimited expression matrix file")
parser.add_argument("-op","--outprefix",help="output file prefix")
parser.add_argument("-l","--log2p5",required=False,help="convert expression values to log2(df+0.5)")
parser.add_argument("-ft","--filter",nargs='?',default=2,type=int,help="Filter table with tpm <= default(2)")
parser.add_argument("-nm","--normalize",nargs=1,type=str,help="normalize table based on column chosen")

args=parser.parse_args()

if (os.path.isfile(args.expr)):
    version_print()
    df = pd.read_csv(args.expr,sep='\t'); print(df.head(3))
    if(args.filter):
        print(args.filter,type(args.filter))
        filtered_df = df[(df[df.columns] >= 2).any(axis='columns')]
        outfile = args.outprefix + ".filteredTpm.gt." +str(args.filter)+".tsv"
        filtered_df.to_csv(outfile,sep='\t',index=False)
        print("Filtered table written to ",outfile)
    elif(args.log2p5):
        log_df = np.log2(df+0.5)
        outfile = args.outprefix + ".log2p5.tsv"
        log_df.to_csv(outfile,index=False)
        print("Converted table into log2p5 and output written to ",outfile)
    elif(args.normalize):
        norm_df = normalize(df,args.normalize)
        outfile = args.outprefix + ".normalized.tsv"
        norm_df.to_csv(outfile,index=False)
        print("normalized table written to ",outfile)
    else:
        print("Provide valid option...")
else:
    print("Please provide proper input..")

Running it with -h prints the following:

python tpmtable_utilities.py -h        
usage: python3 tpmtable_utilities.py -e *.tsv --options -op *.tsv                                                                                                            

Manipulate tables

optional arguments:
  -h,--help           show this help message and exit
  -e,--expr          tab-delimited expression matrix file
  -op,--outprefix    output file prefix
  -l,--log2p5        convert expression values to log2(df+0.5)
  -ft [],--filter []  Filter table with tpm <= default(2)
  -nm,--normalize    normalize table based on column chosen

When I pass in a data frame, I get an error:

Expression file is valid...
Using Python version: 3.6.7
Using Pandas version: 1.1.2
Using Numpy version: 1.19.2
  id   c1   c2  c3   c4   c5  c6   c7
0  A  8.3   8.3   5.8   5.3   5.1   5.0   5.6
1  B  8.2   6.2   7.8  14.6   6.1   3.8   5.3
2  C  6.7  12.6  24.3   8.2  30.4  25.1  28.7

TypeError: '>=' not supported between instances of 'str' and 'int'

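Even when I run python TpmUtilities.py -e table.tsv -op output -l, or use another option (-nm), I still get the same error. In addition, for -l I get the error: expected one argument. I guess the error is in the args.filter step, but I am not sure why that step runs first, since it sits inside an if block.

Why does this happen? Thanks in advance.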

Solution

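As already pointed out in a comment, the problem is the >= 2 comparison that builds filtered_df (line 36 of the original script). The root cause is the "id" column: it is not numeric, so its string values cannot be compared with an integer.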
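The comparison explains the TypeError, but the question also asks why the filter branch runs when -l or -nm is given. The answer above does not spell this out. A likely reason is that --filter is declared with default=2, so args.filter is 2 (truthy) whenever -ft is absent, and the first if branch always wins. Likewise, -l is declared without an action, so argparse expects a value after it, hence "expected one argument". The sketch below reproduces the first behaviour and shows one possible alternative declaration. The use of action="store_true", const=2 and default=None is my suggestion, not part of the original answer.

```python
import argparse

# How the question declares --filter: default=2 applies whenever -ft is absent.
original = argparse.ArgumentParser()
original.add_argument("-ft", "--filter", nargs="?", default=2, type=int)
orig_args = original.parse_args([])      # no -ft on the command line
print(orig_args.filter)                  # 2, so `if args.filter:` is always true

# One possible alternative (an assumption, not the author's code):
# - default=None means "filter not requested"
# - const=2 is the value used when -ft is given without a number
# - store_true turns -l into a plain on/off flag that takes no value
parser = argparse.ArgumentParser()
parser.add_argument("-l", "--log2p5", action="store_true")
parser.add_argument("-ft", "--filter", nargs="?", type=int, const=2, default=None)

only_log = parser.parse_args(["-l"])
bare_filter = parser.parse_args(["-ft"])
explicit_filter = parser.parse_args(["-ft", "5"])

print(only_log.log2p5, only_log.filter)  # True None
print(bare_filter.filter)                # 2
print(explicit_filter.filter)            # 5
```

With this declaration the branch test would be written as `if args.filter is not None:`, so that a threshold of 0 would still count as "filter requested".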

A straightforward way to prevent the TypeError is to convert the frame used in the filter with to_numeric, which coerces strings to NaN:

df[df.columns].apply(pd.to_numeric,errors='coerce') >= 2
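To see what the coercion does, here is a minimal sketch using a few columns of the three sample rows printed in the question. The threshold of 10 is a hypothetical value chosen only so that some rows are dropped; the question's script uses 2.

```python
import pandas as pd

# A subset of the sample rows shown in the question.
df = pd.DataFrame({
    "id": ["A", "B", "C"],
    "c1": [8.3, 8.2, 6.7],
    "c2": [8.3, 6.2, 12.6],
    "c3": [5.8, 7.8, 24.3],
})

threshold = 10  # hypothetical cutoff for illustration

# Strings in "id" cannot be parsed as numbers, so errors="coerce" turns them into NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# NaN >= threshold evaluates to False, so the comparison no longer raises.
mask = (numeric >= threshold).any(axis="columns")

# The mask is only used to select rows; the original frame keeps its "id" column.
filtered_df = df[mask]
print(filtered_df)
```

Only row C has a value of at least 10, so it is the only row kept.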

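The drawback is that all information in the id column is lost during the filtering step, because every value in it becomes NaN. If you still need that information there, you have to exclude this column from the filter instead.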
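One way to exclude the column, sketched here under the assumption that every column you want to filter on already has a numeric dtype, is to build the mask from the numeric columns only. The names value_cols and threshold are illustrative and do not appear in the original script.

```python
import pandas as pd

# Small stand-in for the expression table from the question.
df = pd.DataFrame({
    "id": ["A", "B", "C"],
    "c1": [8.3, 8.2, 6.7],
    "c2": [8.3, 6.2, 12.6],
})

threshold = 10  # hypothetical cutoff; the script would pass args.filter here

# Pick only the numeric columns, leaving "id" untouched and out of the comparison.
value_cols = df.select_dtypes(include="number").columns

# Keep rows where at least one numeric column reaches the threshold.
mask = (df[value_cols] >= threshold).any(axis="columns")
filtered_df = df[mask]
print(filtered_df)
```

Using the parsed threshold rather than a hard-coded 2 would also make the output file name, which already includes args.filter, match the filter that was actually applied.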
