How to split the information in multiple lines into columns?

I have a ".csv" file with many lines. The information is laid out as follows:

GS3;724330300294409;50;BRABT;00147;44504942;01;669063000;25600;0
GS3;724330300294409;50;BRABT;00147;44504943;01;669063000;25600;0
GS3;724330300294409;50;BRABT;00147;44504944;01;669063000;25600;00004

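I already receive the information line by line (each file has nearly 300000 lines). I'm sending this data to Kafka, but I need to split the lines into columns. For example: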

Column1 Column2         Column3 Column4 Column5 Column6  Column7 Column8    Column9 Column10
GS3     724330300294409 50      BRABT   00147   44504942 01      669063000  25600   0
GS3     724330300294409 50      BRABT   00147   44504943 01      669063000  25600   0
GS3     724330300294409 50      BRABT   00147   44504944 01      669063000  25600   00004

I know the size of each value. For example:

3 (GS3)
15 (724330300294409)
2 (50)
5 (BRABT)
5 (00147)
8 (44504943)
2 (01)
10 (669063000)
5 (25600)
5 (0    )
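
Since the field widths are known, pandas can also cut fixed-width lines directly with `pd.read_fwf` and its `widths` parameter, which skips the step of inserting ";" first. This is a swapped-in alternative, not the approach used in the answer below. The sketch reads an in-memory sample instead of the real file, assumes each value is left-aligned and space-padded to its width, and assumes there is no header line (if there is one, `skiprows=1` would skip it).

```python
import io

import pandas as pd

# Field widths from the question, in column order (60 characters in total).
widths = [3, 15, 2, 5, 5, 8, 2, 10, 5, 5]
names = [f"Column{i}" for i in range(1, 11)]

# In-memory stand-in for the raw file. Assumption: every value is
# left-aligned and padded with spaces on the right up to its width.
values = ["GS3", "724330300294409", "50", "BRABT", "00147",
          "44504942", "01", "669063000", "25600", "0"]
raw = "".join(v.ljust(w) for v, w in zip(values, widths)) + "\n"

# read_fwf cuts each line at the given widths and strips the padding;
# dtype="str" keeps leading zeros such as '00147' and '01'.
df = pd.read_fwf(io.StringIO(raw), widths=widths, names=names,
                 header=None, dtype="str")
print(df)
```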

I'm trying to do this with ksql on my Kafka platform, but I'm struggling. I'm new to Python, but splitting the data before sending it to Kafka seems like a simpler approach.

I have been using the Spooldir CSV connector to send the data to Kafka, but each line ends up as a single column in the topic.

I use this to add ";" between the values:

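# arquivo: the open input file; commatype: the separator ';'; result: starts as ''
i = True
for line in arquivo:
    if i:  # skip the first line
        i = False
        continue
    # slice each fixed-width field by position and strip its padding
    result = result + line[0:3].strip()+commatype+line[3:18].strip()+commatype+line[18:20].strip()+commatype+line[20:25].strip()+ ...

arquivo.close()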
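
A runnable sketch of the same fixed-width slicing loop, under stated assumptions: the first line is a header to skip (as the `i` flag suggests), the files are UTF-8, and `Input.txt` / `Output.csv` are placeholder names. The helper names `split_records` and `convert` are hypothetical. Instead of accumulating everything in one `result` string, it writes each record as soon as it is sliced, through `csv.writer` with ";" as the delimiter, so the whole output never has to sit in memory for a file of around 300000 lines.

```python
import csv

# Field widths from the question, in column order.
WIDTHS = [3, 15, 2, 5, 5, 8, 2, 10, 5, 5]


def split_records(lines, widths=WIDTHS, skip_header=True):
    """Yield one list of stripped field values per non-blank fixed-width line."""
    bounds, pos = [], 0
    for w in widths:  # builds (0, 3), (3, 18), (18, 20), ...
        bounds.append((pos, pos + w))
        pos += w
    for n, line in enumerate(lines):
        if skip_header and n == 0:  # same role as the `i` flag in the loop above
            continue
        line = line.rstrip("\r\n")
        if line:  # ignore blank lines
            yield [line[s:e].strip() for s, e in bounds]


def convert(src_path, dst_path):
    """Stream a fixed-width file into a ';'-separated file, one line at a time."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, delimiter=";")
        writer.writerows(split_records(src))


# Placeholder file names (assumptions); adjust to your environment:
# convert("Input.txt", "Output.csv")
```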

Solution

If you can accept column names starting from Column0 (instead of Column1), you can call read_csv with sep=';' and a suitable prefix:

import pandas as pd

# Requires pandas < 2.0: the prefix argument was removed in pandas 2.0
result = pd.read_csv('Input.csv', sep=';', header=None, prefix='Column', dtype='str')

Note that I passed dtype='str' because some columns in your input have leading zeros that would otherwise be stripped.

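This solution works regardless of the number of input columns, but the drawback is that all columns are now of type object. You may want to convert some columns to other types.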

The result is:

  Column0          Column1 Column2 Column3 Column4   Column5 Column6    Column7 Column8 Column9
0     GS3  724330300294409      50   BRABT   00147  44504942      01  669063000   25600       0 
1     GS3  724330300294409      50   BRABT   00147  44504943      01  669063000   25600       0 
2     GS3  724330300294409      50   BRABT   00147  44504944      01  669063000   25600   00004
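
On pandas 2.0 and later, where `prefix` no longer exists, a sketch that yields the same `Column0` to `Column9` names reads with `header=None` (integer labels 0 to 9) and then renames with `DataFrame.add_prefix`. The `pd.to_numeric` call illustrates the type conversion mentioned above; converting `Column7` specifically is only an example. The sample is read from `io.StringIO` in place of `Input.csv`.

```python
import io

import pandas as pd

# In-memory stand-in for Input.csv (first two sample rows from the question).
data = io.StringIO(
    "GS3;724330300294409;50;BRABT;00147;44504942;01;669063000;25600;0\n"
    "GS3;724330300294409;50;BRABT;00147;44504943;01;669063000;25600;0\n"
)

# header=None gives integer column labels 0..9; add_prefix turns them into
# 'Column0'..'Column9', the same names prefix='Column' used to produce.
result = pd.read_csv(data, sep=";", header=None, dtype="str").add_prefix("Column")

# Example conversion of one column after reading (column choice is illustrative).
result["Column7"] = pd.to_numeric(result["Column7"])

print(result.columns.tolist())
```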

Another option is to create the column names you want (starting from Column1), but this only works if you know the number of columns:

import pandas as pd

# Create the list of column names
names = [f'Column{i}' for i in range(1, 11)]
# Read passing the above column names; sep=';' is still needed to split the fields
result = pd.read_csv('Input.csv', sep=';', names=names, dtype='str')
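
Because this variant fixes the names up front, `dtype` can also be a dict keyed by those names, so columns with leading zeros stay as text while the others are parsed as integers during the read. Which columns are safe to treat as numbers is an assumption in this sketch (here `Column8` and `Column9`); the data comes from an in-memory `io.StringIO` sample rather than `Input.csv`.

```python
import io

import pandas as pd

names = [f"Column{i}" for i in range(1, 11)]

# Keep every column as text by default (preserves values like '00147'),
# then override the ones assumed to hold plain numbers.
dtypes = {name: "str" for name in names}
dtypes.update({"Column8": "int64", "Column9": "int64"})

data = io.StringIO(
    "GS3;724330300294409;50;BRABT;00147;44504944;01;669063000;25600;00004\n"
)
result = pd.read_csv(data, sep=";", names=names, dtype=dtypes)
print(result.dtypes)
```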