Unix: parsing variable-length name=value pairs into separate lines

How to split variable-length name=value pairs into separate lines on Unix

We receive input files in the format shown below. The length of the Text field varies from line to line.

Input file:

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

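The Text field contains semicolon-separated name=value pairs, and its length varies. Note that a name in the Text column can itself contain a semicolon (for example, name3;name4=value2 is a single pair). We are trying to parse the input, but have not managed to do it with AWK or Bash.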

Desired output:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

The snippet below works for ID=2 but not for ID=1:

echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5

Thanks a lot for your help.
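The attempt above fails for ID=1 because tr ';' '\n' breaks at every semicolon, including the one inside name3;name4=value2. A minimal sketch of a fix that keeps the same read-loop idea is to break only at a semicolon that ends an =value part. It assumes a POSIX sed (the newline in the replacement is written as a backslash followed by a real line break, which is the portable form) and uses a hypothetical helper name, split_pairs, which reads the data lines (without the ID|Text header) on standard input.

```shell
#!/bin/sh
# Sketch: keep the read loop from the question, but break the Text field only
# at a ";" that ends an "=value" part, so "name3;name4=value2" stays together.
# split_pairs is a hypothetical helper; it reads ID|Text lines (no header) on stdin.
split_pairs() {
  while IFS='|' read -r id text; do
    # Replace "=value;" by "=value" plus a newline. A backslash followed by a
    # real line break is the portable way to put a newline into a sed
    # replacement, which is why the "/g'" line below starts in column 1.
    printf '%s\n' "$text" |
      sed 's/\(=[^;]*\);/\1\
/g' |
      while IFS= read -r pair; do
        printf '%s|%s\n' "$id" "$pair"   # prefix every pair with its ID
      done
  done
}

# Usage with the two data lines from the question:
printf '%s\n' '1|name1=value1;name3;name4=value2;name5=value5' \
  '2|name1=value1;name2=value2;name6=;name7=value7;name8=value8' | split_pairs
```

For the real file, the header can be dropped first, e.g. tail -n +2 Input_file | split_pairs.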

Solutions

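Could you please try the following. It was written and tested with the shown samples in a recent version of GNU awk. Since the OP's awk is older, anyone with an old awk version should try changing awk to awk --re-interval.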

awk '
BEGIN{
  FS=OFS="|"
}
FNR==1{ next }
{
  first=$1
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
    print first,substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
}'  Input_file

The output is as follows:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

Explanation: a detailed, commented version of the above follows (for explanation purposes only).

awk '                                        ##Start the awk program here.
BEGIN{                                       ##Start the BEGIN section here.
  FS=OFS="|"                                 ##Set both FS and OFS to | here.
}
FNR==1{ next }                               ##If this is the first (header) line, skip it without printing anything.
{
  first=$1                                   ##Save the first field (the ID) in variable first.
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Loop while match() finds one or more names (each optionally followed by ;), then =, then an optional value.
    print first,substr($0,RSTART,RLENGTH)    ##Print the ID and the currently matched substring.
    $0=substr($0,RSTART+RLENGTH)             ##Keep the rest of the line as the current line.
  }
}' Input_file                                ##Mention the Input_file name here.
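The regex above is tied to the literal words name and value followed by digits, and its {1,} interval expression is what older gawk versions need --re-interval for. A sketch of the same match()/substr() loop with a more general regex could look like the following, assuming that names never contain =, ; or | and that values never contain ;. The wrapper name split_pairs_awk is hypothetical.

```shell
#!/bin/sh
# Sketch: the match()/substr() loop from the answer with a more general regex.
# Assumptions: names contain no "=", ";" or "|", and values contain no ";".
# "+" and "*" replace the "{1,}" interval, so --re-interval is not needed.
# split_pairs_awk is a hypothetical wrapper; it takes a file name or reads stdin.
split_pairs_awk() {
  awk '
    BEGIN { FS = OFS = "|" }
    FNR == 1 { next }                      # skip the ID|Text header
    {
      first = $1                           # remember the ID
      # One or more ";"-separated name parts, then "=", then an optional value.
      while (match($0, /[^;=|]+(;[^;=|]+)*=[^;]*/)) {
        print first, substr($0, RSTART, RLENGTH)
        $0 = substr($0, RSTART + RLENGTH)  # continue after this pair
      }
    }
  ' "$@"
}

# Usage with the sample data from the question:
printf '%s\n' 'ID|Text' \
  '1|name1=value1;name3;name4=value2;name5=value5' \
  '2|name1=value1;name2=value2;name6=;name7=value7;name8=value8' |
  split_pairs_awk
```

Because the name part cannot contain |, the ID field before the first | can never be mistaken for part of a pair.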

Sample data:

$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

An awk solution:

awk -F"[|;]" '                                                           # use "|" and ";" as input field delimiters
FNR==1 { next }                                                          # skip header line
       { pfx=$1 "|"                                                      # set output prefix to field 1 + "|"
         printpfx=1                                                      # set flag to print prefix

         for ( i=2 ; i<=NF ; i++ )                                       # for fields 2 to NF
             {
               if ( printpfx)     { printf "%s",pfx  ; printpfx=0 }   # if print flag == 1 then print prefix and clear flag
               if ( $(i)  ~ /=/ ) { printf "%s\n",$(i) ; printpfx=1 }   # if current field contains "=" then print it,end this line of output,reset print flag == 1
               if ( $(i) !~ /=/ ) { printf "%s;",$(i) }                # if current field does not contain "=" then print it and include a ";" suffix
             }
       }
' name.dat

The above produces:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

A Bash solution:

#!/usr/bin/env bash

while IFS=\| read -r id text || [ -n "$id" ]; do
  IFS=\; read -r -a kv_arr < <(printf %s "$text")
  printf "$id|%s\\n" "${kv_arr[@]}"
done < <(tail -n +2 a.txt)

A plain POSIX shell solution:

#!/usr/bin/env sh

# Chop the header line from the input file
tail -n +2 a.txt |
# While reading id and text Fields Separated by vertical bar
while IFS=\| read -r id text || [ -n "$id" ]; do
  # Sets the separator to a semicolon
  IFS=\;
  # Print each semicolon separated field formatted on
  # its own line with the ID
  # shellcheck disable=SC2086 # Explicit split on semicolon
  printf "$id|%s\\n" $text
done

Input a.txt:

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

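Output (note that splitting on every semicolon puts name3 and name4=value2 on separate lines, unlike the desired output):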

1|name1=value1
1|name3
1|name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
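Both shell solutions above split the Text field at every semicolon, so name3 and name4=value2 come out on separate lines. One way to get the desired output while staying in plain POSIX sh is to hold back segments that contain no = and glue them onto the next segment. This is a sketch only: split_pairs_sh is a hypothetical name, and it assumes values contain no ;.

```shell
#!/bin/sh
# Sketch: POSIX sh variant of the shell answers that keeps a name-only segment
# (one without "=") and prepends it to the next segment instead of printing it
# on its own line. split_pairs_sh is a hypothetical name; it reads ID|Text
# lines (header already removed) on stdin. Assumes values contain no ";".
split_pairs_sh() {
  while IFS='|' read -r id text || [ -n "$id" ]; do
    pending=''                  # name-only segments waiting for their value
    old_ifs=$IFS
    IFS=';'                     # split $text on semicolons only
    set -f                      # and disable globbing while doing so
    for seg in $text; do
      case $seg in
        *=*) printf '%s|%s%s\n' "$id" "$pending" "$seg"
             pending='' ;;
        *)   pending="$pending$seg;" ;;
      esac
    done
    set +f
    IFS=$old_ifs
    # A trailing name with no "=" after it is printed on its own.
    if [ -n "$pending" ]; then
      printf '%s|%s\n' "$id" "${pending%;}"
    fi
  done
}

# Usage with the data lines of a.txt (header removed):
printf '%s\n' '1|name1=value1;name3;name4=value2;name5=value5' \
  '2|name1=value1;name2=value2;name6=;name7=value7;name8=value8' | split_pairs_sh
```

For the file itself this would be run as tail -n +2 a.txt | split_pairs_sh.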

You already have some good answers, and one has been accepted. Here is a much shorter GNU awk command that also does the job:

awk -F '|' 'NR > 1 {
   for (s=$2; match(s,/([^=]+=[^;]*)(;|$)/,m); s=substr(s,RLENGTH+1))
      print $1 FS m[1]      
}' file.txt
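The three-argument form of match() (the m array) is a GNU awk extension; other awks such as mawk or BSD awk do not accept it. A sketch of the same loop for a POSIX awk anchors the regex with ^, so each match starts at the beginning of s, and skips the ; after each pair by hand. It assumes the Text field does not start with =, and split_pairs_posix_awk is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: the loop from the gawk answer for a POSIX awk without gawk's third
# match() argument. Anchoring the regex with "^" means each match starts at
# position 1 of s, so substr(s, 1, RLENGTH) is the pair and RLENGTH + 2 skips
# the pair plus the ";" after it.
# Assumes the Text field does not start with "=". split_pairs_posix_awk is a
# hypothetical name; it takes a file name or reads stdin.
split_pairs_posix_awk() {
  awk -F '|' 'NR > 1 {
    for (s = $2; match(s, /^[^=]+=[^;]*/); s = substr(s, RLENGTH + 2))
      print $1 FS substr(s, 1, RLENGTH)
  }' "$@"
}

# Usage with the sample data (header line included, as in file.txt):
printf '%s\n' 'ID|Text' \
  '1|name1=value1;name3;name4=value2;name5=value5' \
  '2|name1=value1;name2=value2;name6=;name7=value7;name8=value8' |
  split_pairs_posix_awk
```

With a file it is called as split_pairs_posix_awk file.txt.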