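Unix: parsing variable-length name-value pairs into separate lines

How to parse variable-length name-value pairs into separate lines on Unix

We receive an input file like the one below, in which the Text column varies in length.

Input file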

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

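Here the Text column holds name-value pairs and varies in length. Note that a name in the Text column can contain a semicolon. We are trying to parse this input, but we have not managed to do it with AWK or Bash.

Desired output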

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

The snippet below works for ID = 2 but not for ID = 1

echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5

Thank you very much for your help.

Answers

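Could you please try the following, written and tested with the samples shown, in a recent version of GNU awk. Since the OP's awk version is old, anyone with an older awk should run it as awk --re-interval so that the {1,} interval in the regex is recognized.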

awk '
BEGIN{
  FS=OFS="|"
}
FNR==1{ next }
{
  first=$1
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
    print first,substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
}'  Input_file

The output is as follows.

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

Explanation: a line-by-line explanation of the above (the comments below are for explanation only).

awk '                                        ## Start of the awk program.
BEGIN{                                       ## Start of the BEGIN section.
  FS=OFS="|"                                 ## Set FS and OFS to |.
}
FNR==1{ next }                               ## Skip the first (header) line without printing anything.
{
  first=$1                                   ## Save the first field (the ID) in first.
  ## Loop while the regex matches one or more names, an equals sign and an optional value.
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
    print first,substr($0,RSTART,RLENGTH)    ## Print the ID and the currently matched substring.
    $0=substr($0,RSTART+RLENGTH)             ## Keep only the rest of the line for the next iteration.
  }
}' Input_file                                ## Input file name.
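
The interval {1,} in the regex above means "one or more", which is the same as +. A minimal sketch, assuming the same sample data: replacing the interval with + should let the command run on awk versions without interval support, so --re-interval is not needed. The function name split_pairs and the rest variable are illustrative and not part of the original answer; the sketch also assumes the Text column contains no further | characters.

```shell
#!/bin/sh
# Sketch: same matching loop as above, with "+" in place of the "{1,}" interval.
# split_pairs is a hypothetical helper name; it reads the file on stdin.
split_pairs() {
  awk '
  BEGIN { FS = OFS = "|" }
  FNR == 1 { next }                 # skip the header line
  {
    id = $1
    rest = $2                       # assumes exactly two |-separated fields
    # One or more names (optionally followed by ";"), "=", optional value.
    while (match(rest, /(name[0-9]+;?)+=(value[0-9]+)?/)) {
      print id, substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}

# Demo with the sample data from the question.
split_pairs <<'EOF'
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
EOF
```

Like the original, this only recognizes names of the form name<digits> and values of the form value<digits>; other data would need a different regex.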

Sample data:

$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

An awk solution:

awk -F"[|;]" '                                                           # use "|" and ";" as input field delimiters
FNR==1 { next }                                                          # skip header line
       { pfx=$1 "|"                                                      # set output prefix to field 1 + "|"
         printpfx=1                                                      # set flag to print prefix

         for ( i=2 ; i<=NF ; i++ )                                       # for fields 2 to NF
             {
               if ( printpfx)     { printf "%s",pfx  ; printpfx=0 }   # if print flag == 1 then print prefix and clear flag
               if ( $(i)  ~ /=/ ) { printf "%s\n",$(i) ; printpfx=1 }   # if current field contains "=" then print it,end this line of output,reset print flag == 1
               if ( $(i) !~ /=/ ) { printf "%s;",$(i) }                # if current field does not contain "=" then print it and include a ";" suffix
             }
       }
' name.dat

The above generates:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

A Bash solution:

#!/usr/bin/env bash

while IFS=\| read -r id text || [ -n "$id" ]; do
  IFS=\; read -r -a kv_arr < <(printf %s "$text")
  printf "$id|%s\\n" "${kv_arr[@]}"
done < <(tail -n +2 a.txt)

A plain POSIX shell solution:

#!/usr/bin/env sh

# Chop the header line from the input file
tail -n +2 a.txt |
# While reading id and text Fields Separated by vertical bar
while IFS=\| read -r id text || [ -n "$id" ]; do
  # Sets the separator to a semicolon
  IFS=\;
  # Print each semicolon separated field formatted on
  # its own line with the ID
  # shellcheck disable=SC2086 # Explicit split on semicolon
  printf "$id|%s\\n" $text
done

Input a.txt

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

Output:

1|name1=value1
1|name3
1|name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
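
Note that this output splits name3 and name4=value2 into two rows, whereas the desired output keeps them together as name3;name4=value2. A possible POSIX shell sketch that handles this is shown here: it walks the text one semicolon-separated segment at a time and glues any segment without an = onto the next one. The function name split_keep_names and the pending variable are illustrative, and the sketch assumes every record ends with a segment that contains =.

```shell
#!/bin/sh
# Sketch: split on ";" but carry a segment without "=" over to the next one.
# split_keep_names is a hypothetical helper name; it reads the file on stdin.
split_keep_names() {
  # Drop the header line, then read id and text separated by "|".
  tail -n +2 | while IFS='|' read -r id text || [ -n "$id" ]; do
    pending=''
    while [ -n "$text" ]; do
      # Cut the next ";"-separated segment off the front of text.
      case $text in
        *";"*) seg=${text%%;*}; text=${text#*;} ;;
        *)     seg=$text;       text='' ;;
      esac
      # A segment with "=" completes a pair; otherwise it is part of a name.
      case $seg in
        *=*) printf '%s|%s%s\n' "$id" "$pending" "$seg"; pending='' ;;
        *)   pending="$pending$seg;" ;;
      esac
    done
  done
}

# Demo with the sample data from the question.
split_keep_names <<'EOF'
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
EOF
```

Passing the ID through a %s argument instead of embedding it in the printf format string also avoids surprises if an ID ever contains a % character.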

You already have some good answers, and one has been accepted. Here is a much shorter GNU awk command that also does the job:

awk -F '|' 'NR > 1 {
   for (s=$2; match(s,/([^=]+=[^;]*)(;|$)/,m); s=substr(s,RLENGTH+1))
      print $1 FS m[1]      
}' file.txt
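
The three-argument form of match() that fills the array m is a GNU awk extension. For other awk implementations, a rough equivalent can be sketched with RSTART and RLENGTH instead; the function name split_generic is illustrative, and the sketch assumes the Text column contains no | and does not begin with =.

```shell
#!/bin/sh
# Sketch: the same generic pattern using only POSIX awk features.
# split_generic is a hypothetical helper name; it reads the file on stdin.
split_generic() {
  awk -F'|' 'NR > 1 {
    s = $2
    # A pair is: everything up to "=", then everything up to the next ";".
    while (match(s, /[^=]+=[^;]*/)) {
      print $1 FS substr(s, RSTART, RLENGTH)
      # Skip past the match and the ";" that follows it.
      s = substr(s, RSTART + RLENGTH + 1)
    }
  }'
}

# Demo with the sample data from the question.
split_generic <<'EOF'
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
EOF
```

Because the name part is matched as "anything that is not =", semicolons inside a name stay attached to the pair that follows them, and the names and values are not restricted to the name<digits> and value<digits> forms.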
