
Reformatting and cleaning a CSV file that contains curly braces


I want to reformat a plain ASCII file, test.txt, which contains (only 10 lines out of a few hundred samples):

{0.91,0.87,-69.79,-0.3149,0.05},{0.9392,1.089,69,-0.31,0.052},{-0.8768,0.7025,69.80,-0.314,0.053},{0.930,-1.2638750861516,69.79,0.314,0.05301},{0.9367,-1.368063705085268,69.79962,{0.946,-1.644,69.7,0.3,0.052}

into a final file, test_processed.txt, which contains (for the same sample):

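0.91,0.87,-69.79,-0.3149,0.05
0.9392,1.089,69,-0.31,0.052
-0.8768,0.7025,69.80,-0.314,0.053
0.930,-1.2638750861516,69.79,0.314,0.05301
0.9367,-1.368063705085268,69.79962,0.052
0.946,-1.644,69.7,0.3,0.052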

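That is, a plain CSV file in which each line contains exactly the five fields that were enclosed by one matching pair of braces in the original.

I tried fiddling with gawk and regexes but could not work out how to handle this. I suspect that adjusting awk's RS and ORS variables might help, but I could not get any further...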

Solution

With GNU awk, you can set RS to a regex that matches everything between { and }, then strip the leading {, the trailing }, and any line breaks:

awk -v RS='{[^}]+}' 'RT{gsub(/^{|}$|\n */,"",RT); print RT}' file
0.91,0.87,-69.79,-0.3149,0.05
0.9392,1.089,69,-0.31,0.052
-0.8768,0.7025,69.80,-0.314,0.053
0.930,-1.2638750861516,69.79,0.314,0.05301
0.9367,-1.368063705085268,69.79962,0.052
0.946,-1.644,69.7,0.3,0.052

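How it works:

  • -v RS='{[^}]+}': sets the record separator to a regex that matches each {...} group
  • RT: the condition; the action runs only when RT is non-empty. gawk sets RT to the input text that matched the RS pattern.
  • {...} is an action block in awk
  • gsub(/^{|}$|\n */,"",RT): removes from RT the leading {, the trailing }, and any newline followed by zero or more spaces
  • print RT: prints the modified RT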
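
RT and a regex-valued RS are GNU awk extensions. Where gawk is not available, the same steps can be approximated with POSIX awk features only. The following is a minimal sketch under that assumption, not the answer's own method: the function name extract_braces is hypothetical, and the script reads the whole input into memory, which should be acceptable for a few hundred samples.

```shell
# Hypothetical helper: print the contents of each {...} group on its own line.
# Uses only POSIX awk features (match, RSTART, RLENGTH) instead of gawk's RT.
extract_braces() {
  awk '
    { buf = buf $0 "\n" }                    # collect the whole input, keeping newlines
    END {
      while (match(buf, /[{][^}]+[}]/)) {    # find the next {...} group
        rec = substr(buf, RSTART + 1, RLENGTH - 2)  # drop the enclosing braces
        gsub(/\n */, "", rec)                # join groups that span several lines
        print rec
        buf = substr(buf, RSTART + RLENGTH)  # continue searching after this group
      }
    }
  '
}

# Usage example: the first group is split across two lines.
printf '{1,2,\n  3},{4,5,6}\n' | extract_braces
# prints:
# 1,2,3
# 4,5,6
```

As with the gawk version, the pattern [^}]+ also matches an opening brace, so a group whose closing brace is missing gets merged with the following group.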

Please try the following, written and tested in GNU awk with the samples shown.

awk -v RS="" -v FS="[}{]" '{for(i=2;i<=NF;i+=2){gsub(/\n+ +/," ",$i);print $i}}' Input_file

The output is as follows.

0.91,0.052
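
To see why the loop in this command starts at field 2 and advances in steps of 2, it may help to print every field produced by the brace-based field separator. The sketch below uses a made-up one-line input and a hypothetical helper name, show_fields; it only illustrates the splitting and is not part of the original answer.

```shell
# Hypothetical helper: number every field obtained by splitting on { or }.
show_fields() {
  awk -F'[}{]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '{a,b},{c,d}\n' | show_fields
# prints:
# 1:[]
# 2:[a,b]
# 3:[,]
# 4:[c,d]
# 5:[]
```

The even-numbered fields hold the text inside each pair of braces, while the odd-numbered ones hold whatever lies between the groups (here the separating commas and the empty edges).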

Using GNU awk for multi-character RS and RT:

$ awk -v RS='{[^}]+}' 'RT{$0=RT; gsub(/[{}]/,""); $1=$1; print}' file
0.91,0.052
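
The $1=$1 assignment in this command makes awk rebuild the record from its fields: with the default field separator, runs of spaces, tabs, and newlines are treated as one separator, and the fields are rejoined with OFS (a single space by default). The sketch below isolates that step with a made-up record and a hypothetical function name, rebuild_demo; it runs in any POSIX awk.

```shell
# Hypothetical demo: rebuild a record that contains a line break and extra spaces.
rebuild_demo() {
  awk 'BEGIN {
    $0 = "0.9367,\n   -1.36,  69.79"   # assumed sample record spanning two lines
    $1 = $1                            # force awk to rejoin the fields with OFS
    print
  }'
}

rebuild_demo
# prints:
# 0.9367, -1.36, 69.79
```

Note that the line break is replaced by a single space rather than removed; if no space should remain, deleting the whitespace with gsub instead would be one option.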
