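How to reformat and clean up a CSV file with curly braces
I want to reformat a plain ASCII file, test.txt, that contains (showing only 10 lines out of a few hundred samples):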
{0.91,0.87,-69.79,-0.3149,0.05},{0.9392,1.089,69,-0.31,0.052},{-0.8768,0.7025,69.80,-0.314,0.053},{0.930,-1.2638750861516,69.79,0.314,0.05301},{0.9367,-1.368063705085268,69.79962,{0.946,-1.644,69.7,0.3,0.052}
into a final file, test_processed.txt, that contains (for the same sample):
0.91,0.05
0.9392,0.052
-0.8768,0.053
0.930,0.05301
0.9367,0.052}
0.946,0.052
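That is, a plain CSV file in which every line contains exactly the five fields found inside one matching pair of braces in the original.
I tried fiddling with gawk and regexes but could not figure out how to approach it. I suspect that adjusting awk's RS and ORS variables might help, but I could not get any further.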
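Solution
With GNU awk, you can set RS to a regex that matches everything between { and }, then strip the leading {, the trailing }, and any line breaks from each match: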
awk -v RS='{[^}]+}' 'RT{gsub(/^{|}$|\n */,"",RT); print RT}' file
0.91,0.87,-69.79,-0.3149,0.05
0.9392,1.089,69,-0.31,0.052
-0.8768,0.7025,69.80,-0.314,0.053
0.930,-1.2638750861516,69.79,0.314,0.05301
0.9367,-1.368063705085268,69.79962,0.052
0.946,-1.644,69.7,0.3,0.052
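How it works:
- -v RS='{[^}]+}': sets the record separator to a regex that matches each {...} group
- RT: tests that RT is not empty. RT is set to the input text that matched the RS pattern.
- {...}: the action block in awk
- gsub(/^{|}$|\n */,"",RT): removes the leading {, the trailing }, and any newline followed by zero or more spaces from RT
- print RT: prints the modified RT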
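The steps above depend on RT, which is specific to GNU awk. As a rough sketch for systems that only have a POSIX awk, the same extraction can be approximated with match(), RSTART, and RLENGTH. This is a swapped-in technique, not the method of the answer above. The helper name extract_groups, the file name sample.txt, and the line break inside the third group are assumptions made for illustration.

```shell
# Sample input (assumption): three groups taken from the question,
# with the third one deliberately split across two lines.
cat > sample.txt <<'EOF'
{0.91,0.87,-69.79,-0.3149,0.05},{0.9392,1.089,69,-0.31,0.052},
{-0.8768,0.7025,69.80,
  -0.314,0.053}
EOF

# extract_groups (hypothetical helper): print the contents of each {...}
# group on its own line, using only POSIX awk features.
extract_groups() {
    awk '
    { buf = buf $0 }                      # join all lines; newlines are dropped
    END {
        # [{] and [}] are bracket expressions, so the braces are literal
        while (match(buf, /[{][^}]*[}]/)) {
            rec = substr(buf, RSTART + 1, RLENGTH - 2)   # strip { and }
            gsub(/[ \t]+/, "", rec)       # remove indentation left by line breaks
            print rec
            buf = substr(buf, RSTART + RLENGTH)          # continue after this group
        }
    }' "$1"
}

extract_groups sample.txt
```

This sketch reads the whole file into memory, which should be acceptable for a few hundred samples. For the real data, something like extract_groups test.txt > test_processed.txt would write the result to the target file.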
Please try the following, written and tested in GNU awk with the samples shown.
awk -v RS="" -v FS="[}{]" '{for(i=2;i<=NF;i+=2){gsub(/\n+ +/," ",$i);print $i}}' Input_file
The output is as follows.
0.91,0.052
Using GNU awk for multi-character RS and RT:
$ awk -v RS='{[^}]+}' 'RT{$0=RT; gsub(/[{}]/,""); $1=$1; print}' file
0.91,0.052
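Since the goal stated in the question is exactly five fields per line, it may be worth checking the processed file whichever command produced it. The following is a minimal sketch using only POSIX awk. The helper name check_fields and the demo file names good.txt and bad.txt are assumptions for illustration; the two demo lines are copied from the outputs shown earlier.

```shell
# check_fields (hypothetical helper): succeed only if every line of the
# given file has exactly five comma-separated fields.
check_fields() {
    awk -F, '
    NF != 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
    END     { exit bad + 0 }
    ' "$1"
}

# Demo files (assumption): one five-field line and one four-field line.
printf '%s\n' '0.91,0.87,-69.79,-0.3149,0.05' > good.txt
printf '%s\n' '0.9367,-1.368063705085268,69.79962,0.052' > bad.txt

check_fields good.txt && echo "good.txt: ok"
check_fields bad.txt  || echo "bad.txt: needs attention"
```

A non-zero exit status from check_fields test_processed.txt would suggest that some group in the input was not closed properly or was split in an unexpected way.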