
How do I skip directories in awk?

Say I have the following structure of files and directories:
$ tree
.
├── a
├── b
└── dir
    └── c

1 directory, 3 files

That is, two files a and b sit alongside a directory dir, which contains another file c.

I want to process all the files with awk (GNU Awk 4.1.1, to be exact), so I do:

$ gawk '{print FILENAME; nextfile}' * */*
a
b
awk: cmd. line:1: warning: command line argument `dir' is a directory: skipped
dir/c

All is fine, but * also expands to the directory dir, and awk tries to process it.

So I wonder: is there any native way for awk to check whether a given argument is a directory and, if so, skip it? That is, without using system().

I made it work by calling an external command from BEGINFILE:

$ gawk 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME,"is a dir,skipping"; nextfile}} ENDFILE{print FILENAME,FNR}' * */*
a
a 10
a.wk
a.wk 3
b
b 10
dir
dir is a dir,skipping
dir/c
dir/c 10

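Also note that if (system(" [ ! -d " FILENAME " ]")) {print FILENAME,"is a dir,skipping"; nextfile} works counter-intuitively: one would expect the condition to be 1 when the test is true, but system() returns the command's exit code. [ ! -d dir ] exits with status 1 when dir is a directory, and awk treats that non-zero value as true.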
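The exit-status behavior is easy to check in isolation. The following is a minimal sketch rather than part of the original question; rc_of is a hypothetical helper name, and "/" is used only because it is a directory on any Unix-like system.

```shell
# rc_of CMD: run CMD through awk's system() and print the value awk gets back.
# (rc_of is an illustrative name, not something from the original post.)
rc_of() {
    awk -v cmd="$1" 'BEGIN { rc = system(cmd); print rc }'
}

rc_of 'test -d /'      # the test succeeds, so system() returns 0
rc_of 'test ! -d /'    # the negated test fails, so system() returns 1
```

Because a successful command yields 0, which awk treats as false, it may read more naturally to test for the directory directly and compare against 0, for example if (system("test -d " FILENAME) == 0). File names containing spaces or shell metacharacters would still need quoting before being passed to system().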

In A.5 Extensions in gawk Not in POSIX awk I read:

  • Directories on the command line produce a warning and are skipped (see 07002)

And the linked page says:

4.11 Directories on the Command Line

According to the POSIX standard, files named on the awk command line
must be text files; it is a fatal error if they are not. Most versions
of awk treat a directory on the command line as a fatal error.

By default, gawk produces a warning for a directory on the command
line, but otherwise ignores it. This makes it easier to use shell
wildcards with your awk program:

06003

If either of the --posix or --traditional options is given, then gawk
reverts to treating a directory on the command line as a fatal error.

See 07003, for a way to treat directories as usable
data from an awk program.

And that is indeed the case: the same command as before fails with --posix:

$ gawk --posix 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME,"is a dir,skipping"; nextfile}} ENDFILE{print FILENAME,NR}' * */*
gawk: cmd. line:1: fatal: cannot open file `dir' for reading (Is a directory)

I checked the section 16.7.6 Reading Directories linked above, where they talk about readdir:

The readdir extension adds an input parser for directories. The usage
is as follows:

@load "readdir"

But I am not sure how to call it or how to use it from the command line.
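For reference, one way to load the extension from the command line is gawk's -l (--load) option, which has the same effect as @load inside the program. The sketch below assumes a gawk 4.1 or later build that ships the bundled readdir extension; list_entries is a hypothetical helper name. With the extension loaded, a directory argument is read as data, one record per entry in the form inode/name/type, so splitting on "/" gives the name in $2 and, where the system reports it, a one-letter type in $3 (for example d for a directory, f for a regular file).

```shell
# list_entries DIR: print "type name" for each entry of DIR.
# Assumes gawk >= 4.1 with the readdir extension installed; the function
# name is illustrative only.
list_entries() {
    gawk -l readdir -F/ '{ print $3, $2 }' "$1"
}
```

Calling list_entries . would list the entries of the current directory. This makes directories readable as input rather than skipping them, so it addresses the "how do I call it" part of the question, not the skipping itself.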

Solution

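If you want to safeguard your script against other people mistakenly passing it a directory (or anything else that is not a readable text file), you could do this:
$ ls -F tmp
bar  dir/  foo

$ cat tmp/foo
line 1

$ cat tmp/bar
line 1
line 2

$ cat tmp/dir
cat: tmp/dir: Is a directory

$ cat tst.awk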
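BEGIN {
    # Probe every command-line argument before the main loop reads it.
    for (i=1; i<ARGC; i++) {
        # getline returns -1 when the read fails (e.g. ARGV[i] is a
        # directory or does not exist) and 0 when the file is empty.
        if ( (getline line < ARGV[i]) <= 0 ) {
            print "Skipping:", ARGV[i], ERRNO   # ERRNO is set by gawk only
            delete ARGV[i]                      # awk ignores removed arguments
        }
        close(ARGV[i])   # close the file handle opened by the probe
    }
}
{ print FILENAME, $0 }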

$ awk -f tst.awk tmp/*
Skipping: tmp/dir Is a directory
tmp/bar line 1
tmp/bar line 2
tmp/foo line 1

$ awk --posix -f tst.awk tmp/*
Skipping: tmp/dir
tmp/bar line 1
tmp/bar line 2
tmp/foo line 1

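Per POSIX, getline returns -1 if/when it fails to retrieve a record from a file (e.g. the file is unreadable, does not exist, or is a directory). You only need GNU awk if you care which of those failures occurred, which it reports through the value of ERRNO.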
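The return values that the BEGIN block relies on can be observed one file at a time. This is a minimal sketch under the assumption of any POSIX-style awk; probe is a hypothetical helper name, and the temporary files exist only for the demonstration.

```shell
# probe FILE: print what getline returns for a single read attempt on FILE.
# (probe is an illustrative name, not part of the answer's script.)
probe() {
    awk -v f="$1" 'BEGIN { r = (getline line < f); print r }'
}

tmp=$(mktemp -d)
printf 'line 1\n' > "$tmp/foo"    # readable, non-empty file
: > "$tmp/empty"                  # readable, empty file

probe "$tmp/foo"        # prints 1: a record was read
probe "$tmp/empty"      # prints 0: end of file, nothing to read
probe "$tmp/missing"    # prints -1: the file cannot be opened
rm -r "$tmp"
```

Note that the `<= 0` test in tst.awk therefore drops empty files as well as failed ones, which is harmless because an empty file contributes no records. The directory case is left out of this sketch because, as far as I know, awk implementations other than gawk do not all handle a read from a directory the same way; the transcript above shows gawk reporting it as a failure.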

Original URL: https://www.jb51.cc/linux/393000.html
