shell – 如何区分部分线条？

我有两个我想要差异的文件.这些行有时间戳,可能还有其他一些我想忽略的匹配算法,但如果匹配算法在文本的其余部分找到差异,我仍然希望输出这些项目.例如：

1c1
<    [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
>    [junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

不应该被发射但是：

1c1
<    [junit4] 2013-01-11 04:43:57,398 INFO  com.example.MyClass:456 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

应该被发射(因为行号不同).请注意,仍会发出时间戳.

如何才能做到这一点？

解决方法

我希望这个功能在我之前好几次,并且因为它再次出现在这里我决定谷歌一点,找到perl的Algorithm :: Diff,你可以提供一个散列函数(他们称之为“密钥生成函数”)哪个“应该返回一个唯一标识给定元素的字符串”,算法用来进行比较(而不是你用它来提供的实际内容).

基本上,你需要做的就是添加一个以一种你希望从你的字符串中过滤掉不需要的东西的方式执行一些正则表达式魔法的子程序,并将subref作为参数添加到对diff()的调用中(参见我的CHANGE 1和CHANGE)下面的代码段中的2条评论).

如果您需要正常(或统一)diff输出,请检查模块附带的精心设计的diffnew.pl示例,并在此文件中进行必要的更改.出于演示目的,我将使用它附带的简单diff.pl,因为它很短,我可以在这里完全发布它.

mydiff.pl

#!/usr/bin/perl

# based on diff.pl that ships with Algorithm::Diff
# demonstrates the use of a key generation function

# the original diff.pl is:
# copyright 1998 M-J. Dominus. (mjd-perl-diff@plover.com)
# This program is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.

use Algorithm::Diff qw(diff);

die("Usage: $0 file1 file2") unless @ARGV == 2;

my ($file1,$file2) = @ARGV;

-f $file1 or die("$file1: not a regular file");
-f $file2 or die("$file2: not a regular file");
-T $file1 or die("$file1: binary file");
-T $file2 or die("$file2: binary file");

open (F1,$file1) or die("Couldn't open $file1: $!");
open (F2,$file2) or die("Couldn't open $file2: $!");
chomp(@f1 = <F1>);
close F1;
chomp(@f2 = <F2>);
close F2;

# CHANGE 1
# $diffs = diff(\@f1,\@f2);
$diffs = diff(\@f1,\@f2,\&keyfunc);

exit 0 unless @$diffs;

foreach $chunk (@$diffs)
{
        foreach $line (@$chunk)
        {
                my ($sign,$lineno,$text) = @$line;
                printf "%4d$sign %s\n",$lineno+1,$text;
        }
}
exit 1;

# CHANGE 2 {
sub keyfunc
{
        my $_ = shift;
        s/^(\d{2}:\d{2})\s+//;
        return $_;
}
# }

此时就把one.txt存盘

12:15 one two three
13:21 three four five

two.txt

10:01 one two three
14:38 seven six eight

示例运行

$./mydiff.pl one.txt two.txt
   2- 13:21 three four five
   2+ 13:21 seven six eight

示例运行2

这里有一个基于diffnew.pl的普通diff输出

$./my_diffnew.pl one.txt two.txt
2c2
< 13:21 three four five
---
> 13:21 seven six eight

如您所见,任一文件中的第一行都会被忽略,因为它们的时间戳只有不同,散列函数会删除这些行以进行比较.

Voilà,你只是推出了自己的内容感知差异！

shell – 如何区分部分线条？

解决方法

相关推荐