Perl threads sometimes hang, with some of them waiting on a futex

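I'm running into a problem with a multithreaded Perl application (Perl 5.10.1 on Red Hat 7.4). I have reproduced the problem with the setup below, which mirrors the original Perl application: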

There are two files: main.pl and module1.pm.

main.pl

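  • Top-level script that runs a set of tasks, invoked as "main.pl" followed by the number of tests and the maximum number of threads (for example, "main.pl 1000 128")
#!/apps/perl/5.10.1/bin/perl
# main.pl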

use strict;
use warnings;
use v5.10.1;
use threads;
use threads::shared;

use module1;

use lib "<path to Thread::Queue module>"; 
use Thread::Queue;

my $glNumTests = shift();       # Number of tests(jobs) to run
my $glMaxThreads = shift();     # What is the max allowed number of workers (threads) for queue mode

my $q = Thread::Queue->new();    # A new empty queue

# Worker thread
sub worker
{
    # Thread will loop until no more work
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
        sleep(5); # dummy pre-work
        $item->dummy(); # Actual task that needs to be run by each thread
    }
}

sub start_threads
{
    my $loNumThreads = shift();
    $loNumThreads = $loNumThreads > $glMaxThreads ? $glMaxThreads : $loNumThreads;
    print "Creating $loNumThreads threads...";
    for ( 1..$loNumThreads )
    {
        threads->create(\&worker);
    }
    print "done\n";
}

sub initialize
{
    my @lotests;
    my $loNumTests = shift();
    for my $i (1..$loNumTests)
    {
        push(@lotests,"Test_".$i);
    }
    return \@lotests;
}

sub launchjobs
{
    my @lotests = @{shift()};
    my $tests = {};
    
    # Create objects
    foreach my $lotest (@lotests)
    {
        $tests->{$lotest}->{"obj"} = module1->new($lotest);
    }
    
    # Start the threads before adding work to the queue
    start_threads(scalar(@lotests));
    
    # Adding work to queue
    foreach my $lotest (@lotests)
    {
        $q->enqueue($tests->{$lotest}->{"obj"});
    }
    # No more work to be added
    $q->end();

    # Wait for threads to finish
    foreach my $thr ( threads -> list() )
    {
        $thr->join();
    }   
}

launchjobs(initialize($glNumTests));

module1.pm

  • Supporting module that uses IPC::Run to run another command (echo in this example)
# module1.pm
package module1;

use strict;
use warnings;

use lib "<InstallationPath>/IPC-Run-20200505.0/lib";
use IPC::Run qw( run ); 

sub new
{
    my $class = shift();
    my $test = shift();
    my $self = {};
    $self->{"testName"} = $test;
    system("\\mkdir -p test_output/$self->{testName}");
    # Create new file track.log for this object
    open(my $OFH,">","test_output/$self->{testName}/track.log") || die "Cannot open track.log for writing in new\n";
    close($OFH);
    bless($self,$class);
    return $self;
}

sub logTracker
{
    # Writes out the message to track.log file 
    my $self = shift();
    my $message = shift();  
    open(my $OFH,">>","test_output/$self->{testName}/track.log") || die"Cannot open track.log for writing\n";
    print $OFH $message;
    close($OFH);    
}

sub dummy
{
    my $self = shift();
    print "running $self->{testName}\n";

    my $loCmd = "echo"; # Command to be run
    my $loArgs = "This is test $self->{testName}"; # Arguments to the above command
    
    $self->logTracker("Calling run\n");
    run [$loCmd,$loArgs],'>&',"test_output/$self->{testName}/output";
    $self->logTracker("run completed\n");
}
1;

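The output directory structure is test_output/Test_N/ under the working directory (N is the test number), and each Test_N directory contains two files: track.log and output.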

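The problem I'm facing is that some threads occasionally get stuck in the IPC::Run run call (their track.log file does not contain the "run completed" line).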
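The check described above (a track.log with "Calling run" but no matching "run completed") can be automated. The following is only a minimal sketch of such a check, not part of the original application: the function names is_stuck and find_stuck_tests are hypothetical, and it assumes the test_output layout and the two log messages written by module1.pm, with no whitespace in the directory names.

```perl
#!/usr/bin/env perl
# Hypothetical helper (not part of the original application): lists tests
# whose track.log shows "Calling run" without a matching "run completed".
use strict;
use warnings;

# True if the log text shows run() was entered more often than it returned.
sub is_stuck {
    my ($log_text) = @_;
    my $started  = () = $log_text =~ /^Calling run$/mg;
    my $finished = () = $log_text =~ /^run completed$/mg;
    return $started > $finished;
}

# Returns the sorted names of the stuck tests found under $root.
# Assumes the layout $root/<test name>/track.log used by module1.pm.
sub find_stuck_tests {
    my ($root) = @_;
    my @stuck;
    for my $log (glob("$root/*/track.log")) {
        open(my $fh, '<', $log) or die "Cannot open $log: $!\n";
        my $text = do { local $/; <$fh> };   # slurp the whole file
        close($fh);
        $text = '' unless defined $text;
        my ($name) = $log =~ m{([^/]+)/track\.log$};
        push(@stuck, $name) if is_stuck($text);
    }
    return sort @stuck;
}

# When run as a script, print one stuck test name per line.
unless (caller) {
    my $root = @ARGV ? $ARGV[0] : 'test_output';
    print "$_\n" for find_stuck_tests($root);
}
```

Running it from the directory that contains test_output while main.pl is hung should print the names of the tests whose run call never returned, which can then be matched against the blocked thread IDs.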

For example, the last time I ran "main.pl 1000 128" (1000 tests with 128 workers), many threads did not finish and main.pl kept running (waiting). Attaching strace to it gave the following output:

strace: Process 41187 attached with 8 threads
[pid 42343] read(22,<unfinished ...>
[pid 42292] read(20,<unfinished ...>
[pid 42291] read(25,<unfinished ...>
[pid 42282] read(24,<unfinished ...>
[pid 42234] read(43,<unfinished ...>
[pid 42212] read(18,<unfinished ...>
[pid 41187] futex(0x7ff2597939d0,FUTEX_WAIT,42212,NULL <unfinished ...>
[pid 42338] read(16,

The process tree for the run above looks like this (after all the other threads have finished):

-perl,41187 main.pl 1000 128
    |-perl,42614 main.pl 1000 128
    |-perl,42615 main.pl 1000 128
    |-perl,42616 main.pl 1000 128
    |-perl,42617 main.pl 1000 128
    |-perl,42618 main.pl 1000 128
    |-perl,42620 main.pl 1000 128
    |-perl,42621 main.pl 1000 128
    |-{perl},42212
    |-{perl},42234
    |-{perl},42282
    |-{perl},42291
    |-{perl},42292
    |-{perl},42338
    `-{perl},42343

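I can't understand why this happens (it may be related to Thread::Queue, pipes, or something similar). It would be great if someone could help me figure this out. Ultimately I want to be able to run all the tests without any futex wait or hang. I have tried: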

  • Using IPC::Run's timeout option to check whether control returns, with no luck.
  • Using open3() instead of IPC::Run: the problem persists and occurs more often than with IPC::Run.

Notes:

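  • The problem does not occur with Perl 5.16.3. However, the original application needs Perl 5.10.1, so I would appreciate help understanding how to resolve it on that version.
  • In module1::dummy(), if $loCmd is changed to "sleep" and $loArgs to, say, "10", the problem does not occur (which suggests it may be related to pipes or I/O buffering).
  • The number of stuck threads varies from run to run, and sometimes all threads finish normally (so a race condition is possible).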
