微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

linux – 即使异步I / O操作挂起,只有线程处理io_service正在等待

Boost的ASIO调度员似乎有一个严重的问题,我似乎无法找到解决方法.症状是,等待分派的唯一线程留在pthread_cond_wait中,尽管有待处理的I / O操作要求它在epoll_wait中阻塞.

我可以通过让一个线程在循环中调用poll_one直到它返回零来轻松复制此问题.这可能会使线程调用运行停留在pthread_cond_wait,而调用poll_one的线程会突破循环.据推测,io_service期望该线程在epoll_wait中返回块,但它没有义务这样做,并且期望似乎是致命的.

是否要求线程与io_services静态关联?

这是一个显示死锁的示例.这是处理此io_service的唯一线程,因为其他人已经继续.肯定有套接字操作待定:

#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
#2 boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
#3 boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146

我相信错误如下:如果服务于I / O队列的线程是阻塞I / O套接字就绪检查并且调用调度函数的线程,如果在io服务上阻塞了任何其他线程,它必须发出信号.它目前仅表示当时是否有准备好运行的处理程序.但是没有线程检查套接字准备情况.

解决方法

这是一个错误.我已经能够通过在task_io_service :: do_poll_one的非关键部分添加延迟来复制它.以下是 booost/asio/detail/impl/task_io_service.ipp修改后的task_io_service :: do_poll_one()的片段.添加的唯一行是sleep.
std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,task_io_service::thread_info& this_thread,const boost::system::error_code& ec)
{
  if (stopped_)
    return 0;

  operation* o = op_queue_.front();
  if (o == &task_operation_)
  {
    op_queue_.pop();
    lock.unlock();

    {
      task_cleanup c = { this,&lock,&this_thread };
      (void)c;

      // Run the task. May throw an exception. Only block if the operation
      // queue is empty and we're not polling,otherwise we want to return
      // as soon as possible.
      task_->run(false,this_thread.private_op_queue);
      boost::this_thread::sleep_for(boost::chrono::seconds(3));
    }

    o = op_queue_.front();
    if (o == &task_operation_)
      return 0;
  }

...

我的测试驱动程序非常基础:

>通过计时器进行异步工作循环,打印“.”每3秒钟一次.
>生成一个将轮询io_service的线程.
>延迟允许新线程时间轮询io_service,并且当poll线程在task_io_service :: do_poll_one()中休眠时,主调用io_service :: run().

测试代码

#include <iostream>

#include <boost/asio/io_service.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;
boost::asio::steady_timer timer(io_service);

void arm_timer()
{
  std::cout << ".";
  std::cout.flush();
  timer.expires_from_Now(boost::chrono::seconds(3));
  timer.async_wait(boost::bind(&arm_timer));
}

int main()
{
  // Add asynchronous work loop.
  arm_timer();

  // Spawn poll thread.
  boost::thread poll_thread(
    boost::bind(&boost::asio::io_service::poll,boost::ref(io_service)));

  // Give time for poll thread service reactor.
  boost::this_thread::sleep_for(boost::chrono::seconds(1));

  io_service.run();
}

调试:

[twsansbury@localhost bug]$gdb a.out 
...
(gdb) r
Starting program: /home/twsansbury/dev/bug/a.out 

[Thread debugging using libthread_db enabled]
.[New Thread 0xb7feeb90 (LWP 31892)]
[Thread 0xb7feeb90 (LWP 31892) exited]

此时,arm_timer()已打印“.”曾经(当它被武装起来时). poll线程以非阻塞方式为反应器提供服务,并且在op_queue_为空时睡眠3秒(当task_cleanup c退出范围时,task_operation_将被添加回op_queue_).当op_queue_为空时,主线程调用io_service :: run(),看到op_queue_为空,并使自己成为first_idle_thread_,它在wakeup_event上等待. poll线程完成休眠,并返回0,主线程等待wakeup_event.

等待10秒后,arm_timer()有足够的时间准备就绪,我打断调试器:

Program received signal SIGINT,Interrupt.
0x00919402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00919402 in __kernel_vsyscall ()
#1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x08059dc2 in void boost::asio::detail::posix_event::wait >(boost::asio::detail::scoped_lock&) ()
#4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock&,boost::asio::detail::task_io_service_thread_info&,boost::system::error_code const&) ()
#5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#6  0x0805a1e2 in boost::asio::io_service::run() ()
#7  0x0804db78 in main ()

并排时间表如下:

          poll thread                  |          main thread
---------------------------------------+---------------------------------------
  lock()                               | 
  do_poll_one()                        |                          
  |-- pop task_operation_ from         |
  |   queue_op_                        |
  |-- unlock()                         |  lock()
  |-- create task_cleanup              |  do_run_one()
  |-- service reactor (non-block)      |  `-- queue_op_ is empty
  |-- ~task_cleanup()                  |      |-- set thread as idle
  |   |-- lock()                       |      `-- unlock()
  |   `-- queue_op_.push(              |
  |       task_operation_)             |
  `-- task_operation_ is               | 
      queue_op_.front()                |
      `-- return 0                     |  // still waiting on wakeup_event
  unlock()                             |

尽我所知,修补没有副作用:

if (o == &task_operation_)
  return 0;

至:

if (o == &task_operation_)
{
  if (!one_thread_)
    wake_one_thread_and_unlock(lock);
  return 0;
}

无论如何,我已经提交了bug and fix.考虑留意官方回复的机票.

原文地址:https://www.jb51.cc/linux/393210.html

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐