Journal of Information Security Research (信息安全研究) ›› 2018, Vol. 4 ›› Issue (2): 180-184.

• Technical Applications •

Study and Application of P-FD, an Accrual Failure Detection Method Based on OpenStack

Li Xueli (李雪莉)

  1. College of Computer Science, Sichuan University
  • Received: 2018-02-25 Online: 2018-02-15 Published: 2018-02-25
  • Corresponding author: Li Xueli
  • About the author: Li Xueli, born in 1992, master's in software engineering, Sichuan University; main research interest: information security.

Abstract: OpenStack is one of the most popular and widely used open-source cloud computing platforms, and high availability is an essential requirement for it. Node failure caused by unreliable software or hardware is the main obstacle to achieving high availability. OpenStack currently detects failures in the conventional way: hosts send heartbeat messages, and a fixed timeout decides when a host is considered failed. This approach is prone to misjudgment. When the network is merely slow and the node has not failed, the heartbeat still expires, the node is wrongly judged as failed, and its virtual machines are evacuated at an unnecessary and very high cost. This paper proposes P-FD, an accrual failure detector based on a normal distribution of heartbeats. It computes a continuous, time-dependent value that represents the suspicion level of the monitored process. In addition to the threshold, it uses a pull heartbeat mode to implement a double-check (second confirmation) strategy, which lowers the mistake rate. Compared with the existing failure detector in OpenStack, P-FD adapts better to the dynamic and unstable network of a large-scale cloud computing environment. Experimental results show that, under unstable network conditions, the detector reduces the mistake rate and effectively improves system availability.

Key words: Cloud computing, OpenStack, Normal distribution, Pull heartbeat mode, Accrual failure detector
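
The abstract does not give the formulas used by P-FD, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes the suspicion level takes the common accrual form φ = -log10(P_later), where P_later is the probability, under a normal distribution fitted to recent heartbeat intervals, that the next heartbeat arrives later than the time already elapsed. The class name `AccrualDetector`, the `pull_probe` callback, the threshold of 8.0, the window size, and the `min_std` floor are all hypothetical choices made for illustration.

```python
import math
from collections import deque
from statistics import mean, pstdev


class AccrualDetector:
    """Illustrative accrual failure detector with a pull-based second check.

    All names and default values here are assumptions, not taken from the paper.
    """

    def __init__(self, threshold=8.0, window=100, min_std=0.1):
        self.threshold = threshold          # assumed suspicion threshold
        self.min_std = min_std              # floor so a perfectly regular stream never gives std = 0
        self.intervals = deque(maxlen=window)  # sliding window of inter-arrival times
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def suspicion(self, now):
        """Return the suspicion level; it grows the longer a heartbeat is overdue."""
        if len(self.intervals) < 2:
            return 0.0                      # not enough history to fit a distribution
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_std)
        elapsed = now - self.last_arrival
        # P(interval > elapsed) for N(mu, sigma^2), i.e. 1 - CDF(elapsed)
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        p_later = max(p_later, 1e-300)      # avoid log10(0) after underflow
        return -math.log10(p_later)

    def is_failed(self, now, pull_probe):
        """Declare failure only if the threshold AND the pull check both agree."""
        if self.suspicion(now) < self.threshold:
            return False
        # Second confirmation: actively ask the node for a heartbeat.
        # `pull_probe` is a hypothetical callable returning True if the node answers.
        return not pull_probe()
```

With a fixed timeout, a single late heartbeat is enough to trigger evacuation. In this sketch a late heartbeat only raises the suspicion level, and the node is reported as failed only when the level crosses the threshold and the pull request also goes unanswered, for example `detector.is_failed(now, pull_probe=lambda: False)`.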