Multi-Gigabit Pattern Matching For Packet Assessment in Network Security

International Journal of Systems, Algorithms & Applications

N.Sagar1, G.V. Ravi Kumar2
1M.Tech in Embedded systems, 2Assistant Professor, Dept. of ECE 1 Bharat Institute of Engineering and Technology, Hyderabad 2St. Johns College of Engineering & Technology, Yemmiganur
email: 1snayakanti@gmail.com, 2ravikumar.gv@gmail.com

Abstract - Network security is of growing importance worldwide. Matching large sets of patterns against an incoming stream of data is a fundamental task in several fields such as network security or computational biology. High-speed network intrusion detection systems (IDS) rely on efficient pattern matching techniques to analyze the packet payload and make decisions on the significance of the packet body. However, matching the streaming payload bytes against thousands of patterns at multi-gigabit rates is computationally intensive. Various techniques have been proposed in the past, but their performance degrades at multi-gigabit rates. Pattern matching is a significant issue in intrusion detection systems, but by no means the only one: handling multi-content rules, reordering, and reassembling incoming packets are also significant for system performance. We present two pattern matching techniques to compare incoming packets against intrusion detection search patterns. The first approach, decoded partial CAM (DpCAM), pre-decodes incoming characters, aligns the decoded data, and performs a logical AND on them to produce the match signal for each pattern. The second approach, perfect hashing memory (PHmem), uses perfect hashing to determine a unique memory location that contains the search pattern, and then compares the incoming data with the memory output to determine the match. The proposed methods have been implemented in VHDL and synthesized with Xilinx tools.

Keywords: DCAM, DpCAM, packet inspection, pattern matching, perfect hashing.
Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677

I. INTRODUCTION

Matching large sets of patterns against an incoming stream of data is a fundamental task in several fields such as network security or computational biology. For example, high-speed network intrusion detection systems (IDS) rely on efficient pattern matching techniques to analyze the packet payload and make decisions on the significance of the packet body. However, matching the streaming payload bytes against thousands of patterns at multi-gigabit rates is computationally intensive. Hardware-based solutions can significantly increase performance and achieve higher throughput. Many hardware units have been proposed for IDS pattern matching, most of them in the area of reconfigurable hardware. In general, field-programmable gate arrays (FPGAs) are well suited for this task, since designs can be customized for a particular set of search patterns and updates to that set can be performed via reconfiguration. FPGA-based systems provide higher flexibility and performance comparable to ASICs. Generally, the performance results of FPGA systems are promising, showing that FPGAs can be used to support the increasing needs of high-speed network security. FPGAs are flexible, reconfigurable, and provide hardware speed, and are therefore suitable for implementing such systems.

On the other hand, several issues must be addressed. Large designs are complex and therefore hard to operate at high frequency. Additionally, matching a large number of patterns has a high area cost, so sharing logic is critical, since it can save a significant amount of resources and make designs smaller and faster. Pattern matching is a significant issue in intrusion detection systems, but by no means the only one. Handling multi-content rules, reordering, and reassembling incoming packets are also significant for system performance.

We present two efficient pattern matching techniques to analyze packet payloads at multi-gigabit rates and detect hazardous contents. The first one is Decoded CAM (DCAM), which uses pre-decoding to exploit pattern similarities and reduce the area cost of the designs. We improve DCAM and decrease the required logic resources by partially matching long patterns. The improved approach is denoted as decoded partial CAM (DpCAM). The second approach, perfect hashing memory (PHmem), combines logic and memory for the matching. PHmem utilizes a new perfect hashing technique to hash the incoming data and determine a unique memory location of a possible matching pattern. Subsequently, we read this pattern from memory and compare it against the incoming data. We extend the perfect hashing algorithm in order to guarantee that a perfect hash function can be generated for any given set, and present a theoretical proof of its correctness.

The rest of the paper is organized as follows. In Section II, we discuss related work. In Section III, we describe our initial discrete comparator approach and the DCAM and DpCAM architectures. In Section IV, perfect hashing is introduced. In Section V, we present the implementation results of both DpCAM and PHmem and compare them with related work. Finally, in Section VI, we draw conclusions.

II. LITERATURE SURVEY

This section briefly describes pattern matching in software NIDS solutions and hardware-based pattern matching architectures in NIDS.

A. PATTERN MATCHING IN SOFTWARE NIDS SOLUTIONS

NIDS scan packet payloads against signatures to detect malicious intrusions, which is a string matching procedure. The best-known software algorithms are Knuth-Morris-Pratt (KMP), Boyer-Moore (BM), Aho-Corasick (AC), and Commentz-Walter (CW). The KMP and BM algorithms are designed for single-pattern searching. If the pattern length is m bytes, the search in an n-byte packet takes O(m+n) time. If there are k patterns, the search time is O(k(m+n)), which grows linearly with k. The string matching module of the current SNORT system uses the BM algorithm.

The AC and CW algorithms are designed for multi-pattern matching: they preprocess the set of patterns and build a finite automaton that searches for all of them together and can process an n-byte input packet in O(n) time. However, the resulting state explosion costs too much memory. AC is more suitable for hardware implementation because it has a deterministic execution time per packet. Prior work concluded that a compressed version of AC is the best choice for hardware implementation of pattern matching for NIDS.

B. HARDWARE-BASED PATTERN MATCHING ARCHITECTURES IN NIDS

In the past few years, numerous hardware-based pattern matching solutions have been proposed, most of them using FPGAs, finite automata, or hashing approaches. Next, we describe some significant steps forward in IDS pattern matching over the past few years.
Simple CAM or discrete comparator structures offer high performance at a high area cost. Using regular expressions (NFAs and DFAs) for pattern matching slightly reduces the area requirements but results in significantly lower performance. A technique that substantially increases sharing of character comparators and reduces the design cost is pre-decoding, which is applicable to both regular expression and CAM-like approaches. The main idea is that incoming characters are pre-decoded, so that each unique character is represented by a single wire. This way, an N-character comparator is reduced to an N-input AND gate. Yusuf and Luk presented a tree-based CAM structure, representing multiple patterns as a Boolean expression in the form of a binary decision diagram (BDD). In doing so, the area cost is lower than that of other CAM and NFA approaches. Given the processing bandwidth limitations of general-purpose processors (GPPs), which can sustain only a few hundred Mbps of throughput, hardware-based NIDS (multicore processors, ASICs, or FPGAs), as illustrated in Fig. 1, are an attractive alternative.
Fig. 1. Abstract illustration of performance and area efficiency for various hardware pattern matching techniques
III. DECODED CAMS

Simple CAM or discrete comparators may provide high performance; however, they are not scalable due to their high area cost. We assume the simple organization depicted in Fig 2(a). The input stream is inserted into a shift register, and the individual entries are fanned out to the pattern comparators. There is one comparator for each pattern, fed from the shift register. This design is simple and regular, and with proper use of pipelining the circuit can be fast. Its drawback, however, is the high area cost. To remedy this cost, we share the character comparators by exploiting similarities between patterns, as shown in Fig 2(b).
Fig 2. Basic discrete comparator structure and its optimized version which shares common character comparators.
The Decoded CAM architecture, illustrated in Fig 3, builds on this idea and extends it with the following observation: instead of keeping a window of input characters in the shift register, each of which is compared against multiple search patterns, we can first test the input for equality with the desired characters and then delay the partial matching signals. This approach both shares the equality logic for character comparators and replaces the 8-bit wide shift registers used in our initial approach with single-bit shift registers for the equality results. If we exploit this advantage, the potential for area savings is significant. In practice, about 5× fewer area resources are required compared to simple CAM and discrete comparator designs.
Fig 4. DpCAM: Partial matching of long patterns. In this example, a 31-byte pattern is matched. The first 16 bytes are partially matched and the result is properly delayed to feed the second substring comparator. Both substring comparators are fed from the same pool of shifted decoded characters (SRL16s) and therefore sharing of decoded characters is higher.
Fig 3. Decoded CAM: Three comparators provide the equality signals for characters A, B, and C (A is shared). To match pattern ABCA we have to remember (using shift registers) the matching of character A, B, C, for 3, 2, and 1 cycles, respectively, until the final character is matched.
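The decode, delay, and AND behaviour of Fig 3 can be illustrated with a small software model. The following Python sketch is only a behavioural approximation written for this explanation, not the VHDL design; the function name `dcam_match` and the use of `collections.deque` as a single-bit shift register are assumptions made for illustration.

```python
from collections import deque


def dcam_match(pattern: bytes, stream: bytes) -> list[int]:
    """Behavioural model of one Decoded CAM matcher (illustrative only).

    Returns the cycle indices at which the last character of `pattern`
    arrives and the whole pattern has been seen.
    """
    length = len(pattern)
    # One single-bit delay line per pattern position. Position i must be
    # remembered for (length - 1 - i) cycles so that all equality bits
    # line up when the final character is decoded.
    delay_lines = [deque([0] * (length - 1 - i)) for i in range(length)]
    match_cycles = []
    for cycle, byte in enumerate(stream):
        # Pre-decoding: one equality bit per unique character, shared by
        # every pattern position that uses that character.
        decoded = {ch: int(byte == ch) for ch in set(pattern)}
        aligned = []
        for i, line in enumerate(delay_lines):
            line.append(decoded[pattern[i]])  # shift in this cycle's bit
            aligned.append(line.popleft())    # bit from (length-1-i) cycles ago
        if all(aligned):                      # the N-input AND gate
            match_cycles.append(cycle)
    return match_cycles


print(dcam_match(b"ABCA", b"XABCABCA"))
```

For the pattern ABCA, the model keeps delay lines of 3, 2, 1, and 0 bits, which mirrors the delays described in the caption of Fig 3.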
One possible shortcoming of our approach is that the number of single-bit shift registers is proportional to the length of the patterns. Fig 3 illustrates this point: to match a four-character pattern, we need to test equality for each character (in the dashed decoder block), and to delay the matching of the first character by three cycles, the matching of the second character by two cycles, and so on, for the width of the search pattern. In total, the number of storage elements required in this approach is $L(L-1)/2$ for a string of length $L$ (for the four-character example, $3 + 2 + 1 = 6$). For many long patterns this number can exceed the number of bits in the character shift register used in the original CAM design. To our advantage, however, these shift registers are true first-in first-out buffers (FIFOs) with one input and one output, as opposed to the shift register in the simple design, in which each entry fans out to comparators.

To tackle this possible obstacle, we use two techniques. First, we reduce the number of shift registers by sharing their outputs whenever the same character is used in the same position in multiple search patterns. Second, we use the SRL16 optimized implementation of a shift register that is available in Xilinx devices and uses a single logic cell for a shift register. Together, these two optimizations lead to significant area savings.

To further reduce the area cost of our designs, we split long patterns into smaller substrings and match each substring separately. This improved version of DCAM is denoted as DpCAM (decoded partial CAM). Fig 4 depicts the block diagram for matching patterns longer than 16 characters. Long patterns are partially matched in substrings of at most 16 characters. The reason is that the AND-tree of a 16-character substring needs only five LUTs, while only a single SRL16 shift register is required to delay each decoded input character. Consequently, a pattern longer than 16 characters is partitioned into smaller substrings which are matched separately. The partial match of each substring is properly delayed and provides input to the AND-tree of the next substring.
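The partial matching of long patterns can be sketched in software as follows. This is a hedged behavioural model, not the hardware implementation: the helper names `split_pattern` and `dpcam_match` are hypothetical, and each substring comparison stands in for one AND-tree fed by decoded, delayed characters.

```python
def split_pattern(pattern: bytes, max_len: int = 16) -> list[bytes]:
    """Partition a pattern into substrings of at most `max_len` characters."""
    return [pattern[i:i + max_len] for i in range(0, len(pattern), max_len)]


def dpcam_match(pattern: bytes, stream: bytes, max_len: int = 16) -> list[int]:
    """Return the cycles at which the full pattern has just been matched."""
    parts = split_pattern(pattern, max_len)
    if not parts:
        return []
    prev = None  # prev[c] is True if all earlier substrings matched, ending at cycle c
    for part in parts:
        cur = []
        for cycle in range(len(stream)):
            start = cycle - len(part) + 1
            # AND-tree of this substring over the last len(part) characters.
            hit = start >= 0 and stream[start:cycle + 1] == part
            if hit and prev is not None:
                # The previous partial match must have fired len(part)
                # cycles earlier, i.e. it is delayed by len(part) cycles.
                hit = start >= 1 and prev[start - 1]
            cur.append(bool(hit))
        prev = cur
    return [cycle for cycle, hit in enumerate(prev) if hit]


long_pattern = bytes(range(65, 96))  # a 31-byte pattern, as in Fig 4
print([len(p) for p in split_pattern(long_pattern)])
print(dpcam_match(long_pattern, b"zz" + long_pattern + b"z"))
```

With a 31-byte pattern the model produces one 16-byte and one 15-byte substring, and the second comparator only fires when the first one fired 15 cycles earlier.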
Fig 5. DpCAM processing two characters per cycle.
In order to achieve better performance, we use techniques to improve both the operating frequency and the throughput of our DpCAM implementation. To increase the processing throughput, we use parallelism: we widen the distribution paths by a factor of P, providing P copies of the comparators (decoders) and the corresponding matching gates. Fig 5 illustrates this point for P = 2. To achieve a high operating frequency, we use extensive fine-grain pipelining. The latency of the pipeline depends on the pattern length and in practice is a few tens of cycles, which translates to a few hundred nanoseconds and is acceptable for such systems.

IV. PERFECT HASHING MEMORY

The alternative pattern matching approach proposed in this paper is PHmem. Instead of matching each pattern separately, it is more efficient to use a hash module to determine which pattern is a possible match, read this pattern from a memory, and compare it against the incoming data. Hardware hashing for pattern matching has been known for decades.
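The hash, read, and compare sequence can be illustrated with a toy software model. Everything specific in this sketch is an assumption chosen for illustration: the four patterns, the names `PATTERN_MEMORY`, `toy_hash`, and `phmem_scan`, and the hash itself (the two low-order bits of the most recent byte), which happens to be collision-free for this particular set only.

```python
# Hypothetical pattern set, stored at the address its hash produces.
PATTERN_MEMORY = [b"passwd", b"cmd.exe", b"admin", b"echo"]


def toy_hash(window: bytes) -> int:
    """Toy perfect hash for the set above: low two bits of the last byte."""
    return window[-1] & 0b11


def phmem_scan(stream: bytes) -> list[tuple[int, bytes]]:
    """Return (cycle, pattern) pairs for every pattern found in `stream`."""
    max_len = max(len(p) for p in PATTERN_MEMORY)
    window = b""
    hits = []
    for cycle, byte in enumerate(stream):
        # Serial-in shift register holding the most recent max_len bytes.
        window = (window + bytes([byte]))[-max_len:]
        # The hash selects exactly one candidate pattern per cycle.
        candidate = PATTERN_MEMORY[toy_hash(window)]
        # A single comparison against the incoming data decides the match.
        if window.endswith(candidate):
            hits.append((cycle, candidate))
    return hits


print(phmem_scan(b"GET /admin?x=cmd.exe"))
```

The point of the sketch is that only one memory read and one comparison are needed per input position, regardless of how many patterns are stored.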
Fig 6. PHmem block diagram.

Fig 6 depicts our PHmem scheme. The incoming packet data are shifted into a serial-in parallel-out shift register. The parallel-out lines of the shift register provide input to the comparator, which is also fed by the memory that stores the patterns. Selected bit positions of the shifted incoming data are used as input to a hash module, which outputs the ID of the possible matching pattern. For memory utilization reasons, we do not use this pattern ID to directly read the search pattern from the pattern memory. Instead, we use an indirection memory. The indirection memory outputs the actual location of the pattern in the pattern memory and its length, which is used to determine which bytes of the pattern memory and the incoming data need to be compared. In our case, the indirection memory performs a 1-to-1 instead of an N-to-1 mapping, since the output address has the same width (number of bits) as the pattern ID. Finally, it is worth noting that the implementations of the hash tree and the memories are pipelined. Consequently, the incoming bit stream must be buffered by the same number of pipeline stages in order to align it correctly for comparison with the chosen pattern from the pattern memory.

A. PERFECT HASHING TREE

The proposed scheme requires the hash function to generate a different address for each pattern; in other words, it requires a perfect hash function, which has no collisions for a given set of patterns. Furthermore, the address space should preferably be minimal and equal to the number of patterns. Instead of matching unique pattern prefixes, we hash unique substrings in order to distinguish the patterns. To do so, we introduce a perfect hashing method that guarantees that no collisions will occur for a given set. Generating such a perfect hash function may be difficult and time consuming.

In our approach, instead of searching for a single hash function, we search for multiple simpler subhashes that, when put together in a tree-like structure, construct a perfect hash function. The perfect hash tree is created based on the idea of divide and conquer. Let $A = \{a_1, a_2, \dots, a_n\}$ be a set of unique substrings and $H(A)$ a perfect hash function of $A$; then the perfect hash tree is created according to the following equations:

$$H(A) = h_0\bigl(H_1(\text{1st half of } A),\ H_2(\text{2nd half of } A)\bigr) \tag{1}$$

$$H_1(\text{1st half of } A) = h_1\bigl(H_{1.1}(\text{1st quarter of } A),\ H_{1.2}(\text{2nd quarter of } A)\bigr) \tag{2}$$

and so on for the smaller subsets of the set $A$ (until each subset contains a single element). The functions $h_0$, $h_1$, etc., combine subhashes. The functions $H_1$, $H_2$, $H_{1.1}$, $H_{1.2}$, etc., are perfect hashes of subsets (subhashes).
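As a rough illustration of the divide-and-conquer idea, the Python sketch below builds a binary tree over a set of distinct, equal-length bit strings. It is a simplification under stated assumptions, not the scheme of this paper: every selector is restricted to a single bit position, leaves are numbered consecutively instead of being combined through multiplexers, and the names `build_tree` and `tree_hash` are hypothetical.

```python
from itertools import count


def build_tree(rows: list[str], ids=None):
    """Recursively split distinct, equal-length bit strings into a tree."""
    ids = count() if ids is None else ids
    if len(rows) == 1:
        # A subset with one element needs no further separation.
        return ("leaf", next(ids), rows[0])
    width = len(rows[0])
    # Pick the bit column that splits the rows most evenly. Because the
    # rows are distinct, at least one column separates them into two
    # non-empty parts, so the recursion always terminates.
    col = min(
        range(width),
        key=lambda c: abs(2 * sum(r[c] == "1" for r in rows) - len(rows)),
    )
    zeros = [r for r in rows if r[col] == "0"]
    ones = [r for r in rows if r[col] == "1"]
    return ("node", col, build_tree(zeros, ids), build_tree(ones, ids))


def tree_hash(tree, bits: str) -> int:
    """Follow the selector bits down to a leaf and return its address."""
    while tree[0] == "node":
        _, col, zero_branch, one_branch = tree
        tree = one_branch if bits[col] == "1" else zero_branch
    return tree[1]


substrings = ["0011", "0101", "1001", "1110", "0110"]
tree = build_tree(substrings)
print(sorted(tree_hash(tree, s) for s in substrings))
```

Each substring ends at its own leaf, so the resulting addresses are collision-free and cover exactly the range 0 to n-1, which is the minimal address space mentioned above.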
Following the previously discussed methodology, we create a binary hash tree. For a given set of $n$ patterns that have unique substrings, we consider the set of substrings as an $n \times m$ matrix $A$. Each row of the matrix $A$ ($m$ bits long) represents a substring, which differs in at least one bit from all the other rows. Each column of the matrix $A$ ($n$ bits long) represents a different bit position of the substrings. The perfect hash tree should have $\log_2(n)$ output bits in order to be minimal. We construct the tree by recursively partitioning the given matrix as follows. Search for a function (e.g., $h_0$) that separates the matrix $A$ into two parts (e.g., $A_0$, $A_1$), each of which can be encoded in $\log_2(n) - 1$ bits. Recursively repeat the procedure for each part of the matrix, in order to separate it again into smaller parts. The process terminates when all parts contain one row.

To prove that our method generates perfect hash functions, we need to prove the following. For any given set $A$ of $n$ items that can be encoded in $\log_2(n)$ bits, our method generates a function $h: A \to \{0,1\}$ to split the set into two subsets that can be encoded in $\log_2(n/2)$ bits (that is, $\log_2(n) - 1$ bits). Based on the first proof, the proposed scheme outputs a perfect hash function for the initial set of patterns.

Proof: By definition, a hash function $H|_A$ of a set $A = \{a_1, a_2, \dots, a_n\}$ which outputs a different value for each element $a_i$ is perfect:

$$H|_{a_1} \neq H|_{a_2} \neq \dots \neq H|_{a_x} \tag{3}$$

Also, a hash function $h|_S$, where $S = A \cup B \cup \dots \cup N$ and $A \cap B \cap \dots \cap N = \emptyset$, that separates the subsets $A, B, \dots, N$ by having a different output for elements of different subsets is also perfect, that is,

$$h|_A \neq h|_B \neq \dots \neq h|_N \tag{4}$$

We construct our hash trees based on two facts. First, the selects of the multiplexers $h$ separate perfectly the subsets of the node. Second, the inputs of the leaf nodes are perfect hash functions; this follows from the fact that each element differs from every other element in at least one bit, and therefore there exists a single bit that separates (perfectly) any pair of elements in the set. Consequently, it must be proven that a node which combines the outputs of the perfect hash functions $H_A, H_B, \dots, H_N$ of the subsets $A, B, \dots, N$, using a perfect hash function $h|_S$ which separates these subsets, also outputs a perfect hash function $H_{node}$ for the entire set $S$. The output of the node is the value of $h|_S$ together with the subhash it selects:

$$H_{node} = \left( h|_S,\ \begin{cases} H_A & \text{if } h|_S = h|_A \\ H_B & \text{if } h|_S = h|_B \\ \ \vdots & \\ H_N & \text{if } h|_S = h|_N \end{cases} \right)$$
Fig 10. FPGA module of PHmem
Consequently, $H_{node}$ outputs different values either for two entries of the same subset, $H_{node}|_{a_i} \neq H_{node}|_{a_j}$, based on (3), or for two entries of different subsets, $H_{node}|_{a_i} \neq H_{node}|_{b_j}$, based on (4). Therefore, each tree node, and also the entire hash tree, outputs a perfect hash function.

V. EXPERIMENTAL RESULTS & ANALYSIS

The DpCAM and PHmem structures were simulated in Active-HDL and synthesized with Xilinx tools; their FPGA modules are shown in the figures of this section.
Table 1. Comparison of FPGA-based pattern matching approaches

Description   Input (bits)   Family     Device         Logic Cells   Total Delay (ns)
BCD           8              Virtex-E   Xcv50e8CS144   14            9.341
CCC           8              Virtex-E   Xcv50e8CS144   14            9.341
DCAM          8              Virtex-E   Xcv50e8CS144   178           5.857
DpCAM         16             Virtex-E   Xcv50e8CS144   346           6.991
PHmem         16             Virtex-E   Xcv50e8CS144   269           5.783
Fig 7. Simulated results of DpCAM
The designs were implemented on Virtex-E devices with a -8 speed grade (Xcv50e8-CS144) and process 8 or 16 bits per cycle. Table 1 compares the FPGA-based pattern matching approaches. Comparing the total delay of the implemented designs, PHmem performs best because it has the lowest delay.
Fig 8. FPGA module of DpCAM
VI. CONCLUSIONS

Network intrusion detection and prevention systems (IDPS) have become an essential part of the Internet infrastructure. An IDPS has two primary purposes. First, it must detect all potentially damaging events that can occur in a system. Second, it must prevent harmful activity by blocking the flow of harmful traffic into the system. We describe two reconfigurable pattern matching approaches suitable for intrusion detection. The first one (DpCAM) uses only logic and the pre-decoding technique to share resources. Pre-decoding is a very effective method for sharing logic between different pattern comparators (either in discrete comparators or in regular expressions). The second one (PHmem) requires both memory and logic, and employs a hash function that is simple and compact in practice to access the pattern memory. The proposed PHmem algorithm guarantees the generation of a perfect hash function for any given set of patterns. Porting these architectures to newer-generation FPGAs, to take advantage of additional resources and faster clock frequencies, is a logical next step for expanding these designs. Systems with such architectures can significantly outperform existing commercially available systems.
Fig 9. Simulated results of PHmem
