Cybersecurity and Applied Mathematics
Ebook, 395 pages, 4 hours

About this ebook

Cybersecurity and Applied Mathematics explores the mathematical concepts necessary for effective cybersecurity research and practice, taking an applied approach for practitioners and students entering the field. This book covers methods of statistical exploratory data analysis and visualization as a type of model for driving decisions, also discussing key topics, such as graph theory, topological complexes, and persistent homology.

Defending the Internet is a complex effort, but applying the right techniques from mathematics can make this task more manageable. This book is essential reading for creating useful and replicable methods for analyzing data.

  • Describes mathematical tools for solving cybersecurity problems, enabling analysts to pick the optimal tool for the task at hand
  • Contains numerous cybersecurity examples and exercises using real world data
  • Written by mathematicians and statisticians with hands-on practitioner experience
Language: English
Release date: Jun 7, 2016
ISBN: 9780128044995
Author

Leigh Metcalf

Leigh Metcalf researches network security, game theory, formal languages, and dynamical systems. She is Editor in Chief of the Journal on Digital Threats and has a PhD in Mathematics.

    Book preview

    Cybersecurity and Applied Mathematics - Leigh Metcalf

    Chapter 1

    Introduction

    Abstract

    In this chapter we discuss the purpose of the book and the mathematical underpinnings of it.

    Keywords

    Introduction; Models; Mathematical techniques

    The practice of cybersecurity involves diverse data sets, including DNS, malware samples, routing data, network traffic, user interaction, and more. There is no one-size-fits-all analysis scheme for this data; a new method must be created for each data set. The best methods have a mathematical basis.

    A mathematical model of a system is an abstract description of that system that uses mathematical concepts. We want to take the systems in cybersecurity and create mathematical models of them in order to analyze the systems, make predictions about them, derive information about them, or pursue other goals, depending on the system. This book is designed to give you a basic understanding of the mathematical concepts that are most relevant to designing models for cybersecurity.

    Cybersecurity is often about finding the needle in the needlestack: finding that one bit that looks almost, but not quite, like everything else. In a network that can generate gigabytes of traffic a day, discovering the small amount of anomalous traffic that is associated with malware is a difficult proposition. Similarly, finding the one set of maliciously registered domains among hundreds of millions of domain names is not an easy process.

    There is a wide variety of mathematical techniques that can be used to create methods to analyze cybersecurity data. These techniques are the underpinnings that make the analysis work. Statistics cares about the origin of the data, how it was collected, and what assumptions you can make about the data. Mathematical techniques, such as graph theory, are developed on the structure known as a graph, and work no matter what they are used to model. That is the beauty of math.

    The point of this book is not to spend time going through proofs of why the various mathematical techniques work, but rather to give an introduction to the areas themselves. Care was taken in the chapters to describe what things are and how they work without overwhelming the reader with the why. The why is not always relevant to understanding the what or how. This book is designed for the cybersecurity analyst who wishes to create new techniques that have a secure foundation in math.

    The content is designed to cover various areas that are used in cybersecurity today, to give the reader a firm basis in understanding how they can be applied in creating new analysis methods as well as to enable the reader to achieve greater understanding of current methods. The reader is expected to have studied calculus in order to understand the concepts in the book.

    Chapter 2

    Metrics, similarity, and sets

    Abstract

    In this chapter we cover an introduction to set theory, with common operations such as subset, intersection, union, set difference, complement and symmetric difference with examples from cybersecurity data. Set functions are also discussed, which leads us directly to the definition of a metric. We cover the variations of metric, including pseudometric, quasimetric, and semimetric. Similarities are also discussed. We then illustrate the metric on various sets, including strings, sets, Internet and cybersecurity specific metrics.

    Keywords

    Sets; Set operations; Functions; Metric; Similarity

    The human eye can discern differences between two objects, but cannot necessarily quantify that difference. For example, a red apple and a green apple are obviously different, but still similar in that they are both apples. If we consider a red apple and a computer, they are obviously completely different. We can only say similar but different or obviously different.

    We need to quantify this difference in a reasonable way. To this end, we create a framework that standardizes the properties that a distance should satisfy. Before we cover the distance, we begin with the basics of set theory. Distances are inherently defined on a set, so the knowledge of some basics of set theory is useful. The chapter concludes with relevant examples of distances.

    2.1 Introduction to Set Theory

    We have lots of words for collections of things: a deck of cards, a flock of birds, a pod of whales, a murder of ravens, or a fleet of automobiles. The underlying theme of these words is a collection of things that have a property in common; or as we say it in math, a set of things such as a set of birds or a set of domains. A set of domains would not contain a URL, because then we could not call it a set of domains. On the other hand, a set of domains could contain 127.0.0.1 because an IP address can act like a domain. The important notion within set theory is the idea that we can regard any collection of objects as a single thing.

    We tend to use capital letters to represent sets, like A or D. In that vein, let A represent a set of Autonomous Systems represented by A = {ASN65512,ASN65524,ASN65517,ASN65530,ASN65527}. Another way to represent a set is called a Venn diagram. If A is a set, then the Venn diagram for A is in Fig. 2.1. The set is shaded because we are interested in every element within the set.

    Fig. 2.1 A single set.

    In mathematical terms, the things that make up a set are called objects or members. If an object a is contained within a set denoted by D, we write a ∈ D. If the object is not contained within D, we write a ∉ D. Returning to our set A, we know ASN65512 ∈ A and ASN65518 ∉ A.

    An important set is the empty set. This is the set that does not have any members at all. We represent this set as ∅ or {}. If we say that B = ∅, then we are saying that the set denoted by B contains nothing.

    The number of elements in a set is called the size. The size of a set A is written as |A|. For the empty set B, |B| = 0 and for the set of Autonomous Systems A, |A| = 5.
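    The membership and size notation above maps directly onto Python's built-in set type. The following is a minimal sketch, assuming each ASN from the text is stored as a plain string; the variable names are illustrative only.

```python
# The set of Autonomous Systems from the text, each ASN stored as a string.
A = {"ASN65512", "ASN65524", "ASN65517", "ASN65530", "ASN65527"}

# The empty set. In Python, {} creates an empty dict, so set() is used instead.
B = set()

in_A = "ASN65512" in A          # ASN65512 ∈ A
not_in_A = "ASN65518" not in A  # ASN65518 ∉ A
size_A = len(A)                 # |A|
size_B = len(B)                 # |B|

print(in_A, not_in_A, size_A, size_B)
```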

    If every element of a set is contained in another set, then the first set is a subset of the second set and the second set is called a superset. If A is a subset of B, we write A ⊆ B. Conversely, if B is a superset of A, that is, B contains A, we write B ⊇ A.

    In order to show that two sets A and B are equal, we show that A ⊆ B and B ⊆ A. In other words, we show that every element of A is an element of B and every element of B is also an element of A. So we cannot have an element of A that is not also in B and vice versa.

    We can also show that the size of the subset is less than or equal to the size of the superset. If X is a subset of Y, then |X| ≤ |Y|. It can't be more than |Y|, because every element of X is contained in Y, so X has no additional elements.

    A single element of a set can form a subset of that set, since there is no rule that a set must contain multiple elements. So if a ∈ A, then {a} ⊆ A. And since the empty set contains nothing, the empty set is contained in every set as a subset, or in other words, something contains nothing.
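    These subset facts can be checked with Python's set comparison operators. This is a sketch under the assumption that a smaller, hypothetical set X is drawn from the Autonomous Systems used earlier; sets_equal is an illustrative helper, not a library function.

```python
# Y is the set of Autonomous Systems from the text; X is a hypothetical subset.
Y = {"ASN65512", "ASN65524", "ASN65517", "ASN65530", "ASN65527"}
X = {"ASN65512", "ASN65524"}

x_subset_y = X <= Y      # X is a subset of Y
y_superset_x = Y >= X    # Y is a superset of X
size_ok = len(X) <= len(Y)  # |X| <= |Y| whenever X is a subset of Y


def sets_equal(s, t):
    """Two sets are equal when each one is a subset of the other."""
    return s <= t and t <= s


singleton_is_subset = {"ASN65512"} <= Y  # a one-element subset
empty_is_subset = set() <= Y             # the empty set is a subset of every set

print(x_subset_y, y_superset_x, size_ok, sets_equal(X, Y))
```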

    Example 2.1.1

    A set can be considered a single object. Remember that a set is a way of considering a group of objects as a single thing. So there's no rule against having a set that contains other sets, or a set of sets. The power set of a set is the set of all possible subsets of that set, including the empty set and the set itself. If S is a set, its power set is commonly written 𝒫(S).

    As an example, consider the set S = {a,b,c}. Then the power set of S is the set

    {∅, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}}. And we can see that this set has eight elements.
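    The power set of a small set can be enumerated directly. The sketch below uses itertools.combinations to list the subsets of each size; power_set is an illustrative helper name, and frozenset is used because Python sets can only contain hashable members.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    # Subsets of size 0 (the empty set) up to size len(items) (the set itself).
    all_subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in all_subsets}


S = {"a", "b", "c"}
P = power_set(S)

print(len(P))
```

    For S = {a,b,c} the result has the eight elements listed above, including the empty set and S itself.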

    2.2 Operations on Sets

    If we start with a set, then we should have some operations defined on the sets in order to create new ones. We’ll begin with elementary operations and then develop more advanced operations. The more advanced operations can be derived from the elementary operations.

    2.2.1 Complement

    Suppose we have a set P and a subset Q of P. We know that every element of Q is an element of P, but not every element of P is necessarily an element of Q. The elements of P that are not elements of Q are the complement of Q in P. It is written as Qᶜ. So if we have a set S of all systems in a company and a subset V of those systems that were infected by a virus, then Vᶜ is the set of all systems that were not infected by the virus.
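    Python has no complement operator, because a complement only makes sense relative to an enclosing set. A minimal sketch, assuming hypothetical host names for the company's systems, computes the complement as a difference from that enclosing set.

```python
# Hypothetical host names, invented for illustration only.
S = {"web01", "web02", "db01", "mail01", "vpn01"}  # all systems in the company
V = {"web02", "mail01"}                            # systems infected by the virus

# The complement is only defined when V is a subset of S.
if not V <= S:
    raise ValueError("V must be a subset of S")

V_c = S - V  # complement of V in S: systems that were not infected

print(sorted(V_c))
```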

    2.2.2 Intersection

    If we have two sets that have properties in common, we can find out which elements are in both sets and create a new set. This is the intersection of the sets. It does not make sense to find the intersection of a set of user names and a set of domains, because the two sets share no property in common. On the other hand, we can take the set of domains that our users visited and intersect it with a set of domains known to be used by malicious software. The new set is the set of domains that our users visited that are associated with malicious software.

    The intersection is denoted by the ∩ symbol. So when we intersect the sets A and B, we write A ∩ B. The set A ∩ B is every element in both A and B, so an element of A ∩ B must be in A and is also in B. The Venn diagram in Fig. 2.2 is an illustration of the intersection. The shaded area between the two sets is the intersection; it represents the elements that are in both set A and set B.

    Fig. 2.2 A ∩ B.

    Example 2.2.1

    Suppose we have two sets of IP addresses. The first set will contain four IP addresses, A = {127.0.0.1,192.168.5.5,10.0.0.1,10.5.65.129} and the second will be B .

    .

    Note that A ∩ B = B ∩ A. Every element that is in A and B is also in B and A, so they are equivalent. This is the mathematical concept of commutativity.

    It is possible that the intersection of two sets is the empty set. If one set is all IP addresses external to the organization and another set is all IP addresses internal to the organization, then the intersection of those two sets is empty. When this occurs, the sets are called disjoint. If A is a subset of B, then Aᶜ is disjoint from A. An element can either be in A or not in A, but not both.

    There is no rule that we can only intersect two sets at a time. If we have a set for each day of the week or each day of the year, we can intersect them. In fact, we can intersect any number of sets.
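    The properties of intersection discussed in this section can be tried out with Python sets. In this sketch the set A is taken from Example 2.2.1, while B, the internal and external address sets, and the per-day sets are hypothetical values chosen for illustration.

```python
# A is the set of IP addresses from Example 2.2.1; B is hypothetical.
A = {"127.0.0.1", "192.168.5.5", "10.0.0.1", "10.5.65.129"}
B = {"10.0.0.1", "192.168.5.5", "172.16.4.2"}

both = A & B                   # elements that are in A and in B
commutes = (A & B) == (B & A)  # intersection is commutative

# Disjoint sets: hypothetical internal and external addresses share nothing.
internal = {"10.0.0.1", "10.5.65.129"}
external = {"198.51.100.7", "203.0.113.9"}
disjoint = internal.isdisjoint(external)

# Any number of sets can be intersected, for example one set per day.
daily = [
    {"10.0.0.1", "192.168.5.5", "10.5.65.129"},
    {"10.0.0.1", "192.168.5.5"},
    {"10.0.0.1", "127.0.0.1"},
]
seen_every_day = set.intersection(*daily)

print(sorted(both), commutes, disjoint, sorted(seen_every_day))
```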

    2.2.3 Union

    We defined an intersection of two sets A and B as the elements that are in both A and B. The union is the combination of all of the elements in A or B. If we have a set of domain names and a set of malware samples, the union of the two becomes a set of domain names and malware samples. It really does not make sense to create this set though, because a set of domains means that the entire set contains domain names. A set of domain names and malware samples does not actually specify completely what each element of the set is. Rather, it is a qualifier of what the set could contain.

    The union of A and B is written A ∪ B, and the Venn diagram that illustrates it is in Fig. 2.3. You can see that both sets are shaded in, demonstrating that every element in either A or B is included in the union. The union, as in the case of the intersection, is commutative and has no restriction on the number of sets that can be combined.

    Fig. 2.3 A ∪ B.

    If we have two sets A and B, then A ⊆ A ∪ B. This is because every element that is in A is also in A ∪ B. This is also illustrated in the Venn diagram in Fig. 2.3.
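    A short sketch of the union with Python sets follows. The two sets of domain names are hypothetical and use reserved example domains; both sets hold the same kind of object so that the union remains a set of domains.

```python
# Two hypothetical sets of domain names.
A = {"example.com", "example.net"}
B = {"example.net", "example.org"}

U = A | B                      # every element that is in A or in B
commutes = (A | B) == (B | A)  # union is commutative

# Any number of sets can be combined in one call.
C = {"example.edu"}
combined = set().union(A, B, C)

# Each operand, and the intersection, is a subset of the union.
a_in_union = A <= U
intersection_in_union = (A & B) <= U

print(sorted(U), len(combined))
```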

    2.2.4 Difference

    Another combination of sets considers the elements of one set that are not in the other set. If A = {ASN64569,ASN64794,ASN65054,ASN65193,ASN65200} and B = {ASN64569,ASN64575,ASN65058,ASN65200,ASN65350}, then the elements that are in A but not in B are {ASN64794,ASN65054,ASN65193}. On the other hand, the elements that are in B but not in A are {ASN64575,ASN65058,ASN65350}. This action is called the difference between the two sets, and the order in which it is performed clearly matters. The difference between the two sets A and B is written as A ∖ B, while the difference between the sets B and A is written as B ∖ A. The Venn diagram for A ∖ B is in Fig. 2.4.

    Fig. 2.4 A ∖ B.

    We also know that finding A ∖ B is equivalent to finding the complement of B and intersecting that with A. The difference was described as the elements of A that are not in B. The complement was described as not in, so we are looking at Bᶜ, and A ∖ B = A ∩ Bᶜ.
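    The difference example can be reproduced with Python's set subtraction. The sets A and B below are the ones given in the text; the enclosing set U used to form the complement of B is an assumption made for this sketch, since a complement needs a surrounding set to be taken in.

```python
# The two sets of Autonomous Systems from the text.
A = {"ASN64569", "ASN64794", "ASN65054", "ASN65193", "ASN65200"}
B = {"ASN64569", "ASN64575", "ASN65058", "ASN65200", "ASN65350"}

a_minus_b = A - B  # elements of A that are not in B
b_minus_a = B - A  # elements of B that are not in A; the order matters

# Assumption: take the union of A and B as the enclosing set for the complement.
U = A | B
B_c = U - B
identity_holds = a_minus_b == (A & B_c)  # difference as intersection with complement

print(sorted(a_minus_b), sorted(b_minus_a), identity_holds)
```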

    2.2.5 Symmetric Difference

    A variant on the difference is called the symmetric difference. It is composed of the elements of two sets A and B that are in A or B but not in both A and B, and it is denoted by A △ B. The Venn diagram is in Fig. 2.5.

    Fig. 2.5 A △ B.

    It's clear from the Venn diagram that A △ B = B △ A, so unlike the difference, the symmetric difference is commutative. The definition first says elements that are in A or B, which is the union. The definition then says and not in A and B, which removes the intersection. So the symmetric difference is the union with the intersection removed, as defined in Eq. (2.1):

       A △ B = (A ∪ B) ∖ (A ∩ B)   (2.1)
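    Python exposes the symmetric difference as the ^ operator on sets, which makes it easy to compare against the definition as a union with the intersection removed. This sketch reuses the two sets of Autonomous Systems from the difference section.

```python
A = {"ASN64569", "ASN64794", "ASN65054", "ASN65193", "ASN65200"}
B = {"ASN64569", "ASN64575", "ASN65058", "ASN65200", "ASN65350"}

sym = A ^ B                         # built-in symmetric difference
via_definition = (A | B) - (A & B)  # union with the intersection removed
via_differences = (A - B) | (B - A) # the two one-sided differences combined
commutes = (A ^ B) == (B ^ A)

print(sorted(sym))
```

    All three expressions produce the same six Autonomous Systems, the ones that appear in exactly one of the two sets.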

    2.2.6 Cross Product

    The intersection, union, subset, difference and symmetric difference of two sets create a new set that may or may not contain all of the elements of the two sets. If we want to keep all of the elements of both sets, we can create a new set called the cross product or Cartesian product of them. Let A and B be two sets. This new set has elements that look like (a,b) where a ∈ A and b ∈ B, so it creates pairs of every element of A combined with every element of B. The notation for this set is A × B. We know that for each element of A there are |B| elements in A × B, so |A × B| = |A||B|.
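    The cross product corresponds to itertools.product in Python. In the sketch below, the set of IP addresses and the set of port numbers are hypothetical values chosen only to show the pairing and the size formula.

```python
from itertools import product

# Hypothetical sets: two IP addresses and three port numbers.
A = {"10.0.0.1", "10.0.0.2"}
B = {22, 80, 443}

# Every element of A paired with every element of B, as tuples (a, b).
AxB = set(product(A, B))

size_matches = len(AxB) == len(A) * len(B)  # |A × B| = |A||B|

print(len(AxB), size_matches)
```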

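    If we're considering our set A and taking the cross product with itself, we can write A × A or A². The cross product of a set with itself n times is written with the exponent n, as in ℝⁿ for the real numbers ℝ. Elements of this set are generally written as (r1,r2,…,rn); ℝ² is the standard (x,y) plane and ℝ³ is three-dimensional space.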
