
Splunk Knowledge Manager Manual

Version: 4.1.7

Generated: 2/16/2011 03:57 pm


Copyright Splunk, Inc. All Rights Reserved
Table of Contents
Welcome to knowledge management................................................................................................1
What is Splunk knowledge?.......................................................................................................1
Why manage Splunk knowledge?..............................................................................................2
Prerequisites for knowledge management.................................................................................4

Organize and administrate knowledge objects.................................................................................6


Curate Splunk knowledge with Manager....................................................................................6
Develop naming conventions for knowledge objects...............................................................13
Understand and use the Common Information Model..............................................................14

Data interpretation: Fields and field extractions.............................................................................25


About fields..............................................................................................................................25
Overview of search-time field extraction..................................................................................26
Use the Field extractions page in Manager..............................................................................29
Use the Field transformations page in Manager......................................................................34
Create and maintain search-time field extractions through configuration files.........................39
Configure multivalue fields.......................................................................................................52

Data classification: Event types and transactions.........................................................................54


About event types....................................................................................................................54
Define and maintain event types in Splunk Web......................................................................56
Configure event types directly in eventtypes.conf....................................................................60
Configure event type templates...............................................................................................61
About transactions...................................................................................................................62
Search for transactions............................................................................................................63
Define transactions..................................................................................................................65

Data enrichment: Lookups and workflow actions..........................................................................68


About lookups and workflow actions........................................................................................68
Look up fields from external data sources................................................................................69
Create workflow actions in Splunk Web...................................................................................77
Configure workflow actions through workflow_actions.conf.....................................................85

Data normalization: Tags and aliases..............................................................................................86


About tags and aliases.............................................................................................................86
Define and manage tags..........................................................................................................86
Create aliases for fields............................................................................................................90
Tag the host field......................................................................................................................92
Tag event types........................................................................................................................93

Manage your search knowledge.......................................................................................................94


Manage saved searches..........................................................................................................94
Configure the priority of scheduled searches...........................................................................94
Design macro searches...........................................................................................................96
Design form searches..............................................................................................................96

Define navigation to saved searches and reports....................................................................97

Set up and use summary indexes....................................................................................................99


Use summary indexing for increased reporting efficiency........................................................99
Manage summary index gaps and overlaps...........................................................................105
Configure summary indexes...................................................................................................109

Welcome to knowledge management
What is Splunk knowledge?

Splunk is a powerful search and analysis engine that helps you see both the details and the larger
patterns in your IT data. When you use Splunk you do more than just look at individual entries in your
log files; you leverage the information they hold collectively to find out more about your IT
environment.

Splunk automatically extracts different kinds of knowledge from your IT data--events, fields,
timestamps, and so on--to help you harness that information in a better, smarter, more focused way.
Some of this information is extracted at index time, as Splunk indexes your IT data. But the bulk of
this information is created at "search time," both by Splunk and its users. Unlike databases or
schema-based analytical tools that decide what information to pull out or analyze beforehand, Splunk
enables you to dynamically extract knowledge from raw data as you need it.

As your organization uses Splunk, additional categories of Splunk knowledge objects are created,
including event types, tags, lookups, field extractions, workflow actions, and saved searches.

You can think of Splunk knowledge as a multitool that you use to discover and analyze various
aspects of your IT data. For example, event types enable you to quickly and easily classify and group
together similar events; you can then use them to perform analytical searches on precisely-defined
subgroups of events.

If you've read the User manual, you know that it covers Splunk knowledge basics in its "Capture
knowledge" chapter. The Knowledge Manager manual goes into more depth. It shows you how to
maintain sets of knowledge objects for your organization (through Manager and configuration files)
and demonstrates ways that Splunk knowledge can be used to solve your organization's real-world
problems.

Splunk knowledge is grouped into five categories:

• Data interpretation: Fields and field extractions - Fields and field extractions make up the
first order of Splunk knowledge. The fields that Splunk automatically extracts from your IT data
help bring meaning to your raw data, clarifying what can at first glance seem
incomprehensible. The fields that you extract manually expand and improve upon this layer of
meaning.
• Data classification: Event types and transactions - You use event types and transactions
to group together interesting sets of similar events. Event types group together sets of events
discovered through searches, while transactions are collections of conceptually-related events
that span time.
• Data enrichment: Lookups and workflow actions - Lookups and workflow actions are
categories of knowledge objects that extend the usefulness of your data in various ways. Field
lookups enable you to add fields to your data from external data sources such as static tables
(CSV files) or Python-based commands. Workflow actions enable interactions between fields
in your data and other applications or web resources, such as a WHOIS lookup on a field

containing an IP address.
• Data normalization: Tags and aliases - Tags and aliases are used to manage and normalize
sets of field information. You can use tags and aliases to group sets of related field values
together, and to give extracted fields tags that reflect different aspects of their identity. For
example, you can group events from a set of hosts in a particular location (such as a building or
city) together--just give each host the same tag. Or maybe you have two different sources
using different field names to refer to the same data--you can normalize your data by using aliases
(by aliasing clientip to ipaddress, for example; see the sketch after this list).
• Saved searches - Saved searches are another category of Splunk knowledge. Vast numbers
of saved searches can be created by Splunk users within an organization, and thoughtful
saved search organization ensures that they are discoverable by those that need them. There
are also advanced uses for saved searches: they are often used in dashboards, can be turned
into reusable search macros, and more.
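
As a quick illustration of the aliasing scenario mentioned above, a field alias can be defined with a single line in props.conf. The following is only a sketch--it assumes a sourcetype named access_combined and an existing extracted field named clientip, so adjust the names to match your own data:

# props.conf (in an app's local directory or $SPLUNK_HOME/etc/system/local/)
[access_combined]
# Make the extracted clientip field also searchable as ipaddress
FIELDALIAS-normalize_ip = clientip AS ipaddress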

The Knowledge Manager manual also includes a chapter on summary indexing. Summary index
setup and oversight is an advanced practice that can benefit from being handled by users in a
knowledge management role.

At this point you may be asking the question "Why does Splunk knowledge need to be 'managed'
anyway?" For answers, see "Why manage Splunk knowledge?", the next topic in this chapter.

Knowledge managers should have at least a basic understanding of data input setup, event
processing, and indexing concepts. For more information, see Prerequisites for knowledge
management, the third topic in this chapter.

Make a PDF

If you'd like a PDF of any version of this manual, click the pdf version link above the table of
contents bar on the left side of this page. A PDF version of the manual is generated on the fly for you,
and you can save it or print it out to read later.

Why manage Splunk knowledge?

If you have to maintain a fairly large number of knowledge objects across your Splunk deployment,
you know that management of that knowledge is important. This is especially true of organizations
that have a large number of Splunk users, and even more so if you have several teams of users
working with Splunk. This is simply because a greater proliferation of users leads to a greater
proliferation of additional Splunk knowledge.

When you leave a situation like this unchecked, your users may find themselves sorting through large
sets of objects with misleading or conflicting names, struggling to find and use objects that have
unevenly applied app assignments and permissions, and wasting precious time creating objects such
as saved searches and field extractions that already exist elsewhere in the system.

Knowledge managers provide centralized oversight of Splunk knowledge. The benefits that
knowledge managers can provide include:

• Oversight of knowledge object creation and usage across teams, departments, and
deployments. If you have a large Splunk deployment spread across several teams of users,
you'll eventually find teams "reinventing the wheel" by designing objects that were already
developed by other teams. Knowledge managers can mitigate these situations by monitoring
object creation and ensuring that useful "general purpose" objects are shared on a global
basis across deployments.

For more information, see "Curate Splunk knowledge with Manager" in this manual.

• Normalization of event data. To put it plainly: knowledge objects proliferate. Although Splunk
is based on data indexes, not databases, the basic principles of normalization still apply. It's
easy for any robust, well-used Splunk implementation to end up with a dozen tags that have all
been applied to the same field, but as these redundant knowledge objects stack up, the end result is
confusion and inefficiency on the part of its users. We'll provide you with some tips about
normalizing your knowledge object libraries by applying uniform naming standards and using
Splunk's Common Information Model.

For more information, see "Develop naming conventions for knowledge objects" in this
manual.

• Management of knowledge objects through configuration files. True knowledge


management experts know how and when to leverage the power of Splunk's configuration files
when it comes to the administration of Splunk knowledge. There are certain aspects of
knowledge object setup that are best handled through configuration files. This manual will
show you how to work with knowledge objects this way.

See "Create search time field extractions" in this manual as an example of how you can
manage Splunk knowledge through configuration files.
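
For instance, a simple search-time field extraction can be defined with a single stanza in props.conf. This sketch assumes syslog events that contain text such as src_ip=10.2.3.4; the class name and regular expression are only examples:

# props.conf
[syslog]
# Extract an src_ip field from events that contain "src_ip=<address>"
EXTRACT-src_ip = src_ip=(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})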

• Setup and organization of app-level navigation for saved searches and reports, as well
as views and dashboards. Left unmoderated, the navigation for saved searches, reports,
views, and dashboards can become very confusing as more and more of these kinds of
objects are added to Splunk applications. You don't have to be a Splunk app designer to
ensure that users can quickly and easily navigate to the searches, reports, views, and
dashboards they need to do their job efficiently.

For more information, see "Define navigation for saved searches and reports" in this manual.
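
As an illustration, app-level navigation is defined in an XML file such as $SPLUNK_HOME/etc/apps/<App_name>/local/data/ui/nav/default.xml. The fragment below is only a sketch--the collection label is hypothetical, and the saved search name is borrowed from the naming convention example later in this manual:

<nav>
  <collection label="Security Reports">
    <!-- Each saved element points at an existing saved search by name -->
    <saved name="NOC_Summary_Network_Security_24hr_Top_src_ip" />
  </collection>
</nav>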

• Review of summary index setup and usage. Summary indexes may be used by many
teams across your deployment to run efficient searches on large volumes of data. The
knowledge manager can provide centralized oversight of summary index usage across your
organization, ensuring that summary indexes are built correctly, used responsibly, and shared
as appropriate with users throughout your Splunk deployment.

Note: As of Release 4.1, summary index usage does not count against your overall license
volume.

For more information, see "Use summary indexing for increased reporting efficiency" in this
manual.
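
To give a sense of what there is to review, a summary-index-populating search is typically a scheduled saved search with summary indexing enabled in savedsearches.conf. The stanza below is only a sketch--the search name, schedule, and search string are hypothetical:

# savedsearches.conf
[OPS_Summary_Network_SQL_1h_Query_counts]
search = sourcetype=sql_audit | sistats count by dbhost
cron_schedule = 5 * * * *
enableSched = 1
action.summary_index = 1
# Write the results to the default summary index
action.summary_index._name = summary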

Prerequisites for knowledge management

Most knowledge management tasks are centered around "search time" event manipulation. In other
words, a typical knowledge manager usually doesn't focus their attention on work that takes place
before events are indexed, such as setting up data inputs, adjusting event processing activities,
correcting default field extraction issues, creating and maintaining indexes, setting up forwarding and
receiving, and so on.

However, we do recommend that all knowledge managers have a good understanding of these
"Splunk admin" concepts. A solid grounding in these subjects enables knowledge managers to better
plan out their approach towards management of knowledge objects for their deployment...and it helps
them troubleshoot issues that will inevitably come up over time.

Here are some of the "admin" topics that knowledge managers should be familiar with, with Admin
manual links to get you started:

• Working with Splunk apps: If your deployment uses more than one Splunk app, you should
get some background on how they're organized and how app object management works within
multi-app deployments. See "What's an app?", "App architecture and object ownership", and
"Manage app objects".

• Configuration file management: Where are Splunk's configuration files? How are they
organized? How do configuration files take precedence over each other? See "About
configuration files" and "Configuration file precedence".

• Indexing with Splunk: What is an index and how does it work? What is the difference
between "index time" and "search time" and why is this distinction significant? Start with
"What's a Splunk index?" and read the rest of the chapter. Pay special attention to "Index time
vs search time".

• Getting event data into Splunk: It's important to have at least a baseline understanding of
Splunk data inputs. Check out "What Splunk can monitor" and read the other topics in this
chapter as necessary.

• Understand your forwarding and receiving setup: If your Splunk deployment utilizes
forwarders and receivers, it's a good idea to get a handle on how they've been implemented,
as this can affect your knowledge management strategy. Get an overview of the subject at
"About forwarding and receiving".

• Understand event processing: It's a good idea to get a good grounding in the steps that
Splunk goes through to "parse" data before it indexes it. This knowledge can help you
troubleshoot problems with your event data and recognize "index time" event processing
issues. Start with "Overview of event processing" and read the entire chapter.

• Default field extraction: Most field extraction takes place at search time, with the exception of
certain default fields, which get extracted at index-time. As a knowledge manager, most of the
time you'll concern yourself with search-time field extraction, but it's a good idea to know how
default field extraction can be managed when it's absolutely necessary to do so. This can help
you troubleshoot issues with the host, source, and sourcetype fields that Splunk applies
to each event. Start with "About default fields".

• Managing users and roles: Knowledge managers typically do not directly set up users and
roles. However, it's a good idea to understand how they're set up within your deployment, as
this directly affects your efforts to share and promote knowledge objects between groups of
users. For more information, start with "About users and roles" and read the rest of the chapter
as necessary.

Organize and administrate knowledge objects
Curate Splunk knowledge with Manager

As your organization uses Splunk, knowledge is added to the base set of event data indexed within
it. Searches are saved and scheduled. Tags are added to fields. Event types and transactions that
group together sets of events are defined. Lookups and workflow actions are engineered.

The process of knowledge object creation starts out slow, but can get complicated over time. It's
easy to reach a point where users are "reinventing the wheel," creating searches that already exist,
designing redundant event types, and so on. These things may not be a big issue if your user base is
small, but they can cause unnecessary confusion and repetition of effort, especially as they
accumulate over time.

This topic discusses how knowledge managers can use Splunk Manager to take charge of the
knowledge objects in their Splunk system and show them who's boss. Splunk Manager can give a
savvy and attentive knowledge manager a view into what knowledge objects are being created, who
they're being created by, and (to some degree) how they are being used.

With Manager, you can easily:

• Create knowledge objects as necessary, either "from scratch" or through object cloning.
• Review knowledge objects as they are created, with an eye towards reducing redundancy,
ensuring that naming standards are followed, and that "bad" objects are removed before they
develop lots of downstream dependencies.
• Ensure that knowledge objects with relevancy beyond a particular working team, role, or app
are made available to other teams, roles, and users of other apps.
• Delete knowledge objects that do not have significant "downstream" dependencies.

Note: This topic assumes that as a knowledge manager you have an admin role or a role with an
equivalent permission set.

Using configuration files instead of Manager

In previous releases Splunk users edited Splunk's configuration files directly to add, update, or delete
knowledge objects. Now they can use Manager, which provides a user-friendly interface to those
same configuration files.

We do recommend having some familiarity with configuration files. The reasons for this include:

• Some Manager functionality makes more sense if you understand how things work at the
configuration file level. This is especially true for the Field extractions and Field
transformations pages in Manager.
• Functionality exists for certain knowledge object types that isn't (or isn't yet) expressed in the
Manager UI.

• Bulk deletion of obsolete, redundant, or improperly defined knowledge objects is only possible
with configuration files.
• You may find that you prefer to work directly with configuration files. For example, if you're a
long-time Splunk user, brought up on our configuration file system, it may be the medium in
which you've grown accustomed to dealing with knowledge objects. Other users just prefer the
level of granularity and control that configuration files can provide.

Wherever you stand with Splunk's configuration files, we want to make sure you can use them when
you find it necessary to do so. To that end, you'll find that the Knowledge Manager manual includes
instructions for handling various knowledge object types via configuration files. For more information,
see the documentation of those types.
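
For example, a saved search is stored as a stanza in savedsearches.conf, in the local directory of the app it belongs to. The following is only a sketch--the search name and search string are hypothetical:

# $SPLUNK_HOME/etc/apps/search/local/savedsearches.conf
[Errors in the last 24 hours]
search = error OR failed OR severe
dispatch.earliest_time = -24h
dispatch.latest_time = now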

For general information about configuration files in Splunk, see the following topics in the Admin
manual:

• About configuration files


• Configuration file precedence

You can find examples of the current configuration .spec and .example files in the "Configuration
file reference" chapter of the Admin manual.

Monitor and organize knowledge objects

As a knowledge manager, you should periodically check up on the knowledge object collections in
your Splunk implementation. You should be on the lookout for knowledge objects that:

• Fail to adhere to naming standards


• Are duplicates/redundant
• Are worthy of being shared with wider audiences
• Should be disabled or deleted due to obsolescence or poor design

Regular inspection of the knowledge objects in your system will help you detect anomalies that could
become problems later on.

Example - Keeping tags straight

Most healthy Splunk implementations end up with a lot of tags, which are used to perform searches
on clusters of field/value pairings. Over time, however, it's easy to end up with tags that have similar
names but which produce surprisingly dissimilar results. This can lead to considerable confusion and
frustration.

Here's a procedure you can follow for curating tags. It can easily be adapted for other types of
knowledge objects handled through Manager.

1. Go to Manager > Tags > List by tag name.

2. Look for tags with similar or duplicate names that belong to the same app (or which have been
promoted to global availability for all users). For example, you might find a set of tags like

authentication and authentications in the same app, where one tag is linked to an entirely
different set of field/value pairs than the other.

Alternatively, you may encounter tags with identical names except for the use of capital letters, as in
crash and Crash. Tags are case-sensitive, so Splunk sees them as two separate knowledge
objects.

Keep in mind that you may find legitimate tag duplications if you have the App context set to All,
where tags belonging to different apps have the same name. This is often permissible--after all, an
authentication tag for the Windows app will have to be associated with an entirely different set of
field/value pairs than an authentication for the UNIX app, for example.

3. Try to disable or delete the duplicate or obsolete tags you find, if your permissions enable you to
do so. However, be aware that there may be objects dependent on those tags that will be affected. If a
tag is used in saved searches, dashboard searches, other event types, or transactions, those objects
will cease to function once the tag is removed or disabled. This can also happen if the object belongs
to one app context, and you attempt to move it to another app context.

For more information, see "Disable or delete knowledge objects," below.

4. If you create a replacement tag with a new, more distinctive name, ensure that it is connected to the
same field/value pairs as the tag that you are replacing.
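
When you are comparing two similar tags, a quick search can show you what each one actually matches. For example (assuming the tag names from step 2), you might run each of these searches over a short time range and compare the results:

tag=authentication | stats count by source, sourcetype
tag=authentications | stats count by source, sourcetype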

Using naming conventions to head off object nomenclature issues

If you set up naming conventions for your knowledge objects early in your implementation of Splunk
you can avoid some of the thornier object naming issues. For more information, see "Develop naming
conventions for knowledge objects" in this manual.

Share and promote knowledge objects

As a Knowledge Manager, you can set knowledge object permissions to restrict or expand access to
the variety of knowledge objects within your Splunk implementation.

In some cases you'll determine that certain specialized knowledge objects should only be used by
people in a particular role, within a specific app. And in others you'll move to the other side of the
scale and make universally useful knowledge objects globally available to all users in all apps. As
with all aspects of knowledge management you'll want to carefully consider the implications of these
access restrictions and expansions.

When a Splunk user first creates a new saved search, event type, transaction, or similar knowledge
object, it is only available to that user. To make that object available to more people, Manager
provides the following options, which you can take advantage of if your permissions enable you to do
so. You can:

• Make the knowledge object available globally to users of all apps (also referred to as
"promoting" an object).
• Make the knowledge object available to all users of an app.

• Restrict (or expand) access to global or app-specific objects by user or role.
• Set read/write permissions at the app level for roles, to enable users to share or delete objects
they do not own.

How do permissions affect knowledge object usage?

To illustrate how these choices can affect usage of a knowledge object, imagine that Bob, a user of
the (fictional) Network Security app with a "Firewall Manager" role, creates a new event type named
firewallbreach, which finds events that indicate firewall breaches. Here's a series of
permissions-related issues that could come up, and the actions and results that would follow:

Issue: When Bob first creates firewallbreach, it is only available to him. Other users cannot see it
or work with it. Bob decides he wants to share it with his fellow Network Security app users.
Action: Bob updates the permissions of the firewallbreach event type so that it is available to all
users of the Network Security app, regardless of role. He also sets up the new event type so that all
Network Security users can edit its definition.
Result: Anyone using Splunk in the Network Security app context can see, work with, and edit the
firewallbreach event type. Users of other Splunk apps in the same Splunk implementation have
no idea it exists.

Issue: A bit later on, Mary, the knowledge manager, realizes that only users in the Firewall Manager
role should have the ability to edit or update the firewallbreach event type.
Action: Mary restricts the ability to edit the event type to the Firewall Manager role.
Result: Users of the Network Security app can use the firewallbreach event type in transactions,
searches, dashboards, and so on, but now the only people that can edit the knowledge object are
those with the Firewall Manager role and people with admin-level permissions (such as the
knowledge manager). People using Splunk in other app contexts remain blissfully ignorant of the
event type.

Issue: At some point a few people who have grown used to using the very handy firewallbreach
event type in the Network Security app decide they'd like to use it in the context of the Windows app
as well.
Action: They make their case to the knowledge manager, who promptly promotes the
firewallbreach event type to global availability.
Result: Now, everyone that uses this implementation of Splunk can use the firewallbreach event
type, no matter what app context they happen to be in. But the ability to update the event type
definition is still confined to admin-level users and users with the Firewall Manager role.
Note: You may want to set your Splunk implementation up so that only people with Admin-level roles
can share and promote knowledge objects. This would make you (and your fellow knowledge
managers) gatekeepers with approval capability over the sharing of new knowledge objects.
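
Behind the scenes, these permission settings are stored in the app's metadata files. A sketch of what the firewallbreach permissions from the scenario above might look like in $SPLUNK_HOME/etc/apps/<App_name>/metadata/local.meta follows; the role name firewall_manager is hypothetical:

[eventtypes/firewallbreach]
# Everyone can read (use) the event type; only these roles can write (edit) it
access = read : [ * ], write : [ admin, firewall_manager ]
# export = system makes the object available globally, to users of all apps
export = system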

Permissions - Getting started

To change the permissions for a knowledge object, follow these steps:

1. In Manager, navigate to the page for the type of knowledge object that you want to update
permissions for (such as Searches and reports or Event types).

2. Find the knowledge object that you created (use the filtering fields at the top of the page if
necessary) and click its Permissions link.

3. On the Permissions page for the knowledge object in question, perform the actions in the following
subsections depending on how you'd like to change the object's permissions.

Make an object available to users of all apps

To make an object globally available to users of all apps in your Splunk implementation:

1. Navigate to the Permissions page for the knowledge object (following the instructions above).

2. Under [Knowledge object type] should appear in:, select All apps.

3. In the Permissions section, for Everyone, select a permission of either Read or Write:

• Read enables users to see and use the object, but not update its definition. In other words,
when users only have Read permission for a particular saved search, they can see it in the top
level navigation (the "Searches & Reports" dropdown, for example) and they can run it. But
they can't update the search string, change its time range, or save any changes.
• Write enables users to view, use, and update the defining details of an object as necessary.
• If neither Read nor Write is selected, users cannot see or use the knowledge object.

4. Save the permission change.

Make an object available to users of a particular app

To restrict the usage of a knowledge object to a specific app, you first have to be in the context of that
app. To do this, click the App dropdown in the upper right-hand corner of the screen and select the
app to which you'd like to restrict the knowledge object.

• If the knowledge object is private, or shared globally, then all you have to do is navigate to
the Permissions page for that object and select This app under [Knowledge object type]
should appear in:. Then select a permission of either Read or Write for Everyone as
appropriate.
• If usage of a knowledge object is already restricted to an app and you want to switch its
context to another app, click the Move link (it will only appear if you have sufficient
permissions to move it). This will enable you to quickly and easily choose another app context
for the knowledge object.

Keep in mind, however, that switching the app context of a knowledge object can have
downstream consequences for objects that have been associated with it. For more information
see "Disable or delete knowledge objects", below.

Restrict knowledge object access by role

You can use this method to lock down various knowledge objects from alteration by specific roles.
You can arrange things so users in a particular role can use the knowledge object but not update
it--or you can set it up so those users cannot see the object at all. In the latter case, the object will not
show up for them in Manager, and they will not find any results when they search on it.

If you want to restrict the ability to see or update a knowledge object by role, simply navigate to the
Permissions page for the object. If you want members of a role to:

• Be able to use the object and update its definition, give that role Read and Write access.
• Be able to use the object but be unable to update it, give that role Read access only (and
make sure that Write is unchecked for the Everyone role).
• Be unable to see or use the knowledge object at all, leave Read and Write unchecked for
that role (and unchecked for the Everyone role as well).

For more information about role-based permissions in Splunk see "About users and roles" in the
Admin manual.

A note about deleting users and roles with unshared objects

If a Splunk user leaves your team and you need to delete that user or role from the Splunk system,
be aware that you will lose any knowledge objects belonging to them that have a sharing status of
private. If you want to keep those knowledge objects, share them at the app or global level before
deleting the user or role.

Disable or delete knowledge objects

Let's start off by saying that Manager makes it fairly easy to disable or delete knowledge objects as
long as your permissions enable you to do so. In Splunk, the ability to delete knowledge objects in
Manager really depends on a set of factors:

• You cannot delete default knowledge objects that were delivered with Splunk (or with
the App) via Manager. If the knowledge object definition resides in the app's default directory,
it can't be removed via Manager. It can only be disabled (by clicking Disable). Only objects

that exist in an app's "local" directory are eligible for deletion.
• You can delete knowledge objects that you have created, and which haven't been
shared. Once a knowledge object you've created is shared with other users, your ability to
delete it is revoked, unless you have write permissions for the app to which it belongs (see
the next point).
• To delete all other knowledge objects, you need to have write permissions for the
application to which they belong. This applies to knowledge objects that are shared globally
as well as those that are only shared within an app--all knowledge objects belong to a specific
app, no matter how they are shared.

App-level write permissions are usually only granted to users with admin-equivalent roles.

To sum up: the ability to edit a knowledge object has nothing to do with the ability to delete it. If you
can't delete a particular knowledge object you may still be able to disable it, which essentially has the
same function as knowledge object deletion without removing it from the system.

Deleting knowledge objects with downstream dependencies

You have to be careful about deleting knowledge objects with downstream dependencies, as this can
have negative impacts.

For example, you could have a tag that looks like the duplicate of another, far more common tag. On
the surface it would seem to be harmless to delete the dup tag. But what you may not realize is that
this duplicate tag also happens to be part of a search that a very popular event type is based upon.
And that popular event type is used in two important saved searches--the first is the basis for a
well-used dashboard panel, and the other is used to populate a summary index that is used by
searches that run several other dashboard panels. So if you delete that tag, the event type breaks,
and everything downstream of that event type breaks.

This is why it is important to nip poorly named or defined knowledge objects in the bud,
before they become inadvertently hard-wired into the workings of your deployment. The only
way to identify the downstream dependencies of a particular knowledge object is to search on it, find
out where it is used, and then search on those things to see where they are used--it can take a bit of
detective work. There is no "one click" way to bring up a list of knowledge object downstream
dependencies at this point.

If you really feel that you have to delete a knowledge object, and you're not sure if you've tracked
down and fixed all of its downstream dependencies, you could try disabling it first to see what impact
that has. If nothing seems to go seriously awry after a day or so, delete it.

Deleting knowledge objects in configuration files

Note that when you use Manager, you can only disable or delete one knowledge object at a time. If
you need to remove large numbers of objects, the most efficient way to do it is by removing the
knowledge object stanzas directly through the configuration files. Keep in mind that several versions
of a particular configuration file can exist within your system. In most cases you should only edit the
configuration files in $SPLUNK_HOME/etc/system/local/, to make local changes on a site-wide
basis, or $SPLUNK_HOME/etc/apps/<App_name>/local/, if you need to make changes that
apply only to a specific app.
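
For example, tags are stored as stanzas in tags.conf. To remove an obsolete crash tag from the host field value db01, you would delete lines like the following from the appropriate local copy of tags.conf (the host value here is hypothetical):

# $SPLUNK_HOME/etc/apps/<App_name>/local/tags.conf
[host=db01]
crash = enabled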

Do not try to edit configuration files until you have read and understood the following topics in the
Admin manual:

• About configuration files


• Configuration file precedence

Develop naming conventions for knowledge objects

We suggest you develop naming conventions for your knowledge objects when it makes sense to do
so. If the naming conventions you develop are followed consistently by all of the Splunk users in your
organization, you'll find that your knowledge objects become easier to use and that their purpose is much easier to
discern at a glance.

You can develop naming conventions for just about every kind of knowledge object in Splunk.
Naming conventions can help with object organization, but they can also help users differentiate
between groups of saved searches, event types, and tags that have similar uses. And they can help
identify a variety of things about the object that may not even be in the object definition, such as what
teams or locations use the object, what technology it involves, and what it's designed to do.

Early development of naming conventions for your Splunk implementation will help you avoid
confusion and chaos later on down the road.

Use the Common Information Model

Splunk's Common Information Model provides strategies for normalizing your approach to extracted
field names, event type tagging, and host tagging. It includes:

• A list of standard custom fields


• An event type tagging system
• Lists of standard host tags

For more information, see "Understand and use the Common Information Model" in this manual.

Example - Set up a naming convention for saved searches

You work in the systems engineering group of your company, and as the knowledge manager for
your Splunk implementation, it's up to you to come up with a naming convention for the saved
searches produced by your team.

In the end you develop a naming convention that pulls together:

• Group: Corresponds to the working group(s) of the user saving the search.
• Search type: Indicates the type of search (alert, report, summary-index-populating).
• Platform: Corresponds to the platform subjected to the search.
• Category: Corresponds to the concern areas for the prevailing platforms.
• Time interval: The interval over which the search runs (or on which the search runs, if it is a
scheduled search).

• Description: A meaningful description of the context and intent of the search, limited to one or
two words if possible. Ensures the search name is unique.

Group: SEG, NEG, OPS, NOC
Search type: Alert, Report, Summary
Platform: Windows, iSeries, Network
Category: Disk, Exchange, SQL, Event log, CPU, Jobs, Subsystems, Services, Security
Time interval: <arbitrary>
Description: <arbitrary>

Possible saved searches using this naming convention:

• SEG_Alert_Windows_Eventlog_15m_Failures
• SEG_Report_iSeries_Jobs_12hr_Failed_Batch
• NOC_Summary_Network_Security_24hr_Top_src_ip

Understand and use the Common Information Model

The Common Information Model is based on the idea that you can break down most log files into
three components:

• fields
• event type tags
• host tags

With these three components, a savvy knowledge manager should be able to set up their log files in a
way that makes them easily processable by Splunk, normalizing noncompliant log files so that they
follow a similar schema. The Common Information Model details the standard fields,
event type tags, and host tags that Splunk uses when it processes most IT data.

Normalizing the standard event format

This is the recommended format that should be used when events are generated or written to a
system:

<timestamp> name="<name>" event_id=<event_id> <key>=<value>

Any number of field key-value pairs are allowed. For example:

2008-11-06 22:29:04 name="Failed Login" event_id=sshd:failure
src_ip=10.2.3.4 src_port=12355 dest_ip=192.168.1.35 dest_port=22

The keys are those listed in "Standard fields" below. name and event_id are mandatory.

When events from a Cisco PIX log are made compliant with the Common Information Model
format, the following PIX event:

Sep 2 15:14:11 10.235.224.193 local4:warn|warning fw07 %PIX-4-106023:
Deny icmp src internet:213.208.19.33 dst
eservices-test-ses-public:193.8.50.70 (type 8, code 0) by
access-group "internet_access_in"

looks as follows:

2009-09-02 15:14:11 name="Deny icmp" event_id=106023 vendor=CISCO
product=PIX log_level=4 dvc_ip=10.235.224.193 dvc_host=fw07
syslog_facility=local4 syslog_priority=warn src_ip=213.208.19.33
dest_ip=193.8.50.70 src_network=internet
dest_network=eservices-test-ses-public icmp_type=8 icmp_code=0
proto=icmp rule_number="internet_access_in"

Standard fields

This table presents a list of standard fields that can be extracted from event data as custom
search-time field extractions.

Please note that we strongly recommend that all of these field extractions be performed at search
time. There is no need to add these fields to the set of default fields that Splunk extracts at index
time.

For more information about the index time/search time distinction, see "Index time versus search
time" in the Admin manual. For more information about performing field extractions at search time,
see "Create search-time field extractions" in this manual.

field name | data type | Explanation
action | string | The action specified by the event. For example, access, execution, or modification.
affected_user | string | The user that was affected by a change. For example, if user fflanda changed the name of user rhallen, rhallen is the affected_user.
affected_user_group | string | The user group that is affected by a change.
affected_user_group_id | string | The identifier of the group affected by a change.
affected_user_id | number | The identifier of the user affected by a change.
affected_user_privileges | enumeration | The privileges of the user affected by a change.
app | string | ISO layer 7 (application layer) protocol--for example HTTP, HTTPS, SSH, IMAP.
bytes_in | number | How many bytes this device/interface received.
bytes_out | number | How many bytes this device/interface transmitted.
channel | string | 802.11 channel number used by a wireless network.
category | string | A device-specific classification provided as part of the event.
count | number | The number of times the record has been seen.
cve | string | The Common Vulnerabilities and Exposures (CVE) reference value.
desc | string | The free-form description of a particular event.
dest_app | string | The name of the application being targeted.
dest_cnc_channel | string | The destination command and control service channel.
dest_cnc_name | string | The destination command and control service name.
dest_cnc_port | number | The destination command and control service port.
dest_country | string | The country associated with a packet's recipient.
dest_domain | string | The DNS domain that is being queried.
dest_host | string | The fully qualified host name of a packet's recipient. For HTTP sessions, this is the host header.
dest_int | string | The interface that is listening remotely or receiving packets locally.
dest_ip | ipv4 address | The IPv4 address of a packet's recipient.
dest_ipv6 | ipv6 address | The IPv6 address of a packet's recipient.
dest_lat | number | The (physical) latitude of a packet's destination.
dest_long | number | The (physical) longitude of a packet's destination.
dest_mac | mac address | The destination TCP/IP layer 2 Media Access Control (MAC) address of a packet's destination.
dest_nt_domain | string | The Windows NT domain containing a packet's destination.
dest_nt_host | string | The Windows NT host name of a packet's destination.
dest_port | port | The TCP/IP port to which a packet is being sent.
dest_record | string | The remote DNS resource record being acted upon.
dest_translated_ip | ipv4 address | The NATed IP address to which a packet is being sent.
dest_translated_port | number | The NATed port to which a packet is being sent.
dest_zone | string | The DNS zone that is being received by a slave as part of a zone transfer.
dhcp_pool | string | The name of a given DHCP pool on a DHCP server.
direction | string | The direction the packet is traveling, such as inbound or outbound.
duration | number | The amount of time the event lasted.
dvc_host | string | The fully qualified domain name of the device transmitting or recording the log record.
dvc_ip | ipv4 address | The IPv4 address of the device reporting the event.
dvc_ip6 | ipv6 address | The IPv6 address of the device reporting the event.
dvc_location | string | The free-form description of the device's physical location.
dvc_mac | MAC address | The MAC (layer 2) address of the device reporting the event.
dvc_nt_domain | string | The Windows NT domain of the device recording or transmitting the event.
dvc_nt_host | string | The Windows NT host name of the device recording or transmitting the event.
dvc_time | timestamp | Time at which the device recorded the event.
end_time | timestamp | The event's specified end time.
event_id | number | A unique identifier that identifies the event. This is unique to the reporting device.
file_access_time | timestamp | The time the file (the object of the event) was accessed.
file_create_time | timestamp | The time the file (the object of the event) was created.
file_hash | string | A cryptographic identifier assigned to the file object affected by the event.
file_modify_time | timestamp | The time the file (the object of the event) was altered.
file_name | string | The name of the file that is the object of the event, with no information related to local file or directory structure.
file_path | string | The location of the file that is the object of the event, in terms of local file and directory structure.
file_permission | string | Access controls associated with the file affected by the event.
file_size | number | The size of the file that is the object of the event. Indicate whether Bytes, KB, MB, GB.
http_content_type | string | The HTTP content type.
http_method | string | The HTTP method used in the event.
http_referrer | string | The HTTP referrer listed in the event.
http_response | number | The HTTP response code.
http_user_agent | string | The HTTP user agent.
ip_version | number | The numbered Internet Protocol version - 4 or 6.
length | number | The length of the datagram, event, message, or packet.
log_level | string | The log-level that was set on the device and recorded in the event.
name | string | The name of the event as reported by the device. The name should not contain information that's already being parsed into other fields from the event, such as IP addresses.
object_name | string | The object name (associated mainly with Windows).
object_type | string | The object type (associated mainly with Windows).
object_handle | string | The object handle (associated mainly with Windows).
outbound_interface | string | The network interface through which a packet was transmitted.
packets_in | number | How many packets this device/interface received.
packets_out | number | How many packets this device/interface transmitted.
pid | number | An integer assigned by the device operating system to the process creating the record.
priority | number | An environment-specific assessment of the importance of the event, based on elements such as event severity, business function of the affected system, or other locally defined variables.
process | string | The program that generated this record (such as a process name mentioned in the syslog header).
product | string | The product that generated the event.
product_version | number | The version of the product that generated the event.
proto | string | The OSI layer 3 (network layer) protocol--for example IP, ICMP, IPsec, ARP.
reason | string | The root cause of the result - "connection refused", "timeout", "crash", etc.
recipient | string | The person to whom an email message is sent.
record_class | string | The DNS resource record class - IN (internet - default), HS (Hesiod - historic), or CH (Chaos - historic).
record_type | string | The DNS resource record type - see the Wikipedia article on DNS record types.
result | string | The result of the action - succeeded/failed, allowed/denied.
rule_number | string | The firewall rule-number or ACL number.
sender | string | The person responsible for sending an email message.
severity | string | The severity (or priority) of an event as reported by the originating device.
signature | string | The SID, as well as the signature identifiers used by other Intrusion Detection Systems; the Event Identifiers assigned by Windows-based operating systems to event records; and Cisco's message IDs.
src_country | string | The country from which the packet was sent.
src_domain | string | The DNS domain that is being remotely queried.
src_host | string | The fully qualified host name of the system that transmitted the packet. For Web logs, this is the http client.
src_int | string | The interface that is listening locally or sending packets remotely.
src_ip | ipv4 address | The IPv4 address of the packet's source. For Web logs, this is the http client.
src_ipv6 | ipv6 address | The IPv6 address of the packet's source.
src_lat | number | The (physical) latitude of the packet's source.
src_long | number | The (physical) longitude of the packet's source.
src_mac | mac address | The Media Access Control (MAC) address from which a packet was transmitted.
src_nt_domain | string | The Windows NT domain containing the machines that generated the event.
src_nt_host | string | The Windows NT hostname of the system that generated the event.
src_port | port | The network port from which a packet originated.
src_record | string | The local DNS resource record being acted upon.
src_translated_ip | ip address | The translated/NAT'ed IP address from which a packet is being sent.
src_translated_port | number | The translated/NAT'ed network port from which a packet is being sent.
src_zone | string | The DNS zone that is being transferred by the master as part of a zone transfer.
session_id | string | The session identifier. Multiple transactions build a session.
ssid | string | The 802.11 service set identifier (ssid) assigned to a wireless network.
start_time | timestamp | The event's specified start time.
subject | string | The email subject line.
syslog_facility | syslog facility | The application, process, or OS subsystem that generated the event.
syslog_priority | syslog priority | The criticality of an event, as recorded by UNIX syslog.
tcp_flags | enumeration | The TCP flag specified in the event. One or more of SYN, ACK, FIN, RST, URG, or PSH.
tos | hex | The hex bit that specifies TCP 'type of service' (see http://en.wikipedia.org/wiki/Type_of_Service).
transaction_id | string | The transaction identifier.
transport | string | The transport protocol, such as TCP, UDP.
ttl | number | The "Time To Live" of a packet or datagram.
url | string | A Web address (Uniform Resource Locator, or URL) included in a record.
user | string | The login ID affected by the recorded event.
user_group | string | A user group that is the object of an event, expressed in human-readable terms.
user_group_id | string | The numeric identifier assigned to the user group event object.
user_id | number | System-assigned numeric identifier for the user affected by an event.
user_privilege | enumeration | The security context associated with the object of an event: one of administrator, user, or guest/anonymous.
user_subject | string | User that is the subject of an event. The one executing the action.
user_subject_id | number | ID number of the user that is the subject of an event. The one executing the action.
user_subject_privilege | enumeration | The security context associated with a recorded event: one of administrator, user, or guest/anonymous.
vendor | string | The vendor who made the product that generated the event.
vlan_id | number | The numeric identifier assigned to the virtual local area network specified in the record.
vlan_name | string | The name assigned to the VLAN in the event.
Standardize your event type tags

The Common Information Model suggests that you use a specific convention when tagging your
event types. This convention requires that you set up three categories of tags, and that you give each
event type in your system a single tag from each of these categories. The categories are object,
action, and status.

This arrangement enables precise event type classification. The object tag denotes what the event is
about. What object has been targeted? Is the event talking about a host, a resource, a file, or what?
The action tag explains what has been done to the object (create, delete, modify, and so on). And
the status tag provides the status of the action. Was it successful? Failed? Or was it simply an
attempt? In addition to these three standard tags, you can add other tags as well.

The three tags in discussion here are:

<objecttag> <actiontag> <statustag>

Some examples of using the standard tags are:

• For a firewall deny event type:

host communicate_firewall failure

• For a firewall accept event :

host communicate_firewall success

• For a successful database login:

database authentication_verify success
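
To make this concrete, tags are attached to an event type in tags.conf, in a stanza keyed on the eventtype field. The sketch below assumes an event type named firewall_deny defined in eventtypes.conf; the search string is only an example:

# eventtypes.conf
[firewall_deny]
search = sourcetype=cisco_pix "Deny"

# tags.conf
[eventtype=firewall_deny]
host = enabled
communicate_firewall = enabled
failure = enabled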

Object event type tags

Use one of these object tags in the first position as defined above.

Tag Explanation
application An application-level event.
application av An anti virus event.
application backdoor An event using an application backdoor.
application database A database event.
application database data An event related to database data.
application dosclient An event involving a DOS client.
application firewall An event involving an application firewall.
application im An instant message-related event.
application peertopeer A peer to peer-related event.
host A host-level event.
group A group-level event.
resource An event involving system resources.
resource cpu An event involving the CPU.
resource file An event involving a file.
resources interface An event involving network interfaces.
resource memory An event involving memory.
resource registry An event involving the system registry.
os An OS-level event.
os process An event involving an OS-related process.
os service An event involving an OS service.
user A user-level event.

Action event type tags

Use one of these action tags in the second position as defined above.

Tag Explanation
access An event that accesses something.
access read An event that reads something.
access read copy An event that copies something.
access read copy archive An event that archives something.
access read decrypt An event that decrypts something.
access read download An event that downloads something.
access write An event that writes something.
authentication An event involving authentication.
authentication add An event adding authentication rules.
authentication delete An event deleting authentication rules.
authentication lock An event indicating an account lockout.
authentication modify An event modifying authentication rules.
authentication verify An event verifying identity.
authorization An event involving authorization.
authorization add Adding new privileges.
authorization delete Deleting privileges.
authorization modify Changing privileges, e.g., chmod.
authorization verify Checking privileges for an operation.
check An event checking something.
check status An event checking something's status.
create An event that creates something.
communicate An event involving communication.
communicate connect An event involving making a connection.
communicate disconnect An event involving disconnecting.
communicate firewall An event passing through a firewall.
delete An event that deletes something.
execute An event that runs something.
execute restart An event that restarts something.
execute start An event that starts something.
execute stop An event that stops something.

modify An event that changes something.
modify attribute An event that changes an attribute.
modify attribute rename An event that renames something.
modify configuration An event that changes a configuration.
modify content A content-related event.
modify content append An event that appends new content onto existing content.
modify content clear An event that clears out content.
modify content insert An event that inserts content into existing content.
modify content merge An event that merges content.
substitute An event that replaces something.
Status event type tags

Use one of these status tags in the third position as defined above.

Tag Explanation
attempt An event marking an attempt at something.
deferred A deferred event.
failure A failed event.
inprogress An event marking something in progress.
report A report of a status.
success A successful event.
Optional tags

For those who want to use standard additional tags when they apply, some suggestions are below.

Tag Explanation
attack An event marking an attack.
attack exploit An event marking the use of an exploit.
attack bruteforce An event marking a brute force attack.
attack dos An event marking a denial of service attack.
attack escalation An event indicating a privilege escalation attack.
infoleak An event indicating an information leak.
malware An event marking malware action.
malware dosclient An event marking malware utilizing a DoS client.
malware spyware An event marking spyware.
malware trojan An event marking a trojan.
malware virus An event marking a virus.
malware worm An event marking a worm.
recon An event marking recon probes.
suspicious An event indicating suspicious activity.
Standardize your host tags

As you may know, it can be problematic to rename hosts directly. Because hosts are identified before
event data is indexed, changes to host names are not applied to data that has already been indexed.
It's far easier to use tags to group together events from particular hosts.

You can use standardized tags to describe specific hosts and what they do. There are a variety of
approaches to host tagging, all of which can be used where appropriate. Some of these methods
include:

• What service(s) the host is running.


• What OS the host is running.
• The department the host belongs to.
• What data the host contains.
• What cluster/round robin the host belongs to.

General host tags

These host tags are useful across the board. You can also develop lists of host tags that are
appropriate for specific apps.

Tag Explanation
db This host is a database.
development This host is a development box.
dmz This host is in the DMZ.
dns This host is a DNS server.
email This host is an email server.
finance This host contains financial information.
firewall This host is a firewall.
highly_critical This host is highly critical for business purposes.
web This host is a Web server.
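
For example, to apply a couple of these tags to a particular host, you could add a stanza like the following to tags.conf. The host name nyc-web-01 is an assumption:

[host=nyc-web-01]
web = enabled
dmz = enabled

You could then pull back events from all of your tagged DMZ Web servers with a search such as tag::host=web tag::host=dmz.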

Data interpretation: Fields and field extractions
About fields

Fields are searchable name/value pairings in event data. All fields have names and can be searched
with those names. ("Name/value pairings" are sometimes referred to as "key/value pairings.")

For example, look at the following search:

host=foo
In this search, host=foo is a way of indicating that you are searching for events with host fields
that have values of foo. When you run this search, Splunk won't seek out events with different host
field values. It also won't look for events containing other fields that share foo as a value. This
means that this search gives you a more focused set of search results than you might get if you just
put foo in the search bar.

As Splunk processes event data, it extracts and defines fields from that data, first at index time, and
again at search time. These fields show up in the Field Picker after you run a search.

At index time Splunk extracts a small set of default fields for each event, including host, source,
and sourcetype. Default fields are common to all events. Splunk can also extract custom indexed
fields at index time; these are fields that you have configured for index-time extraction.

At search time Splunk automatically extracts certain fields. It:

• automatically identifies and extracts the first 50 fields that it finds in the event data that match
obvious name/value pairs, such as user_id=jdoe or client_ip=192.168.1.1, which it
extracts as examples of user_id and client_ip fields. (This 50 field limit is a default that
can be modified by editing the [kv] stanza in limits.conf; see the sketch after this list.)
• extracts any field explicitly mentioned in the search that it might otherwise have found through
automatic extraction (but isn't among the first 50 fields identified).
• performs custom search field extractions that you have defined, either through the Interactive
Field Extractor, the Extracted fields page in Manager, configuration file edits, or search
commands such as rex.
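
Here is a sketch of what that limits.conf edit might look like. The limit attribute is assumed to be the setting that governs automatic key/value extraction, so check the limits.conf documentation for your version before relying on it:

[kv]
limit = 100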

For an explanation of "search time" and "index time" see "Index time versus search time" in the
Admin manual.

An example of automatic field extraction

This is an example of how Splunk automatically extracts fields without user help (as opposed to
custom field extractions, which follow event-extraction rules that you define):

Say you search on sourcetype, a default field that Splunk automatically extracts for every event at
index time. If your search is

sourcetype=veeblefetzer

for the past 24 hours, Splunk returns every event with a sourcetype of veeblefetzer in that time
range. From this set of events, Splunk automatically extracts the first 50 fields that it can identify on
its own. And it performs extractions of custom fields, based on configuration files. All of these fields
will appear in the Field Picker when the search is complete.

Now, if a name/value combination like userlogin=fail appears for the first time 25,000 events
into the search, and userlogin isn't among the set of custom fields that you've preconfigured, it
likely won't be among the first 50 fields that Splunk finds on its own.

However, if you change your search to

sourcetype=veeblefetzer userlogin=*
then Splunk finds and returns all events that include both the userlogin field and a sourcetype
value of veeblefetzer. The userlogin field will then be available in the Field Picker along with
the other fields that Splunk has extracted for this search.

Add and maintain custom search fields

To fully utilize the power of Splunk IT search, however, you need to know how to create and maintain
custom search field extractions. Custom fields enable you to capture and track information that is
important to your needs, but which isn't being discovered and extracted by Splunk automatically.

As a knowledge manager, you'll oversee the set of custom search field extractions created by users
of your Splunk implementation, and you may define specialized groups of custom search fields
yourself. This section of the Knowledge Manager manual discusses the various methods of field
creation and maintenance (see the "Overview of search-time field extraction" topic) and provides
examples showing how this functionality can be used.

You'll learn how to:

• create and administrate search-time field extractions through Splunk Manager.


• design and manage search-time field transforms through Splunk Manager.
• use the props.conf and transforms.conf configuration files to add and maintain search-time
extractions .
• configure Splunk to parse multivalue fields.

Overview of search-time field extraction

This topic provides a brief overview of Splunk Web field extraction methods.

As you use Splunk, you will encounter situations that require the creation of new fields that will be
additions to the set of fields that Splunk automatically extracts for you at index time and search
time.

As a knowledge manager, you'll be managing field extractions for the rest of your team. In many
cases you'll be defining fields that Splunk has not identified on its own, in an effort to make your event
data more useful for searches, reports, and dashboards. However, you may also want to define field
extractions as part of an event data normalization strategy, where you redefine existing fields and
create new ones in an effort to reduce redundancies and increase the overall usability of the fields
available to the other Splunk users on your team. (For more information, see "Understand and use the
Common Information Model," in this manual.)

If you find that you need to create additional search-time field extractions, you have a number of ways
to go about it. Splunk Web provides a variety of search-time field extraction methods. The search
language also enables you to create temporary field extractions. And you can always add and
maintain field extractions by way of configuration file edits.

For a detailed discussion of search-time field addition using methods based in Splunk Web, see
"Extract and add new fields" in the User manual. We'll just summarize the methods in this subtopic
and provide links to topics with in-depth discussions and examples.

Use interactive field extraction to create new fields

You can create custom fields dynamically using the interactive field extractor (IFX) in Splunk Web.
IFX enables you to quickly turn any search into a field extracting regular expression. You use IFX on
the local indexer. For more information about using IFX, see "Extract fields interactively in Splunk
Web" in the User manual.

Note: IFX is especially useful if you are not familiar with regular expression syntax and usage,
because it will generate field extraction regexes for you (and enable you to test them).

To access IFX, run a search and then select "Extract fields" from the dropdown that appears beneath
timestamps in the field results. IFX enables you to extract only one field at a time (although you can
edit the regex it generates later to extract multiple fields).

Use Splunk Manager to add and maintain field extractions

You can use the Field extractions and Field transformations pages in Splunk Manager to review, edit,
and create extracted fields.

The Field extractions page

The Field extractions page shows you the search-time field extractions in props.conf. You can edit
existing extractions and create new ones. The Field extractions page allows you to review, update,
and create field extractions. You can use it to create and manage both basic "inline" search-time
extractions (extractions that are defined entirely within props.conf) and more advanced
search-time extractions that reference a field transformation component in transforms.conf. You can
define field transformations in Manager through the Field transformations page (see below).

In Splunk Web, you navigate to the Field extractions page by selecting Manager > Fields > Field
extractions.

For more information, see "Use the Field extractions page in Manager".

The Field transformations page

You can also use Manager to create more complex search-time field extractions that involve a
transform component in transforms.conf. To do this, you couple an extraction from the Field
extractions page with a field transform on the Field transformations page.

The Field transformations page displays search-time field transforms that have been defined in
transforms.conf. Field transforms work with extractions set up in props.conf to enable
advanced field extractions. With transforms, you can define field extractions that

• Reuse the same field-extracting regular expression across multiple sources, source types, or
hosts (in other words, configure one field transform for multiple field extractions).
• Apply more than one field-extracting regular expression to the same source, source type, or
host (in other words, apply multiple field transforms to the same field extraction).
• Use a regular expression to extract fields from the values of another field (also referred to as a
"source key").

In Splunk Web, you navigate to the Field transformations page by selecting Manager > Fields >
Field transformations.

For more information, see "Use the Field transformations page in Manager".

Configure field extractions in props.conf and transforms.conf

You can also create and maintain field extractions by making edits directly to props.conf and
transforms.conf. If this sounds like your kind of thing--and it may be, especially if you are an
old-timey Splunk user or just prefer working at the configuration file level of things--you can find all
the details in "Create and maintain search-time field extractions through configuration files," in this
manual.

It's important to note that the configuration files enable you to do more things with search-time
field extractions than Manager currently does. For example, with the config files you can set
up:

• Delimiter-based field extractions.


• Extractions for multivalue fields.
• Extractions of fields with names that begin with numbers or underscores (normally not allowed
unless key cleaning is disabled).
• Formatting of extracted fields.

Use search commands to create field extractions

Splunk provides a variety of search commands that facilitate the extraction of fields in different ways.
Here's a list of these commands:

• The rex search command performs field extractions using a Perl regular expression with
named groups that you include in the search string. (See the example after this list.)
• The extract (or kv, for "key/value") search command extracts field/value pairs from search
results. If you use extract without specifying any arguments, Splunk extracts fields using
field extraction stanzas that have been added to props.conf. You can use extract to test any
field extractions that you plan to add manually through conf files, to see if they extract
field/value information as expected.
• Use multikv to extract field/value pairs from multiline, tabular-formatted events. It creates a
new event for each table row and derives field names from the table title.
• xmlkv enables you to extract field/value pairs from XML-formatted event data, such as
transactions from web pages.
• kvform extracts field/value pairs from events based on predefined form templates that describe
how the values should be extracted. These templates are stored in
$SPLUNK_HOME/etc/system/form/, or your own custom application directory in
$SPLUNK_HOME/etc/apps/.../form. For example, if form=sales_order, Splunk looks
for a sales_order.form file in those directories and matches all of the events it processes
against that form in an effort to extract values.

For details about how these commands are used, along with examples, see either the Search
Reference or the "Extract and add new fields" topic in the User manual.

Use the Field extractions page in Manager

Use the Field extractions page in Manager to manage search-time field extractions that have been
added to props.conf. Field extractions can be added to props.conf when you use the Interactive
Field Extractor, through direct props.conf edits, and when you create field extractions through the
Field extractions page.

The Field extractions page enables you to:

• Review the overall set of search-time extractions that you have created or which your
permissions enable you to see, for all Apps in your instance of Splunk.
• Create new search-time field extractions.
• Update permissions for field extractions. Field extractions created through the Interactive Field
Extractor and the Field extractions page are initially only available to their creators until they
are shared with others.
• Delete field extractions, if your app-level permissions enable you to do so, and if they are not
default extractions that were delivered with the product. Default knowledge objects cannot be
deleted. For more information about deleting knowledge objects, see "Curate Splunk
knowledge with Manager" in this manual.

If you have "write" permissions for a particular search-time field extraction, the Field extractions
page enables you to:

• Update its regular expression, if it is an inline extraction.


• Add or delete field transforms that have been defined in transforms.conf or on the Field
transformations page in Manager, if it uses transforms.

Note: You cannot manage index-time field extractions via Manager. We don't recommend that you
change your set of index-time field extractions, but if you find that you must do so, you have to modify
your props.conf and transforms.conf configuration files manually. For more information about
index-time field extraction configuration, see "Configure index-time field extractions" in the Getting
Data In manual.

Navigate to the Field extractions page by selecting Manager > Fields > Field extractions.

Review search-time field extractions in Manager

To better understand how the Field extractions page in Manager displays your field extractions, it
helps to understand how field extractions are set up in your props.conf and transforms.conf
files.

Field extractions can be set up entirely in props.conf, in which case they are identified on the Field
extractions page as inline field extractions. But some field extractions include a transforms.conf
component called a field transform. To create/edit that component of the field extraction via Splunk
Web, you use the Field transformations page in Manager.

For more information about transforms, see "Use the Field transformations page in Manager"
in this manual.

For more information about field extraction setup directly in the props.conf and transforms.conf files,
see "Create and maintain search-time field extractions through configuration files" in this manual.

Name column

The Name column in the Field extractions page displays the overall name of the field extraction, as it
appears in props.conf. The format is:

<spec> : [EXTRACT-<name> | REPORT-<name>]

• <spec> can be:


♦ <sourcetype>, the source type of an event.
♦ host::<host>, where <host> is the host for an event.
♦ source::<source>, where <source> is the source for an event.

EXTRACT-<name> field extractions are extractions that are wholly defined in props.conf (in other
words, they do not reference a transform in transforms.conf). They are created automatically when you
make field extractions through IFX and certain search commands. You can also add them by making
direct updates to the props.conf file. This kind of extraction is always associated with a
field-extracting regular expression. On the Field extractions page, this regex appears in the
Extraction/Transform column.

REPORT-<value> field extractions reference field transform stanzas in transforms.conf. This is
where their field-extracting regular expressions are located. On the Field extractions page, the
referenced field transform stanza is indicated in the Extraction/Transform column.

You can work with transforms in Manager through the Field transformations page. For more information
see "Use the Field transformations page in Manager" in this manual.

Type column

There are two field extraction types: inline and Uses transform.

• Inline extractions always have EXTRACT-<name> configurations. They are identified as such
because they are entirely defined within props.conf; they do not reference
external field transforms.
• Uses transform extractions always have REPORT-<value> name configurations. As such they
reference field transforms in transforms.conf. You can define field transforms
directly in transforms.conf or via Manager using the Field transformations page.

Extraction/Transform column

In the Extraction/Transform column, Manager displays different things depending on the field
extraction Type.

• For inline extraction types, Manager displays the regular expression that Splunk uses to
extract the field. The named group (or groups) within the regex show you what field(s) it
extracts.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can
test your regex by using it in a search with the rex search command. Splunk also maintains a
list of useful third-party tools for writing and testing regular expressions.

• In the case of Uses transform extraction types, Manager displays the name of the
transforms.conf field transform stanza (or stanzas) that the field extraction is linked to
through props.conf. A field extraction can reference multiple field transforms if you want to
apply more than one field-extracting regex to the same source, source type, or host. This can
be necessary in cases where the field or fields that you want to extract appear in two or more
very different event patterns.

For example, the Extraction/Transform column could display two values for a Uses transform extraction:
access-extractions and ip-extractions. These may appear in props.conf as:

[access_combined]
REPORT-access = access-extractions, ip-extractions

In this example, access-extractions and ip-extractions are both names of field


transform stanzas in transforms.conf. To work with those field transforms through
Manager, go to the Field transforms page.

Add new field extractions

Click the New button at the top of the Field extractions page to add a new field extraction. The Add
New page appears.

If you know how field extractions are set up in props.conf, you should find this to be pretty simple.

All of the fields described below are required.

1. Define a Destination app context for the field extraction. By default it will be the app context you
are currently in.

2. Give the field extraction a Name, using underscores for spaces between words. In props.conf
this is the <name> value for an EXTRACT or REPORT field extraction type.

3. Define the sourcetype, source, or host to which the extraction applies. Select sourcetype, source,
or host and enter the value. This maps to the <spec> value in props.conf.

4. Define the extraction type. If you select Uses transform enter the transform(s) involved in the
Extraction/Transform field. If you select Inline enter the regular expression used to extract the field
(or fields) in the Extraction/Transform field.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test
your regex by using it in a search with the rex search command. Splunk also maintains a list of useful
third-party tools for writing and testing regular expressions.

Important: The capturing groups in your regex must identify field names that only contain
alpha-numeric characters or underscores.

• Valid characters for field names are a-z, A-Z, 0-9, or _ .


• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's
internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields, either by default or through a
custom configuration:

• All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).
• All leading underscores and 0-9 characters are removed from extracted field names.

To disable this behavior for a specific field extraction, you have to manually modify both props.conf
and transforms.conf. For more information, see "Create and maintain search-time field
extractions through configuration files" in this manual.

Note: You cannot turn off key cleaning for inline field extractions (field extractions that do not require
a field transform component).

Example - Add a new error code field

This shows how you would define an extraction for a new err_code field. The field can be identified
by the occurrence of device_id= followed by a word within brackets and a text string terminating
with a colon. The field should be extracted from events related to the testlog source type.

In props.conf this extraction would look like:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)

Here's how you would set that up through the Add new field extractions page:

Note: You can find a version of this example in the "Create and maintain search-time field extractions
through configuration files" topic in this manual, which shows you how to set up field extractions using the props.conf file.

Update existing field extractions

To edit an existing field extraction, locate the field extraction and click its name in the Name
column.

This takes you to a details page for that field extraction. In the Extraction/Transform field what you
can do depends on the type of extraction that you are working with.

• If the field extraction is an inline extraction, you can edit the regular expression it uses to
extract fields.

• If the field extraction uses one or more transforms, you can specify the transform or transforms
involved (put them in a comma-separated list if there is more than one.) The transforms can
then be created or updated via the Field transforms page.

Note: Uses transform field extractions must include at least one valid transforms.conf field
transform stanza name.

Update field extraction permissions

When a field extraction is created through an inline method (such as IFX or a search command) it is
initially only available to its creator. To make it so that other users can use the field extraction, you
need to update its permissions. To do this, locate the field extraction on the Field extractions page
and select its Permissions link. This opens the standard permission management page used in
Manager for knowledge objects.

On this page you can set up role-based permissions for the field extraction, and determine whether it
is available to users of one specific App, or globally to users of all Apps. For more information about
managing permissions with Manager, see "Curate Splunk knowledge with Manager," in this manual.

Delete field extractions

On the Field extractions page in Manager, you can delete field extractions if your permissions enable
you to do so. You won't be able to delete default field extractions (extractions that were delivered with
the product and which are stored in the "default" directory of an app).

Click Delete for the field extraction that you want to remove.

Note: Take care when deleting objects that have downstream dependencies. For example, if your
field extraction is used in a search that in turn is the basis for an event type that is used by five other
saved searches (two of which are the foundation of dashboard panels), all of those other knowledge
objects will be negatively impacted by the removal of that extraction from the system. For more
information about deleting knowledge objects, see "Curate Splunk knowledge with Manager" in this
manual.

Use the Field transformations page in Manager

The Field transformations page in Manager enables you to manage the "transform" components of
search-time field extractions, which reside in transforms.conf. Field transforms can be created
either through direct edits to transforms.conf or by addition through the Field transformations
page.

Note: Every field transform has at least one field extraction component. But "inline" field extractions
do not need to have a field transform component.

The Field transformations page enables you to:

• Review the overall set of field transforms that you have created or which your permissions
enable you to see, for all Apps in your instance of Splunk.
• Create new search-time field transforms. For more information about situations that call for the
use of field transforms, see "When to use the Field transformations page," below.
• Update permissions for field transforms. Field transforms created through the Field
transformations page are initially only available to their creators until they are shared with
others. You can only update field transform permissions if you own the transform, or if your
role's permissions enable you to do so.
• Delete field transforms, if your app-level permissions enable you to do so, and if they are not
default field transforms that were delivered with the product. Default knowledge objects cannot
be deleted. For more information about deleting knowledge objects, see "Curate Splunk
knowledge with Manager" in this manual.

If you have "write" permissions for a particular field transform, the Field transformations page enables
you to:

• Update its regular expression and change the key the regular expression applies to.
• Define or update the field transform format.

Navigate to the Field transformations page by selecting Manager > Fields > Field transformations.

When to use the Field transformations page

While you can define most search-time field extractions entirely within props.conf (or the Field
extractions page in Manager), some advanced search-time field extractions require a
transforms.conf component called a field transform. This component can be defined and
managed through the Field transforms page.

You set up search-time field extractions with a field transform component when you need to:

• Reuse the same field-extracting regular expression across multiple sources, source
types, or hosts (in other words, configure one field transform for multiple field extractions). If
you find yourself using the same regex to extract fields for different sources, source types, and
hosts, you may want to set it up as a transform. Then, if you find that you need to update the
regex, you only have to do so once, even though it is used in more than one field extraction.
• Apply more than one field-extracting regular expression to the same source, source
type, or host (in other words, apply multiple field transforms to the same field extraction). This
is sometimes necessary in cases where the field or fields that you want to extract from a
particular source/source type/host appear in two or more very different event patterns.
• Use a regular expression to extract fields from the values of another field (also referred
to as a "source key"). For example, you might pull a string out of a url field value, and have
that be a value of a new field.

Note: All index-time field extractions are coupled with one or more field transforms. You cannot
manage index-time field extractions via Manager, however--you have to use the props.conf and

transforms.conf configuration files. For more information about index-time field extraction
configuration, see "Configure index-time field extractions" in the Admin manual.

It's also important to note that you can do more things with search-time field transforms (such as
setting up delimiter-based field extractions and configuring extractions for multivalued fields) if you
configure them directly within transforms.conf. See the section on field transform setup in
"Create and maintain search-time field extractions through configuration files" in this manual for more
information.

Review and update search-time field transforms in Manager

To better understand how the Field transformations page in Manager displays your field transforms, it
helps to understand how search-time field extractions are set up in your props.conf and
transforms.conf files.

A typical field transform looks like this in transforms.conf:

[banner]
REGEX = /js/(?<license_type>[^/]*)/(?<version>[^/]*)/login/(?<login>[^/]*)
SOURCE_KEY = uri

This transform matches its regex against uri field values, and extracts three fields as named groups:
license_type, version, and login.

In props.conf, that transform is matched to the source .../banner_access_log* like so:

[source::.../banner_access_log*]
REPORT-banner = banner

This means the regex is only matched to uri fields in events coming from the
.../banner_access_log source.
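
For example, given a hypothetical event whose uri field contains /js/enterprise/4.1/login/admin, this transform would extract license_type=enterprise, version=4.1, and login=admin.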

But you can match it to other sources, sourcetypes, and hosts if necessary. This is something you
can't do with inline field extractions (field extractions set up entirely within props.conf).

Note: By default, transforms are matched to a SOURCE_KEY value of _raw, in which case their
regexes are applied to the entire event, not just fields within that event.

The Name column

The Name column of the Field transformations page displays the names of the search-time
transforms that your permissions enable you to see. These names are the stanza names in
transforms.conf. The transform example presented above appears in the list of transforms as
banner.

Click on a transform name to see the detail information for that particular transform.

Reviewing and editing transform details

The details page for a field transform enables you to view and update its regular expression, key, and
event format. Here's the details page for the banner transform that we described at the start of this

36
subtopic:

If you have the permissions to do so, you can edit the regex, key, and event format. Keep in mind that
these edits can affect multiple field extractions defined in props.conf and the Field extractions
page, if the transform has been applied to more than one source, sourcetype, or host.

Create a new field transform

To create a new field transform:

1. First, navigate to the Field transformations page and click the New button.

2. Identify the Destination app for the field transform, if it is not the app you are currently in.

3. Give the field transform a Name. This equates to the stanza name for the transform on
transforms.conf. When you save this transform this is the name that appears in the Name
column on the Field transformations page. (This is a required field.)

4. Enter a Regular expression for the transform. (This is a required field.)

5. Optionally define a Key for the transform. This corresponds to the SOURCE_KEY option in
transforms.conf. By default it is set to _raw, which means the regex is applied to entire events.
To have the regex be applied to values of a specific field, replace _raw with the name of that field.
You can only use fields that are present when the field transform is executed.

6. Optionally specify the Event format. This corresponds to the FORMAT option in
transforms.conf. For example, you could have an event that contains strings for a field name and
its corresponding field value. You first design a regex that extracts those strings, and then you use
the FORMAT of $1::$2 to have the first string be the field name, and the second string be the field
value.

Regular expression syntax and usage

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test
your regex by using it in a search with the rex search command. Splunk also maintains a list of useful
third-party tools for writing and testing regular expressions.

Important: The capturing groups in your regex must identify field names that contain alpha-numeric
characters or an underscore.

• Valid characters for field names are a-z, A-Z, 0-9, or _ .


• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's
internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at
search-time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).

2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores
and 0-9 characters from extracted fields.

To disable this behavior for a specific field extraction, you have to manually modify both props.conf
and transforms.conf. For more information, see "Create and maintain search-time field
extractions through configuration files" in this manual.

Note: You cannot turn off key cleaning for inline field extractions (field extractions that do not require
a field transform component).

Example - Extract both field names and their corresponding field values from an event

You can use the Event format attribute in conjunction with a properly designed regular expression to
set up field transforms that extract both a field name and its corresponding field value from each
matching event.

Here's an example, using a transform that is delivered with Splunk.

The bracket-space field transform has a regular expression that finds field name/value pairs within
brackets in event data. It will reapply this regular expression until all of the matching field/value pairs
in an event are extracted.
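
The regular expression that ships with the product is more involved, but a simplified sketch of this kind of transform--illustrative only, not the actual bracket-space definition--might look like this:

[bracket_kv_sketch]
REGEX = \[([^\s\]]+)\s+([^\]]+)\]
FORMAT = $1::$2

Here the first capturing group supplies the field name and the second supplies its value, so a hypothetical event segment such as [Sender syslogd] would yield Sender=syslogd.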

As we stated earlier in this topic, field transforms are always associated with a field extraction. On the
Field Extractions page in Manager, you can see that the bracket-space field transform is
associated with the osx-asl:REPORT-asl extraction.

Update field transform permissions

When a field transform is first created, by default it is only available to its creator. To make it so that
other users can use the field transform, you need to update its permissions. To do this, locate the
field transform on the Field transformations page and select its Permissions link. This opens the
standard permission management page used in Manager for knowledge objects.

On this page you can set up role-based permissions for the field transform, and determine whether it
is available to users of one specific App, or globally to users of all Apps. For more information about
managing permissions with Manager, see "Curate Splunk knowledge with Manager," in this manual.

Delete field transforms

On the Field transformations page in Manager, you can delete field transforms if your permissions
enable you to do so.

Click Delete for the field transform that you want to remove.

Note: Take care when deleting knowledge objects that have downstream dependencies. For
example, if the field extracted by your field transform is used in a search that in turn is the basis for an
event type that is used by five other saved searches (two of which are the foundation of dashboard
panels), all of those other knowledge objects will be negatively impacted by the removal of that
transform from the system. For more information about deleting knowledge objects, see "Curate
Splunk knowledge with Manager" in this manual.

Create and maintain search-time field extractions through configuration files

While you can now set up and manage search-time field extractions via Splunk Manager, it's
important to understand how they are handled at the props.conf and transforms.conf level,
because those are the configuration files that the Field extractions and Field transforms pages in
Manager read from and write to.

Many knowledge managers, especially those who have been using Splunk for some time, find it
easier to manage their custom fields through configuration files, which can be used to add, maintain,
and review libraries of custom field additions for their teams.

This topic shows you how you can:

• Set up basic "inline" search-time field extractions through edits to props.conf.

• Design more complex search-time field extractions through a combination of edits to
props.conf and transforms.conf.

Regular expressions and field name syntax

Splunk uses regular expressions, or regexes, to extract fields from event data. When you use the
interactive field extractor (IFX), Splunk attempts to generate regexes for you, but it can only create
regular expressions that extract one field.

When you set up field extractions manually through configuration files, you have to provide the
regex yourself--but you can design regexes that extract two or more fields from matching events
at once if necessary.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test
your regex by using it in a search with the rex search command. Splunk also maintains a list of useful
third-party tools for writing and testing regular expressions.

Important: The capturing groups in your regex must identify field names that contain only alphanumeric
characters or underscores. See "Use proper field name syntax," below.

Use proper field name syntax

Splunk only accepts field names that contain alpha-numeric characters or an underscore:

• Valid characters for field names are a-z, A-Z, 0-9, or _ .


• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's
internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at
search-time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_). You
can disable this by setting CLEAN_KEYS=false in the transforms.conf stanza.

2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores
and 0-9 characters from extracted fields.

You can disable key cleaning for a particular field transform by setting CLEAN_KEYS=false in the
transforms.conf stanza for the extraction.
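
A minimal sketch of such a transform--the stanza name, pattern, and field name are assumptions--looks like this:

[raw_keys]
REGEX = thread=(?<_thread_id>\d+)
CLEAN_KEYS = false

Without CLEAN_KEYS = false, the leading underscore would be stripped from the extracted field name.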

Create basic search-time field extractions with props.conf edits

You can create basic search-time field extractions by editing the props.conf configuration file. You
can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application
directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to
make it easy to transfer your data customizations to other search servers.)

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

For more information on configuration files in general, see "About configuration files" in the Admin
manual.

Steps for defining basic custom field extractions with props.conf

1. All extraction configurations in props.conf are restricted by a specific source, source type, or
host. Start by identifying the source type, source, or host that provides the events from which you
would like your field to be extracted.

Note: For information about hosts, sources, and source types, see "About default fields (host, source,
source type, and more)" in the Admin manual.

2. Determine a pattern to identify the field in the event.

3. Write a regular expression to extract the field from the event.

4. Add your regex to props.conf and link it to the source, source type, or host that you identified in
the first step.

5. If your field value is a portion of a word, you must also add an entry to fields.conf. See the
example "create a field from a subtoken" below.

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application
directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

6. Restart Splunk for your changes to take effect.

Add a regex stanza to props.conf

Follow this format when adding a field extraction stanza to props.conf:

[<spec>]
EXTRACT-<class> = <regular expression>

• <spec> can be:


♦ <sourcetype>, the source type of an event.
♦ host::<host>, where <host> is the host for an event.
♦ source::<source>, where <source> is the source for an event.
• <class> is the extraction class. Precedence rules for classes:
♦ For each class, Splunk takes the configuration from the highest precedence
configuration block.
♦ If a particular class is specified for a source and a sourcetype, the class for source
wins out.

♦ Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that
class in ../default/.
• <regular_expression> = create a regex that recognizes your custom field value. The
regex is required to have named capturing groups; each group represents a different extracted
field.

Note: Unlike the procedure for configuring the default set of fields that Splunk extracts at index time,
transforms.conf requires no DEST_KEY since nothing is being written to the index during
search-time field extraction. Fields extracted at search time are not persisted in the index as keys.

Note: For "inline" search-time field extractions, which are defined entirely within props.conf,
props.conf uses EXTRACT-<class>. When transforms are involved, this changes. Search-time
field extractions using transforms use REPORT-<value> (see the section on complex field
extractions for more info). And index-time field extractions, which are always constructed both in
props.conf and transforms.conf, use TRANSFORMS-<value>.

Splunk follows precedence rules when it runs search-time field extractions. It runs inline field
extractions (EXTRACT-<class>) first, and then runs field extractions that involve transforms
(REPORT-<class>).

Inline (props.conf only) search-time field extraction examples

Here are a set of examples of search-time custom field extraction, set up using props.conf only.

Add a new error code field

This example shows how to create a new "error code" field by configuring a field extraction in
props.conf. The field can be identified by the occurrence of device_id= followed by a word
within brackets and a text string terminating with a colon. The field should be extracted from events
related to the testlog source type.

In props.conf, add:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)

Extract multiple fields using one regex

This is an example of a field extraction that pulls out five separate fields. You can then use these
fields in concert with some event types to help you find port flapping events and report on them.

Here's a sample of the event data that the fields are being extracted from:

#%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down

The stanza in props.conf for the extraction looks like this:

[syslog]
EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)

Note that five separate fields are extracted as named groups: interface, media, slot, port, and
port_status.

The following two steps aren't required for field extraction--they show you what you might do with the
extracted fields to find port flapping events and then report on them.

Next, define a couple of event types in eventtypes.conf:

[cisco_ios_port_down]
search = "changed state to down"

[cisco_ios_port_up]
search = "changed state to up"

Finally, create a saved search in savedsearches.conf that ties much of the above together to find port
flapping and report on the results:

[port flapping]
search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count

Create a field from a subtoken

If your field value is a smaller part of a token, you must add an entry to fields.conf. For example, your
field's value is "123" but it occurs as "foo123" in your event.

Configure props.conf as explained above. Then, add an entry to fields.conf:

[<fieldname>]
INDEXED = False
INDEXED_VALUE = False

• Fill in <fieldname> with the name of your field.


♦ For example, [url] if you've configured a field named "url."
• Set INDEXED and INDEXED_VALUE to false.
♦ This tells Splunk that the value you're searching for is not a token in the index.
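
Putting the pieces together, a complete configuration for the "foo123" case might look like the following. The source type, field name, and regex here are assumptions for illustration:

In props.conf:

[mylog]
EXTRACT-ticket = foo(?<ticket_num>\d+)

In fields.conf:

[ticket_num]
INDEXED = False
INDEXED_VALUE = False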

Create advanced search-time field extractions with field transforms

While you can define most search-time field extractions entirely within props.conf, some advanced
search-time field extractions require an additional component called a field transform. This section
shows you how to configure field transforms in transforms.conf.

Field transforms contain a field-extracting regular expression and other attributes that govern the way
the transform extracts fields. Field transforms are always created in conjunction with field extraction
stanzas in props.conf--they cannot stand alone.

Your search-time field extractions require a field transform component if you need to:

• Reuse the same field-extracting regular expression across multiple sources, source
types, or hosts (in other words, configure one field transform for multiple field extractions). If
you find yourself using the same regex to extract fields for different sources, source types, and
hosts, you may want to set it up as a transform. Then, if you find that you need to update the
regex, you only have to do so once, even though it is used in more than one field extraction.
• Apply more than one field-extracting regular expression to the same source, source
type, or host (in other words, apply multiple field transforms to the same field extraction). This
is sometimes necessary in cases where the field or fields that you want to extract from a
particular source/source type/host appear in two or more very different event patterns.
• Set up delimiter-based field extractions. Delimiter-based extractions come in handy when
your event data presents field-value pairs (or just field values) that are separated by delimiters
such as commas, colons, bars, line breaks, and tab spaces.
• Configure extractions for multivalued fields. When you do this, Splunk appends additional
field values to the field as it finds them in the event data.
• Extract fields with names that begin with numbers or underscores. Ordinarily key
cleaning removes leading numeric characters and underscores from field names, but you can
configure your transform to turn this functionality off if necessary.

You can also configure transforms to:

• Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY
attribute.
• Apply special formatting to the information being extracted, by using the FORMAT attribute.

Both of these configurations can now be set up directly in the regex, however; see the "Define a field
transform" section below for more information about how to do this.

NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this
with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you
have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip
address field value in the format 192.0.2.1. For more information, see "Configure index-time field
extractions" in the Admin Manual. However we DO NOT RECOMMEND that you make extensive
changes to your set of indexed fields--do so sparingly if at all.

Steps for defining custom search-time field extractions with field transforms

1. All extraction configurations in props.conf are restricted by a specific source, source type, or
host. Start by identifying the source, source type, or host that provides the events from which you
would like your field to be extracted.

Note: For more information about sources, source types, or hosts, see "About default fields (host,
source, sourcetype, and more)" in the Admin manual.

2. Determine a pattern to identify the field in the event.

3. Define a regular expression that uses this pattern to extract the field from the event. (Note: If your
event lists field/value pairs or just field values, you can create a delimiter-based field extraction that
won't require a regex; see the information on the DELIMS attribute, below, for more information.)

4. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration).
The transform can also define a source key and/or event value formatting.

Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom
application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

5. In props.conf, create a field extraction stanza that is linked to the host, source, or source type
that you identified in step 1. Add a reference to the transform you defined in transforms.conf.
(Create additional field extraction stanzas for other hosts, sources, and source types that refer to the
same transform if necessary.)

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application
directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

6. Restart Splunk for your changes to take effect.

First, define a field transform

Follow this format when defining a search-time field transform in transforms.conf:

[<unique_stanza_name>]
REGEX = <regular expression>
SOURCE_KEY = <string>
FORMAT = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = <bool>
CLEAN_KEYS = <bool>

• The <unique_stanza_name> is required for all search-time transforms.


• REGEX is a regular expression that operates on your data to extract fields. It is required for all
search-time field transforms unless you are setting up a delimiter-based extraction, in which
case you use DELIMS instead.
♦ Name-capturing groups in the REGEX are extracted directly to fields, which means that
you don't have to specify FORMAT for simple field extraction cases.
♦ If the REGEX extracts both the field name and its corresponding value, you can use the
following special capturing groups to skip specifying the mapping in FORMAT:

_KEY_<string> and _VAL_<string>.

• For example, the following are equivalent:

Using FORMAT:

REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

Not using FORMAT:

REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

• SOURCE_KEY is optional. Use it to identify a field whose values the transform regex should be
applied to.
♦ By default, SOURCE_KEY is set to _raw, which means it is applied to the entire event.
♦ For search-time transforms, the key can be any field that is present at the time that the
field transform is executed.

• FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting,
including any field names or values you want to add. You don't need to specify the FORMAT if
you have a simple REGEX with name-capturing groups.
♦ Defaults to an empty string.
♦ For search-time transforms, this is the pattern for the FORMAT field:

FORMAT = <field-name>::<field-value>(
<field-name>::<field-value>)*
where:
field-name = <string>|$<extracting-group-number>
field-value = <string>|$<extracting-group-number>

Examples of search-time FORMAT usage:


1. FORMAT = first::$1 second::$2 third::other-value
2. FORMAT = $1::$2 $4::$3

• DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field
extractions, where field values--or field/value pairs--are separated by delimiters such as
commas, colons, spaces, tab spaces, line breaks, and so on.
♦ Delimiters must be quoted with " " (use \ to escape).
♦ If the event contains full delimiter-separated field/value pairs, you enter two sets of
quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value
pairs. The second set of quoted delimiters separates the field name from its
corresponding value.
♦ If the events only contain delimiter-separated values (no field names), you use one set
of quoted delimiters, to separate the values. Then you use the FIELDS attribute to
apply field names to the extracted values (see FIELDS below). Alternately, Splunk
reads even tokens as field names and odd tokens as field values.
♦ Splunk consumes consecutive delimiter characters unless you specify a list of field
names.
♦ This example of DELIMS usage applies to an event where field/value pairs are
separated by '|' symbols, and the field names are separated from their corresponding
values by '=' symbols:

[pipe_eq]
DELIMS = "|", "="

• FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field
extraction, but you only have field values to extract. Use FIELDS to provide field names for the
extracted field values, in list format according to the order in which the values are extracted.
♦ Note: If field names contain spaces or commas they must be quoted with " " (to escape,
use \).
♦ Here's an example of a delimiter-based extraction where three field values appear in an
event. They are separated by a comma and then a space.

[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3

• MV_ADD is optional. Use it when you have events that repeat the same field but with different
values. When MV_ADD = true, Splunk makes any field that is used more than once in an
event (but with different values) a multivalued field and appends each value it finds for that
field. (See the combined example after this list.)
♦ When set to false, Splunk keeps the first value found for a field in an event and
discards every subsequent value found for that same field in that same event.

• CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and
0-9 characters from the field names it extracts (see the subtopic "Use proper field name
syntax," above, for more information).
♦ By default, CLEAN_KEYS is always set to true for transforms.
♦ Add CLEAN_KEYS = false to your transform if you need to extract field names (keys)
with leading underscores and/or 0-9 characters.
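
As a combined illustration, the following sketch pairs DELIMS with MV_ADD for events that carry repeated name=value pairs separated by semicolons. The stanza name and the sample pairs are assumptions:

[semicolon_kv]
DELIMS = ";", "="
MV_ADD = true

Because MV_ADD is true, a field that appears several times in one event (for example, item=book;item=pen) becomes a multivalued field instead of keeping only the first value.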

Second, configure a field extraction and associate it with the field transform

Follow this format when you're associating a search-time field transform with a field extraction stanza
in props.conf. <unique_transform_stanza_name> is the name of the field transform stanza
that you are associating with the field extraction.

You can associate multiple field transform stanzas to a single field extraction by listing them after the
initial <unique_transform_stanza_name>, separated by commas. (For more information, see
the example later in this topic.)

[<spec>]
REPORT-<value> = <unique_transform_stanza_name>

• <spec> can be:


♦ <sourcetype>, the source type of an event.
♦ host::<host>, where <host> is the host for an event.
♦ source::<source>, where <source> is the source for an event.
• <value> is the extraction class. Precedence rules for extraction classes:
♦ For each class, Splunk takes the configuration from the highest precedence
configuration block.
♦ If a particular class is specified for a source and a sourcetype, the class for source
wins out.
♦ Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that
class in ../default/.

• <unique_transform_stanza_name> is the name of your field transform stanza from
transforms.conf.
• <value> is any value you want to give to your stanza to identify its name-space.
• Transforms are applied in the specified order.
• If you need to change the order, control it by rearranging the list.

Note: Index-time field extractions use TRANSFORMS-<value> =
<unique_transform_stanza_name>. For more information, see "Configure index-time field
extractions" in the Admin Manual.

Examples of custom search-time field extractions using field transforms

These examples present custom field extraction use cases that require you to configure one or more
field transform stanzas in transforms.conf and then reference them in a props.conf field
extraction stanza.

Configuring a field extraction that utilizes multiple field transforms

This example of search-time field transform setup demonstrates how:

• you can create transforms that pull varying field name/value pairs from events.
• you can create a field extraction that references two or more field transforms.

Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from
event to event, the pairs always appear in one of two formats.

The logs often come in this format:

[fieldName1=fieldValue1] [fieldName2=fieldValue2]

However, at times they are more complicated, logging multiple name/value pairs as a list, in which
case the format looks like:

[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2]
[headerValue=fieldValue2]

Note that the list items are separated by commas, and that each fieldName is matched with a
corresponding fieldValue. In these secondary cases you still want to pull out the field names and
values so that the search results are

fieldName1=fieldValue1
fieldName2=fieldValue2

and so on.

To make things more clear, here's an example of an HTTP request event that combines both of the
above formats.

[method=GET] [IP=10.1.1.1] [headerName=Host]
[headerValue=www.example.com], [headerName=User-Agent]
[headerValue=Mozilla], [headerName=Connection] [headerValue=close]
[byteCount=255]

You want to develop a single field extraction that would pull the following field/value pairs from that
event:

method=GET
IP=10.1.1.1
Host=www.example.com
User-Agent=Mozilla
Connection=close
byteCount=255

Solution

To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different
regexes that are optimized for each format. One regex will identify events with the first format and
pull out all of the matching field/value pairs. The other regex will identify events with the other format
and pull out those field/value pairs.

You then create two unique transforms in transforms.conf--one for each regex--and then unite
them in the corresponding field extraction stanza in props.conf.

The first transform you add to transforms.conf catches the fairly conventional
[fieldName1=fieldValue1] [fieldName2=fieldValue2] case.

[myplaintransform]
REGEX=\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT=$1::$2

The second transform (also added to transforms.conf) catches the slightly more complex
[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2]
[headerValue=fieldValue2] case:

[mytransform]
REGEX= \[headerName\=(\w+)\],\s\[headerValue=([^\]]+)\]
FORMAT= $1::$2

Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the
event with its corresponding value. This setting in FORMAT enables Splunk to keep matching the
regex against a matching event until every matching field/value combination is extracted.

Finally, this field extraction stanza, which you create in props.conf, references both of the field
transforms:

[mysourcetype]
KV_MODE=none
REPORT-a = mytransform, myplaintransform

Note that, besides using multiple field transforms, the field extraction stanza also sets
KV_MODE=none. This disables automatic field/value extraction for the identified source type (while
letting your manually defined extractions continue). It ensures that these new regexes aren't
overridden by automatic field extraction, and it also helps increase your search performance. (See
the following subsection for more on disabling key/value extraction.)
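
With these configurations in place, you could verify the extractions with a quick search along these
lines (a sketch; it assumes the mysourcetype source type used above):

sourcetype=mysourcetype | table method, IP, Host, Connection, byteCount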

Configuring delimiter-based field extraction

You can use the DELIMS attribute in field transforms to configure field extractions for events where
field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and
more.

For example, say you have a recurring multiline event where a different field/value pair sits on a
separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:

ComponentId: Application Server
ProcessId: 5316
ThreadId: 00000000
ThreadName: P=901265:O=0:CT
SourceId: com.ibm.ws.runtime.WsServerImpl
ClassName:
MethodName:
Manufacturer: IBM
Product: WebSphere
Version: Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName: sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp: 2010-04-27 09:15:57.671000000
UnitOfWork:
Severity: 3
Category: AUDIT
PrimaryMessage: WSVR0001I: Server server1 open for e-business
ExtendedMessage:

Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles
all of these fields:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)

But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the
need for all these EXTRACT lines? Yes!

Configure the following stanza in transforms.conf:

[activity_report]
DELIMS = "\n", ":\t"

This states that the field/value pairs in the event are on separate lines ("\n"), and then specifies that
the field name and field value on each line are separated by a colon and a tab space (":\t").

To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report

These two brief configurations will extract the same set of fields as before, but they leave less room
for error and are more flexible.
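
For example, once the fields are extracted you could run a quick sanity-check search such as the
following (a sketch, using the activityLog source type from this example):

sourcetype=activityLog Severity=3 | table ComponentId, PrimaryMessage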

Handling events with multivalued fields

You can use the MV_ADD attribute to extract fields in situations where the same field is used more
than once in an event, but has a different value each time. Ordinarily, Splunk only extracts the first
occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is
set to true in transforms.conf, Splunk treats the field as a multivalue field and extracts each
unique field/value pair in the event.

Say you have a set of events that look like this:

event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=va

See how the type and value fields are repeated several times in each event? What you'd like to do
is search type=type3 and have both of these events be returned. Or you'd like to run a
count(type) report on these two events that returns 5.

So, what you want to do is create a custom multivalue extraction of the type field for these events.
Here's how you would set up your transforms.conf and props.conf files to enable it:

First, transforms.conf:

[mv-type]
REGEX = type=(?<type>\S+)
MV_ADD = true

Then, in props.conf for your sourcetype or source, set:

REPORT-type = mv-type
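
With this configuration in place, the searches described above behave as expected. For example (a
sketch based on the two sample events shown earlier), this search now returns both events:

type=type3

And this report returns a count of 5:

* | stats count(type)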

Disabling automatic search-time extraction for specific sources, source types, or hosts

You can disable automatic search-time field extraction for specific sources, source types, or hosts
through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in
props.conf.

Note: Custom field extractions set up manually via the configuration files or Manager will still be
processed for the affected source, source type, or host when KV_MODE = none.

[<spec>]
KV_MODE = none

<spec> can be:

• <sourcetype> - an event source type.


• host::<host>, where <host> is the host for an event.
• source::<source>, where <source> is the source for an event.

Configure multivalue fields

Multivalue fields are fields that can appear multiple times in an event and have a different value for
each appearance. One of the more common examples of multivalue fields is that of email address
fields, which typically appear two to three times in a single sendmail event--once for the sender,
another time for the list of recipients, and possibly a third time for the list of Cc addresses, if one
exists. If all of these fields are labeled identically (as "AddressList," for example), they lose the
meaning that they might otherwise have if they're identified separately as "From", "To", and "Cc".

Splunk parses multivalue fields at search time, and enables you to process the values in the search
pipeline. Search commands that work with multivalue fields include makemv, mvcombine, mvexpand,
and nomv. For more information on these and other commands see the topic on multivalue fields in
the User manual, and the Search Reference manual.

Use the TOKENIZER key to configure multivalue fields in fields.conf. TOKENIZER uses a regular
expression to tell Splunk how to recognize and extract multiple field values for a recurring field in an
event. Edit fields.conf in $SPLUNK_HOME/etc/system/local/, or your own custom
application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see "About configuration files" in the Admin
manual.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test
regexes by using them in searches with the rex search command. Splunk also maintains a list of
useful third-party tools for writing and testing regular expressions.

Configure a multivalue field via fields.conf

Define a multivalue field by adding a stanza for it in fields.conf. Then add a line with the
TOKENIZER key and a corresponding regular expression that shows how the field can have multiple
values.

Note: If you have other attributes to set for a multivalue field, set them in the same stanza
underneath the TOKENIZER line. See the fields.conf topic in the Admin manual for more information.

[<field name 1>]
TOKENIZER = <regular expression>

[<field name 2>]
TOKENIZER = <regular expression>

• <regular expression> should indicate how the field in question can take on multiple
values.
• TOKENIZER defaults to empty. When TOKENIZER is empty, the field can only take on a single
value.
• Otherwise the first group is taken from each match to form the set of field values.
• The TOKENIZER key is used by the where, timeline, and stats commands, as well as by the
summary and XML outputs of the asynchronous search API.

Note: Tokenization of indexed fields (fields extracted at index time) is not supported. If you have set
INDEXED=true for a field, you cannot also use the TOKENIZER key for that field. You can use a
search-time extraction defined in props.conf and transforms.conf to break an indexed field
into multiple values.

Example

The following examples from $SPLUNK_HOME/etc/system/README/fields.conf.example
break the email fields To, From, and Cc into multiple values.

[To]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[From]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[Cc]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)
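
Once these stanzas are in place, you can work with the multivalue fields in the search pipeline. For
example, this sketch (which assumes a sendmail source type) expands the To field and counts events
per recipient:

sourcetype=sendmail | mvexpand To | stats count by To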

Data classification: Event types and transactions
About event types

Event types are a categorization system to help you make sense of your data. Event types let you sift
through huge amounts of data, find similar patterns, and create alerts and reports.

Events versus event types

An event is a single record of activity within a log file. An event typically includes a timestamp and
provides information about what occurred on the system being monitored or logged.

An event type is a user-defined field that simplifies search by letting you categorize events. Event
types let you classify events that have common characteristics. When your search results come back,
they're checked against known event types. An event type is applied to an event at search time if that
event matches the event type definition in eventtypes.conf. Tag or save event types after indexing
your data.

Event type classification

There are several ways to create your own event types. Define event types via Splunk Web or
through configuration files, or you can save any search as an event type. When saving a search as
an event type, you may want to use the punct field to craft your searches. The punct field helps you
narrow down searches based on the structure of the event.

Use the punct field to search on similar events

Because the format of an event is often unique to an event type, Splunk indexes the punctuation
characters of events as a field called punct. The punct field stores the first 30 punctuation
characters in the first line of the event. This field is useful for finding similar events quickly.

When you use punct, keep in mind:

• Quotes and backslashes are escaped.


• Spaces are replaced with an underscore (_).
• Tabs are replaced with a "t".
• Dashes that follow alphanumeric characters are ignored.
• Interesting punctuation characters are:

",;-#$%&+./:=?@\\'|*\n\r\"(){}<>[]^!"

• The punct field is not available for events in the _audit index because those events are signed
using PKI at the time they are generated.

For an introduction to the punct field and other methods of event classification, see the "Classify and
group similar events" topic in the User manual.

Punct examples

This event:

####<Jun 3, 2005 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03>
<asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360>
<Server started in RUNNING mode>

Produces this punctuation:

####<_,__::__>_<>_<>_<>_<>_<>_

This event:

172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953

Produces this punctuation:

..._-_-_[:::_-]_\"_?=_/.\"__
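
You can also use punct directly in searches. For example, this sketch (the source type is illustrative)
groups events by their punctuation pattern to surface the most common event structures:

sourcetype=access_combined | stats count by punct | sort -count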

Event type discovery

Pipe any search to the typelearner command and create event types directly from Splunk Web. The
file eventdiscoverer.conf is mostly deprecated, although you can still specify terms to ignore when
learning new event types in Splunk Web.

Create new event types

The simplest way to create a new event type is through Splunk Web. Save an event type much in the
same way you save a search. For more information, see "Define and maintain event types in Splunk
Web" in this manual.

Create new event types by modifying eventtypes.conf. For more about saving searches as event
types, see the "Classify and group similar events" topic in the User manual.

Event type tags

Tag event types to organize your data into categories. There can be multiple tags per event. For
more information about event type tagging, see the "Tag event types" topic in this manual.

Configuration files for event types

Event types are stored in eventtypes.conf.

Terms for event type discovery are set in eventdiscoverer.conf.

Define and maintain event types in Splunk Web

Any search that does not involve a pipe operator or a subsearch can be saved as an event type. A
single event can match multiple event types.

Any event types you create through Splunk Web are automatically added to eventtypes.conf in
$SPLUNK_HOME/etc/users/<your-username>/<app>/local/, where <app> is the app you
were in when you created the event type. If you change the permissions on the event type to make it
available to all users (either in the app, or globally to all apps), Splunk moves the event type to
$SPLUNK_HOME/etc/apps/<App>/local/.

Save a search as an event type

To save a search as an event type:

• Enter the search and run it.


• Select the Actions... dropdown and click Save as event type...

The Save Event Type dialog box pops up, pre-populated with your search terms.

• Name the event type.


• Optionally add one or more tags for the event type, comma-separated.
• Click Save.

You can now use your event type in searches. If you named your event type foo, you'd use it in a
search like this:

eventtype=foo
Automatically find and build event types

Unsure whether you have any interesting event types in your IT data? Splunk provides utilities that
dynamically and intelligently locate and create useful event types:

• Find event types: The findtypes search command analyzes a given set of events and
identifies common patterns that could be turned into potentially useful event types.
• Build event types: The Build Event Type utility enables you to dynamically create event
types based on events returned by searches.

Find event types

To use the event type finder, add this to the end of your search:

...| findtypes
Searches that use the findtypes command return a breakdown of the most common groups of
events found in the search results. They are:

• hierarchically ordered in terms of "coverage" (frequency). This helps you easily identify kinds
of events that are subsets of larger event groupings.
• coupled with searches that can be used as the basis for event types that will help you locate
similar events.

By default, findtypes returns the top 10 potential event types found in the sample, in terms of the
number of events that match each kind of event discovered. You can increase this number by adding
a max argument: findtypes max=30

Splunk also indicates whether or not the event groupings discovered with findtypes have already
been associated with other event types.

Note: The findtypes command analyzes 5000 events at most to return these results. You can
lower this number using the head command for a more efficient search:

...| head 1000 | findtypes


Test potential searches before saving them as event types

When you identify a potentially useful event grouping, test the search associated with it to see if it
returns the results you want. Click Test for the event grouping you are interested in to see its
associated search run in a separate window. After the search runs, review the results it returns to
determine whether or not it is capturing the specific information you want.

Save a tested search as an event type

When you find a search that returns the right collection of results, save it as an event type by clicking
Save for the event grouping with which it is associated. The Save Event Type dialog appears. Enter a
name for the event type, and optionally identify one or more tags that should be associated with it,
separated by commas. You can also edit the search if necessary.

Build event types

If you find an event in your search results that you'd like to base an event type on, open the
dropdown event menu (find the down arrow next to the event timestamp) and click Build event type.
Splunk takes you to the Build Event Type utility. You can use this utility to design a search that
returns a select set of events, and then create an event type based on that search.

The Build Event Type utility finds a set of sample events that are similar to the one you selected from
your search results. In the Event type features sidebar, you'll find possible field/value pairings that
you can use to narrow down the event type search further.

The Build Event Type utility also displays a search string under Generated event type at the top of
the page. This is the search that the event type you're building will be based upon. As you select
other field/value pairs in the Event type features sidebar, the Generated event type updates to
include those selections. The list of sample events updates as well, to reflect the kinds of events that
the newly modified event type search would return.

If you want to edit the event type search directly, click Edit. This brings up the Edit Event Type dialog,
which you can use to edit the search string.

Test potential searches before saving them as event types

When you build a search that you think might be a useful event type, test it. Click Test to see the
search run in a separate window.

Save a tested search as an event type

If you test a search and it looks like it's returning the correct set of events, you can click Save to save
it as an event type. The Save Event Type dialog appears. Enter a name for the event type, and
optionally identify one or more tags that should be associated with it, separated by commas. You can
also edit the search if necessary.

Add and maintain event types in Manager

The Event Types page in Manager enables you to view and maintain details of the event types that
you have created or which you have permission to edit. You can also add new event types through
the Event Types page. Event types displayed on the Event Types page may be available globally
(system-wide) or they may apply to specific Apps.

Adding an event type in Manager

To add an event type through Manager, navigate to the Event Types page and click New. Splunk
takes you to the Add New event types page.

From this page you enter the new event type's Destination App, Name, and the Search string that
ultimately defines the event type (see "Save a search as an event type", above).

Note: All event types are initially created for a specific App. To make a particular event type available
to all users on a global basis, you have to locate the event type on the Event Types page, click its
Permissions link, and change the This app only selection to All apps.

You can optionally include Tags for the event type. For more information about tagging event types
and other kinds of Splunk knowledge, see "About tags and aliases" in this manual.

You can also optionally select a Priority for the event type, where 1 is the highest priority and 10 is
the lowest. The Priority setting is important for common situations where you have events that fit two
or more event types. When the event turns up in search results, Splunk displays the event types
associated with the event in a specific order. You use the Priority setting to ensure that certain event
types take precedence over others in this display order.

If you have a number of overlapping event types, or event types that are subsets of larger ones, you
may want to give the precisely focused event types a higher priority. For example, you could easily
have a set of events that are part of a wide-ranging system_error event type. Within that large set
of events, you could have events that also belong to more precisely focused event types like
critical_disc_error and bad_external_resource_error.

In a situation like this, you could give the system_error event type a Priority of 10, while giving the
other two event types Priority values in the 1 to 5 range. This way, when events that match both
system_error and critical_disc_error appear in search results, the
critical_disc_error event type is always listed ahead of the system_error event type.
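
Expressed in eventtypes.conf terms, that arrangement might look something like this (a sketch; the
search strings are placeholders for your actual event type searches):

[critical_disc_error]
search = <your critical disc error search>
priority = 1

[system_error]
search = <your system error search>
priority = 10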

Maintaining event types in Manager

To update the details of an event type, locate it in the list on the Event Types page in Manager, and
click its name. Splunk takes you to the details page for the event type, where you can edit the Search
string, Tags, and Priority for the event type, if you have the permissions to do so. You can also
update permissions for event types and delete event types through the Event Types page, if you have
edit permissions for them.

Configure event types directly in eventtypes.conf

You can add new event types and update existing event types by configuring eventtypes.conf. There
are a few default event types defined in
$SPLUNK_HOME/etc/system/default/eventtypes.conf. Any event types you create through
Splunk Web are automatically added to
$SPLUNK_HOME/etc/system/local/eventtypes.conf.

Configuration

Make changes to event types in eventtypes.conf. Use
$SPLUNK_HOME/etc/system/README/eventtypes.conf.example as an example, or create
your own eventtypes.conf.

Edit eventtypes.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application
directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see
"About configuration files" in the Admin manual.

[$EVENTTYPE]

• Header for the event type


• $EVENTTYPE is the name of your event type.
♦ You can have any number of event types, each represented by a stanza and any
number of the following attribute/value pairs.

Note: If the name of the event type includes field names surrounded by the percent character (e.g.
%$FIELD%) then the value of $FIELD is substituted at search time into the event type name for that
event. For example, an event type with the header [cisco-%code%] becomes labeled [cisco-432]
for an event that has code=432.

disabled = <1 or 0>

• Toggle event type on or off.


• Set to 1 to disable.

search = <string>

• Search terms for this event type.


• For example: error OR warn.

tags = <string>

• Space separated words that are used to tag an event type.

description = <string>

• Optional human-readable description of the event type.

priority = <integer>

• Splunk uses this value to determine the order in which it displays matching event types for an
event. 1 is the highest, and 10 is the lowest.

Note: You can tag eventtype field values the same way you tag any other field/value combination.
See the tags.conf spec file for more information.

Example

Here are two event types; one is called web, and the other is called fatal.

[web]
search = html OR http OR https OR css OR htm OR html OR shtml OR xls OR cgi

[fatal]
search = FATAL

Disable event types

Disable an event type by adding disabled = 1 to the event type's stanza in eventtypes.conf:

[$EVENTTYPE]
disabled = 1

$EVENTTYPE is the name of the event type you wish to disable.

So if you want to disable the web event type, add the following entry to its stanza:

[web]
disabled = 1

Configure event type templates

Event type templates create event types at search time. Define event type templates in
eventtypes.conf. Edit eventtypes.conf in $SPLUNK_HOME/etc/system/local/, or your own
custom application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see "About configuration files" in the Admin
manual.

Event type template configuration

Event type templates use a field name surrounded by percent characters to create event types at
search time where the %$FIELD% value is substituted into the name of the event type.

[$NAME-%$FIELD%]
$SEARCH_QUERY

So if the search query in the template returns an event where the value of $FIELD is bar, Splunk
creates an event type titled $NAME-bar for that event.

Example

[cisco-%code%]
search = cisco

If a search on "cisco" returns an event that has code=432, Splunk creates an event type titled
"cisco-432".

About transactions

A transaction is any group of conceptually related events that spans time. A transaction type is a
configured transaction, saved as a field in Splunk. Any number of data sources can generate
transactions over multiple log entries.

For example, a customer shopping in an online store could generate a transaction across multiple
sources. Web access events might share a session ID with the event in the application server log; the
application server log might contain the account ID, transaction ID, and product ID; the transaction ID
may live in the message queue with a message ID, and the fulfillment application may log the
message ID along with the shipping status. All of this data represents a single user transaction.

Here are some other examples of transactions:

• Web access events


• Application server events
• Business transactions
• E-mails
• Security violations
• System failures

Transaction search

Transaction search is useful for a single observation of any physical event stretching over multiple
logged events. Use the transaction command to define a transaction or override transaction options
specified in transactiontypes.conf.

To learn more, read "Search for transactions" in this manual.

Configure transaction types

You may want to persist the transaction search you've created. Or you might want to create a lasting
transaction type. You can save transactions by editing transactiontypes.conf. Define
transactions by creating a stanza and listing specifications.

To learn more about configuring transaction types, read "Define transactions" in this manual.

When to use stats instead of transactions

Transactions aren't the most efficient method to compute aggregate statistics on transactional data. If
you want to compute aggregate statistics over transactions that are defined by data in a single field,
use the stats command.

For example, if you wanted to compute the statistics of the duration of a transaction defined by the
field session_id:

* | stats min(_time) AS earliest max(_time) AS latest by session_id | eval
duration=latest-earliest | stats min(duration) max(duration) avg(duration)
median(duration) perc95(duration)

Similarly, if you wanted to compute the number of hits per clientip in an access log:

sourcetype=access_combined | stats count by clientip | sort -count


Also, if you wanted to compute the number of distinct sessions (parameterized by cookie) per
clientip in an access log:

sourcetype=access_combined | stats dc(cookie) as sessions by clientip |
sort -sessions

Read the stats command reference for more information about using this search command.

Search for transactions

Search for transactions using the transaction search command either in Splunk Web or at the CLI.
The transaction command yields groupings of events which can be used in reports. To use
transaction, either call a transaction type (that you configured via transactiontypes.conf), or define
transaction constraints in your search by setting the search options of the transaction command.

Search options

Transactions returned at search time consist of the raw text of each event, the shared event types,
and the field values. Transactions also have additional data that is stored in the fields: duration and
transactiontype.

• duration contains the duration of the transaction (the difference between the timestamps of
the first and last events of the transaction).
• transactiontype is the name of the transaction (as defined in transactiontypes.conf
by the transaction's stanza name).

You can add transaction to any search. For best search performance, craft your search and then
pipe it to the transaction command. For more information see the topic on the transaction
command in the Search Reference manual.

Follow the transaction command with the following options. Note: Some transaction options
do not work in conjunction with others.

[field-list]

• This is a comma-separated list of fields, such as ...|transaction host,cookie


• If set, each event must have the same field(s) to be considered part of the same transaction.
• Events with common field names and different values will not be grouped.
♦ For example, if you add ...|transaction host, then a search result that has
host=mylaptop can never be in the same transaction as a search result with
host=myserver.
♦ A search result that has no host value can be in a transaction with a result that has
host=mylaptop.

match=closest

• Specify the matching type to use with a transaction definition.


• The only value supported currently is closest.

maxspan=[<integer> s|m|h|d]

• Set the maximum time span for the transaction.


• Can be in seconds, minutes, hours or days.
♦ For example: 5s, 6m, 12h or 30d.
• Defaults to maxspan=-1, for an "all time" timerange.

maxpause=[<integer> s|m|h|d]

• Specifies the maximum pause between the events in a transaction.


• Requires there be no pause between the events within the transaction greater than maxpause.
• If the value is negative, the maxpause constraint is disabled.
• Defaults to maxpause=-1.

startswith=<string>

• A search or eval-filtering expression which, if satisfied by an event, marks the beginning of a
new transaction.
• For example:
♦ startswith="login"
♦ startswith=(username=foobar)
♦ startswith=eval(speed_field < max_speed_field)
♦ startswith=eval(speed_field < max_speed_field/12)
• Defaults to "".

endswith=<transam-filter-string>

• A search or eval-filtering expression which, if satisfied by an event, marks the end of a
transaction.
• For example:
♦ endswith="logout"
♦ endswith=(username=foobar)
♦ endswith=eval(speed_field < max_speed_field)
♦ endswith=eval(speed_field < max_speed_field/12)
• Defaults to "".

For startswith and endswith, <transam-filter-string> is defined with the following
syntax: "<search-expression>" | (<quoted-search-expression>) | eval(<eval-expression>)

• <search-expression> is a valid search expression that does not contain quotes.


• <quoted-search-expression> is a valid search expression that contains quotes.
• <eval-expression> is a valid eval expression that evaluates to a boolean.

Examples:

• search expression: (name="foo bar")


• search expression: "user=mildred"
• search expression: ("search literal")
• eval bool expression: eval(distance/time < max_speed)
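
For example, a transaction search that groups each user's activity between a login and a logout event
might look like this (a sketch; the source type and field names are illustrative):

sourcetype=secure | transaction userid startswith="login" endswith="logout" maxspan=1h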

Transactions and macro search

Transactions and macro searches are a powerful combination that allows substitution into your
transaction searches. Make a transaction search and then save it with $field$ to allow substitution.

For an example of how to use macro searches and transactions, see "Create and use search
macros" in the User manual. For more information about macro searches, see "Design macro
searches" in this manual.

Example transaction search

Run a search that groups together all of the web pages a single user (or client IP address)
looked at over a time range.

This search takes events from the access logs, and creates a transaction from events that share the
same clientip value that occurred within 5 minutes of each other (within a 3 hour time span).

sourcetype=access_combined | transaction clientip maxpause=5m maxspan=3h


Define transactions

Any series of events can be turned into a transaction type. Read more about use cases in "About
transactions", in this manual.

You can create transaction types via transactiontypes.conf. See below for configuration details.

For more information on configuration files in general, see "About configuration files" in the Admin
manual.

Configure transaction types in transactiontypes.conf

1. Create a transactiontypes.conf file in $SPLUNK_HOME/etc/system/local/, or your own
custom application directory in $SPLUNK_HOME/etc/apps/.

2. Define transactions by creating a stanza and listing specifications for each transaction within its
stanza. Use the following attributes:

[<TRANSACTIONTYPE>]
maxspan = [<integer> s|m|h|d]
maxpause = [<integer> s|m|h|d]
fields = <comma-separated list of fields>
exclusive = <true | false>
match = closest

[<TRANSACTIONTYPE>]

• Create any number of transaction types, each represented by a stanza name and any number
of the following attribute/value pairs.
• Use the stanza name, [<TRANSACTIONTYPE>], to search for the transaction in Splunk Web.
• If you do not specify an entry for each of the following attributes, Splunk uses the default value.

maxspan = [<integer> s|m|h|d]

• Set the maximum time span for the transaction.


• Can be in seconds, minutes, hours or days.
♦ For example: 5s, 6m, 12h or 30d.
• Defaults to 5m.

maxpause = [<integer> s|m|h|d]

• Set the maximum pause between the events in a transaction.


• Can be in seconds, minutes, hours or days.
♦ For example: 5s, 6m, 12h or 30d.
• Defaults to 2s.

fields = <comma-separated list of fields>

• If set, each event must have the same field(s) to be considered part of the same transaction.
• Defaults to "".

exclusive = <true | false>

• Toggle whether events can be in multiple transactions, or 'exclusive' to a single transaction.


• Applies to 'fields' (above).

• For example, if fields=url,cookie, and exclusive=false, then an event with a
'cookie', but not a 'url' value could be in multiple transactions that share the same 'cookie', but
have different URLs.
• Setting exclusive = false causes the matcher to look for multiple matches for each event
and approximately doubles the processing time.
• Defaults to "true".

match = closest

• Specify the match type to use.


• Currently, the only value supported is "closest."
• Defaults to "closest."

3. Use the transaction command in Splunk Web to call your defined transaction (by its transaction
type name). You can override configuration specifics during search.
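
For example, if transactiontypes.conf contained a hypothetical stanza like this:

[purchase]
fields = clientip, JSESSIONID
maxspan = 10m
maxpause = 5m

you could call it by name in a search (a sketch; the name option refers to the stanza name):

sourcetype=access_combined | transaction name=purchase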

For more information about searching for transactions, see "Search for transactions" in this manual.

Data enrichment: Lookups and workflow actions
About lookups and workflow actions

Lookups and workflow actions enable you to enrich and extend the usefulness of your event data
through interactions with external resources.

Lookup tables

Lookup tables use information in your events to determine how to add other fields from external data
sources such as static tables (CSV files) and Python-based commands. It's also possible to create
lookups that add fields based on time information.

A really basic example of this functionality would be a static lookup that takes the http_status
value in an event, matches that value with its definition in a CSV file, and then adds fields from that
definition to the event. So if you have an event where http_status = 503, the lookup would add
status_description = Service Unavailable and status_type = Server Error to that
event.

Of course, there are more advanced ways to work with lookups. For example, you can:

• Arrange to have a static lookup table be populated by the results of a saved search.
• Define a field lookup that is based on an external Python script rather than a lookup table. For
example, you could create a lookup that uses a Python script that returns an IP address when
given a host name, and returns a host name when given an IP address.
• Create a time-based lookup, if you are working with a lookup table that includes a field value
that represents time. For example, this could come in handy if you need to use DHCP logs to
identify users on your network based on their IP address and the event timestamp.

For more information, see "Lookup fields from external data sources," in this chapter.

Workflow actions

Workflow actions enable you to set up interactions between specific fields in your data and other
applications or web resources. A really simple workflow action would be one that is associated with an
IP_address field, which, when launched, opens an external WHOIS search in a separate browser
window based on the IP_address value.

You can also set up workflow actions that:

• Apply only to particular fields (as opposed to all fields in an event).


• Apply only to events belonging to a specific event type or group of event types.
• Are accessed either via event dropdown menus, field dropdown menus, or both.
• Perform HTTP GET requests, enabling you to pass information to an external web resource,
such as a search engine or IP lookup service.
• Perform HTTP POST requests that can send field values to an external resource. For
example, you could design one that sends a status value to an external issue-tracking
application.
• Take certain field values from a chosen event and insert them into a secondary search that is
populated with those field values and which launches in a secondary browser window.

For information about setting workflow actions up in Manager, see "Create workflow actions in Splunk
Web", in this chapter.

Look up fields from external data sources

Use the dynamic fields lookup feature to add fields to your events with information from an external
source, such as a static table (CSV file) or an external (Python) command. You can also add fields
based on matching time information.

For example, if you are monitoring logins with Splunk and have IP addresses and timestamps for
those logins in your Splunk index, you can use a dynamic field lookup to map the IP address and
timestamp to the MAC address and username information for the matching IP and timestamp data
that you have in your DHCP logs.

You can set up a lookup using the Lookups Manager page in Splunk Web or by configuring stanzas
in props.conf and transforms.conf. For more information about using the Lookups Manager,
see the fields lookup tutorial in the User Manual. This topic discusses how to use props.conf
and transforms.conf to set up your lookups.

To set up a lookup using the configuration files:

Important: Do not edit conf files in $SPLUNK_HOME/etc/system/default. Instead, you should
edit the file in $SPLUNK_HOME/etc/system/local/ or
$SPLUNK_HOME/etc/apps/<app_name>/local/. If the file doesn't exist, create it.

1. Edit transforms.conf to define your lookup table.

Currently you can define two kinds of lookup tables: static lookups (which utilize CSV files) and
external lookups (which utilize Python scripts). The arguments you use in your transforms stanza
indicate the type of lookup table you want to define. Use filename for static lookups and
external_cmd for external lookups.

Note: A lookup table must have at least two columns. Each column may have multiple instances of
the same value (multi-valued fields).

2. Edit props.conf to apply your lookup table.

This step is the same for both static and external lookups. In this configuration file, you specify the
fields to match and output (or outputnew, if you don't want to overwrite the output field) from the
lookup table that you defined in transforms.conf.

You can have more than one field lookup defined in a single source stanza. Each lookup should have
its own unique lookup name; for example, if you have multiple tables, you can name them
LOOKUP-table1, LOOKUP-table2, etc., or something more descriptive.

When you add a lookup to props.conf, the lookup is run automatically. If your automatic lookup is very
slow, it will also impact the speed of your searches.

3. Restart Splunk to implement the changes you made to the configuration files.

After restart, you should see the output fields from your lookup table listed in the fields picker. From
there, you can select the fields to display in each of the matching search results.

Set up a fields lookup based on a static file

The simplest fields lookup is based on a static table, specifically a CSV file. The CSV file needs to be
located in one of two places:

• $SPLUNK_HOME/etc/system/lookups/
• $SPLUNK_HOME/etc/apps/<app_name>/lookups/

Create the lookups directory if it does not exist.

1. Edit transforms.conf to define your lookup table.

In transforms.conf, add a stanza to define your lookup table. The name of the stanza is also the
name of your lookup table. You will use this transform in props.conf.

In this stanza, reference the CSV file's name:

[myLookup]
filename = <filename>
max_matches = <integer>

Optionally, you can specify the number of matching entries to apply to an event; max_matches
indicates that the first (in file order) <integer> number of entries are used. By default,
max_matches is 100 for lookups that are not based on a timestamp field.

2. Edit props.conf to apply your lookup table.

In props.conf, add a stanza with the lookup key. This stanza specifies the lookup table that you
defined in transforms.conf and indicates how Splunk should apply it to your events:

[<stanza name>]
lookup-<name> = $TRANSFORM <match_field_in_table> OUTPUT|OUTPUTNEW <output_field_in_table>

• stanza name is the sourcetype, host, or source to which this lookup applies, as specified in
props.conf.
• stanza name can't use regex-type syntax.
• $TRANSFORM references the stanza in transforms.conf where you defined your lookup
table.

• match_field_in_table is the column in the lookup table that you use to match values.
• output_field_in_table is the column in the lookup table that you add to your events. Use
OUTPUTNEW if you don't want to overwrite existing values in your output field.
• You can have multiple columns on either side of the lookup. For example, you could have
$TRANSFORM <match_field1>, <match_field2> OUTPUT|OUTPUTNEW
<match_field3>, <match_field4>. You can also have one field return two fields, three
fields return one field, and so on.

Use the AS clause if the field names in the lookup table and your events do not match or if you want
to rename the field in your event:

[<stanza name>]
lookup_<name> = $TRANSFORM <match_field_in_table> AS <match_field_in_event>
OUTPUT|OUTPUTNEW <output_field_in_table> AS <output_field_in_event>

You can have more than one field after the OUTPUT|OUTPUTNEW clause. If you don't use
OUTPUT|OUTPUTNEW, Splunk adds all the field names and values from the lookup table to your
events.

3. Restart Splunk.

Example of static fields lookup

Here's an example of setting up lookups for HTTP status codes in an access_combined log. In this
example, you want to match the status field in your lookup table (http_status.csv) with the field
in your events. Then, you add the status description and status type fields into your events.

The following is the http_status.csv file. You can put this into
$SPLUNK_HOME/etc/apps/<app_name>/lookups/. If you're using this in the Search App, put
the file into $SPLUNK_HOME/etc/apps/search/lookups/:

status,status_description,status_type
100,Continue,Informational
101,Switching Protocols,Informational
200,OK,Successful
201,Created,Successful
202,Accepted,Successful
203,Non-Authoritative Information,Successful
204,No Content,Successful
205,Reset Content,Successful
206,Partial Content,Successful
300,Multiple Choices,Redirection
301,Moved Permanently,Redirection
302,Found,Redirection
303,See Other,Redirection
304,Not Modified,Redirection
305,Use Proxy,Redirection
307,Temporary Redirect,Redirection
400,Bad Request,Client Error
401,Unauthorized,Client Error
402,Payment Required,Client Error
403,Forbidden,Client Error
404,Not Found,Client Error
405,Method Not Allowed,Client Error
406,Not Acceptable,Client Error
407,Proxy Authentication Required,Client Error
408,Request Timeout,Client Error
409,Conflict,Client Error
410,Gone,Client Error
411,Length Required,Client Error
412,Precondition Failed,Client Error
413,Request Entity Too Large,Client Error
414,Request-URI Too Long,Client Error
415,Unsupported Media Type,Client Error
416,Requested Range Not Satisfiable,Client Error
417,Expectation Failed,Client Error
500,Internal Server Error,Server Error
501,Not Implemented,Server Error
502,Bad Gateway,Server Error
503,Service Unavailable,Server Error
504,Gateway Timeout,Server Error
505,HTTP Version Not Supported,Server Error

1. In a transforms.conf file located in either $SPLUNK_HOME/etc/system/local/ or
$SPLUNK_HOME/etc/apps/<app_name>/local, put:

[http_status]
filename = http_status.csv

2. In a props.conf file, located in either $SPLUNK_HOME/etc/system/local/ or
$SPLUNK_HOME/etc/apps/<app_name>/local/, put:

[access_combined]
lookup_http = http_status status OUTPUT status_description, status_type

3. Restart Splunk.

Now, when you run a search that returns Web access information, you will see the fields
status_description and status_type listed in your fields picker menu.
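
You can also invoke the same lookup table on demand with the lookup search command, rather than
relying on the automatic lookup (a sketch, using the configuration above):

sourcetype=access_combined | lookup http_status status OUTPUT status_description, status_type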

Use search results to populate a lookup table

You can edit a local or app-specific copy of savedsearches.conf to use the results of a saved
search to populate a lookup table.

In a saved search stanza, where the search returns a results table:

1. Add the following line to enable the lookup population action.

action.populate_lookup = 1

This tells Splunk to save your results table into a CSV file.

2. Add the following line to tell Splunk where to copy your lookup table.

action.populate_lookup.dest = <string>

The action.populate_lookup.dest value is a lookup name from transforms.conf or a
path to a CSV file where Splunk should copy the search results. If it is a path to a CSV file, the
path should be relative to $SPLUNK_HOME.

For example, if you want to save the results to a global lookup table, you might include:

action.populate_lookup.dest = etc/system/lookups/myTable.csv

The destination directory, $SPLUNK_HOME/etc/system/lookups or
$SPLUNK_HOME/etc/apps/<app_name>/lookups, should already exist.

3. Add the following line if you want this search to run when Splunk starts up.

run_on_startup = true

If it does not run on startup, it will run at the next scheduled time. Generally, we recommend that you
set this to true for scheduled searches that populate lookup tables.

Because Splunk copies the results of the saved search to a CSV file, you can set up your fields
lookup the same way you set up a static lookup.
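
Putting these settings together, a hypothetical scheduled search stanza in savedsearches.conf might
look something like this (a sketch; the stanza name, search, schedule, and destination path are all
illustrative):

[Populate HTTP status lookup]
search = sourcetype=access_combined | stats count by status | fields status
enableSched = 1
cron_schedule = 0 2 * * *
action.populate_lookup = 1
action.populate_lookup.dest = etc/system/lookups/myTable.csv
run_on_startup = true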

Set up a fields lookup based on an external command or script

For dynamic or external lookups, your transforms.conf stanza references the command or script
and arguments to invoke. This is also called a scripted or external lookup.

You can also specify the type of command or script to invoke:

[myLookup]
external_cmd = <string>
external_type = python
fields_list = <string>
max_matches = <integer>

Use fields_list to list all the fields supported by the external command, delimited by a comma
and space.

Note: Currently, Splunk only supports Python scripts for external lookups. Python scripts used for
these lookups must be located in one of two places:

• $SPLUNK_HOME/etc/apps/<app_name>/bin
• $SPLUNK_HOME/etc/searchscripts

Note: When writing your Python script, if you refer to any external resources (such as a file), the
reference must be relative to the directory where the script is located.

Example of external fields lookup

Here's an example of how you might use external lookups to match with information from a DNS
server. Splunk ships with a script located in $SPLUNK_HOME/etc/system/bin/ called
external_lookup.py, which is a DNS lookup script that:

• if given a host, returns the IP address.
• if given an IP address, returns the host name.

1. In a transforms.conf file, put:

[dnsLookup]
external_cmd = external_lookup.py host ip
fields_list = host, ip

2. In a props.conf file, put:

[access_combined]
lookup_dns = dnsLookup host OUTPUT ip AS clientip

The field in the lookup table is named ip, but Splunk automatically extracts the IP addresses from
Web access logs into a field named clientip. So, "OUTPUT ip AS clientip" indicates that you want
Splunk to add the values of ip from the lookup table into the clientip field in the events. Since the
host field has the same name in the lookup table and the events, you don't need to rename the field.

For a reverse DNS lookup, your props.conf stanza would be:

[access_combined]
lookup_rdns = dnsLookup ip AS clientip OUTPUTNEW host AS hostname

For this example, instead of overwriting the host field value, you want Splunk to return the host
value in a new field called hostname.

3. Restart Splunk.

More about the external lookup script

When designing your external lookup script, keep in mind that it needs to take in a partially empty
CSV file and output a filled-in CSV file. The arguments that you pass to the script are the headers for
these input and output files.

In the DNS lookup example above, the CSV file contains 2 fields, "host" and "ip". The fields that you
pass to this script are the ones you specify in transforms.conf:

external_cmd = external_lookup.py host ip

Note: If you don't pass these arguments, the script will return an error.

When you run the search command:

... | lookup dnsLookup host


You're telling Splunk to use the lookup table that you defined in transforms.conf as [dnsLookup]
and pass into the external command script the values for the "host" field as a CSV file, which may
look like this:

host,ip
work.com
home.net

Basically, this is a CSV file with the headers "host" and "ip", but missing values for ip. The two headers
are included because they are the fields you specified in the fields_list parameter of
transforms.conf.

The script then outputs the following CSV file and returns it to Splunk, which populates the ip field in
your results:

host,ip
work.com,127.0.0.1
home.net,127.0.0.2
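
To make this contract concrete, here is a minimal sketch of an external lookup script that follows the
pattern described above. It is not the external_lookup.py that ships with Splunk; it simply illustrates
reading the partially empty CSV from stdin, filling in the missing column, and writing the completed
CSV to stdout (the DNS calls and the host/ip field names follow this example and are otherwise
assumptions):

import csv
import socket
import sys

def resolve(host, ip):
    # Fill in whichever half of the host/ip pair is missing.
    try:
        if host and not ip:
            return host, socket.gethostbyname(host)
        if ip and not host:
            return socket.gethostbyaddr(ip)[0], ip
    except socket.error:
        pass
    return host, ip

def main():
    # Splunk passes the supported field names as arguments (for example, "host ip")
    # and pipes the partially filled CSV in on stdin.
    fieldnames = sys.argv[1:]
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        host, ip = resolve(row.get('host') or '', row.get('ip') or '')
        writer.writerow({'host': host, 'ip': ip})

if __name__ == '__main__':
    main()

A script along these lines would then be referenced from transforms.conf with external_cmd and
fields_list, exactly as in the dnsLookup example above.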

Set up a time-based fields lookup

If your static or external lookup table has a field value that represents time, you can use this time field
to set up your fields lookup. For time-based (or temporal) lookups, add the following lines to your
lookup stanza in transforms.conf:

time_field = <field_name>
time_format = <string>

If time_field is present, by default max_matches is 1. Also, the first matching entry in descending
order is applied.

Use the time_format key to specify the strptime format of your time_field. By default,
time_format is UTC.

For a match to occur with time-based lookups, you can also specify offsets for the minimum and
maximum amounts of time that an event may be later than a lookup entry. To do this, add the
following lines to your stanza:

max_offset_secs = <integer>
min_offset_secs = <integer>

By default, there is no maximum offset and the minimum offset is 0.

Example of time-based fields lookup

Here's an example of how you might use DHCP logs to identify users on your network based on their
IP address and the timestamp. Let's say the DHCP logs are in a file, dhcp.csv, which contains the
timestamp, IP address, and the user's name and MAC address.

1. In a transforms.conf file, put:

[dhcpLookup]
filename = dhcp.csv
time_field = timestamp
time_format = %d/%m/%y %H:%M:%S

2. In a props.conf file, put:

[dhcp]
lookup_table = dhcpLookup ip mac OUTPUT user

3. Restart Splunk.

Troubleshooting lookups - Using identical names in lookup stanzas

Lookup table definitions are indicated with the attribute, LOOKUP-<name>. In general it's best if all of
your lookup stanzas have different names to reduce the chance of things going wrong. When you do
give the same name to two or more lookups you can run into trouble unless you know what you're
trying to do:

• If two or more lookups with the same name share the same stanza (the same host, source, or
sourcetype), the first lookup with that name in props.conf overrides the others. All lookups
with the same host, source, or sourcetype should have different names.
• If you have lookups with different stanzas (different hosts, sources, or sourcetypes) that share
the same name, you can end up with a situation where only one of them seems to work at any
given point in time. You may set this up on purpose, but in most cases it's probably not very
convenient.

For example, say you have two lookups that share "table" as their <name>:

[host::machine_name]
LOOKUP-table = logs_per_day host OUTPUTNEW average_logs AS logs_per_day

[sendmail]
LOOKUP-table = location host OUTPUTNEW building AS location

Any events that overlap between these two lookups will only be affected by one of them. In other
words:

• events that match the host will get the host lookup.
• events that match the sourcetype will get the sourcetype lookup.
• events that match both will only get the host lookup.

When you name your lookup LOOKUP-table, you're saying this is the lookup that achieves some
purpose or action described by "table". In this example, these lookups are intended to achieve
different goals--one determines something about logs per day, and the other has something to do
with location. You might instead rename them:

[host::machine_name]
LOOKUP-table = logs_per_day host OUTPUTNEW average_logs AS logs_per_day

[sendmail]
LOOKUP-location = location host OUTPUTNEW building AS location

Now you have two different settings that won't collide.

Create workflow actions in Splunk Web

Enable a wide variety of interactions between indexed fields and other web resources with workflow
actions. Workflow actions have a wide variety of applications. For example, you can define workflow
actions that enable you to:

• Perform an external WHOIS lookup based on an IP address found in an event.


• Use the field values in an HTTP error event to create a new entry in an external issue
management system.
• Perform an external search (using Google or a similar web search application) on the value of
a specific field found in an event.
• Launch secondary Splunk searches that use one or more field values from selected events.

In addition, you can define workflow actions that:

• Are targeted to events that contain a specific field or set of fields, or which belong to a
particular event type.
• Appear either in field menus or event menus in search results. You can also set them up to
only appear in the menus of specific fields, or in all field menus in a qualifying event.
• When selected, open either in the current window or in a new one.

Define workflow actions using Splunk Manager

You can set up all of the workflow actions described in the bulleted list at the top of this topic and
many more using Splunk Manager. To begin, go to the Manager page and click Fields. From there
you can go to the Workflow actions page to review and update existing workflow actions. Or you can
just click Add new for workflow actions to create a new one. Both methods take you to the workflow
action detail page, which is where you define individual workflow actions.

If you're creating a new workflow action, you need to give it a Name and identify its Destination app.

There are three kinds of workflow actions that you can set up:

• GET workflow actions, which create typical HTML links to do things like perform Google
searches on specific values or run domain name queries against external WHOIS databases.
• POST workflow actions, which generate an HTTP POST request to a specified URI. This
action type enables you to do things like create entries in external issue management systems
using a set of relevant field values.
• Search workflow actions, which launch secondary searches that use specific field values
from an event, such as a search that looks for the occurrence of specific combinations of
ipaddress and http_status field values in your index over a specific time range.

Target workflow actions to a narrow grouping of events

When you create workflow actions in Manager, you can optionally target workflow actions to a narrow
grouping of events. You can restrict workflow action scope by field, by event type, or a combination of
the two.

Narrow workflow action scope by field

You can set up workflow actions that only apply to events that have a specified field or set of fields.
For example, if you have a field called http_status, and you would like a workflow action to apply
only to events containing that field, you would declare http_status in the Apply only to the following
fields setting.

If you want to have a workflow action apply only to events that have a set of fields, you can declare a
comma-delimited list of fields in Apply only to the following fields. When more than one field is
listed, the workflow action is displayed only if all of the listed fields are present in the event.

For example, say you want a workflow action to only apply to events with ip_client and
ip_server fields. To do this, you would enter ip_client, ip_server in Apply only to the following
fields.

Workflow action field scoping also supports use of the wildcard asterisk. For example, if you declare a
simple field listing of ip_*, Splunk applies the resulting workflow action to events with ip_client,
ip_server, or both (as well as any other event with a field that matches ip_*).

By default the field list is set to *, which means that it matches all fields.

If you need more complex selecting logic, we suggest you use event type scoping instead of field
scoping, or combine event type scoping with field scoping.

Narrow workflow action scope by event type

Event type scoping works exactly the same way as field scoping. You can enter a single event type or
a comma-delimited list of event types into the Apply only to the following event types setting to
create a workflow action that Splunk only applies to events belonging to that event type or set of
event types. You can also use wildcard matching to identify events belonging to a range of event
types.

You can also narrow the scope of workflow actions through a combination of fields and event types.
For example, perhaps you have a field called http_status, but you only want the resulting
workflow action to appear in events containing that field if the http_status is greater than or equal
to 500. To accomplish this you would first set up an event type called errors_in_500_range that
is applied to events matching a search like

http_status >= 500


You would then define a workflow action that has Apply only to the following fields set to
http_status and Apply only to the following event types set to errors_in_500_range.

For more information about event types, see "About event types" in this manual.
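
If you maintain event types in configuration files rather than Splunk Web, a minimal eventtypes.conf sketch for this example might look like the following (the search string is the one given above):

[errors_in_500_range]
search = http_status >= 500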

Control workflow action appearance in field and event menus

When workflow actions are set up correctly, they appear in dropdown menus associated with fields
and events in your search results. For example, you can define a workflow action that sets off a
Google search for values of the topic field in events. (The topic field turns up in webserver events
associated with the access of Splunk documentation topics. It has the name of a particular Splunk
documentation topic as its value.)

Depending on how you define the Google search workflow action in Manager, you can have it appear
in field menus for events containing a topic field:

Alternatively, you can have the workflow action appear in the event menus for those same events:

Or you can choose to have it appear in both the event menu and the field menus for events
containing a topic field.

Note that in the event depicted above, the topic field has a value of LicenseManagement. The
menus for this event display the workflow action Google LicenseManagement. Clicking on this
workflow action sets off a Google search for the term LicenseManagement. This is an example of a
"GET link" workflow action, and it's one of three kinds of workflow actions that you can implement in
Splunk. Read on for instructions on setting up all three.

Set up a GET workflow action

GET link workflow actions drop one or more values into an HTML link. Clicking that link performs an
HTTP GET request in a browser, allowing you to pass information to an external web resource, such
as a search engine or IP lookup service.

To define a GET workflow action, go to the detail page, set Action type to link, and set Link method
to get. Then define a Label and URI as appropriate.

Note: Variables passed in GET actions via URIs are automatically URL encoded during transmission.
This means you can include values that have spaces between words or punctuation characters.
However, if you're working with a field that has an HTTP address as its value, and you want to pass
the entire field value as a URI, you should use the $! prefix to keep Splunk from escaping the field
value. See "Use the $! prefix to prevent escape of URL or HTTP form field values" below for more
information.

Here's an example of the setup for a GET link workflow action that sets off a Google search on
values of the topic field in search results:

The Label field enables you to define the text that is displayed in either the field or event workflow
menu. Labels can be static or include the value of relevant fields. For example, if you have a field
called topic in your events and you want its value to be included in the label for a Google workflow
action, you might set the Label value to Google $topic$.

In the above example, if the value for topic in an event is CreatefieldactionsinSplunkWeb,
the field action displays as Google CreatefieldactionsinSplunkWeb in the topic field menu.

The URI field enables you to define the location of the external resource that you want to send your
field values to. Similar to the Label setting, when you declare the value of a field, you use the name
of the field enclosed by dollar signs. In the above example, this URI uses the GET method to submit
the topic value to Google for a search.
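
As a sketch, the Manager settings for this example might look something like this; the Google URI shown is an assumption based on the standard Google query format:

Action type:  link
Link method:  get
Label:        Google $topic$
URI:          http://www.google.com/search?q=$topic$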

You can choose whether the workflow action displays in the event menu, the field menu(s), or both.
You can also identify whether the link opens in the current window or a new window.

You can also arrange for the workflow action to apply only to a specific set of events. You can
indicate that the workflow action only appears in events that have a particular set of fields or which
belong to a specific event type or set of event types.

Example - Provide an external IP lookup

You have configured your Splunk app to extract domain names in web services logs and
specify them as a field named domain. You want to be able to search an external WHOIS database
for more information about the domains that appear.

Here's how you would set up the GET workflow action that helps you with this.

In the Workflow actions details page, set Action type to link and set Link method to get.

You then use the Label and URI fields to identify the field involved. Set a Label value of WHOIS:
$domain$. Set a URI value of http://whois.net/whois/$domain$.

After that, you can determine:

• whether the link shows up in the field menu, the event menu, or both.
• whether the link opens the WHOIS search in the same window or a new one.
• restrictions for the events that display the workflow action link. You can target the workflow
action to events that have specific fields, that belong to specific event types, or some
combination of the two.
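
Put together, the core settings for this example are:

Action type:  link
Link method:  get
Label:        WHOIS: $domain$
URI:          http://whois.net/whois/$domain$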

Set up a POST workflow action

POST workflow actions are set up in a manner similar to that of GET link actions. Go to the workflow
action detail page and set Action type to link, set Link method to post, and define a Label and URI
as appropriate.

However, POST requests are typically defined by a form element in HTML along with some inputs
that are converted into POST arguments. This means that you have to identify POST arguments to
send to the identified URI.

Note: Variables passed in POST link actions via URIs are automatically HTTP-form encoded during
transmission. This means you can include values that have spaces between words or punctuation
characters. However, if you're working with a field that has an HTTP address as its value, and you
want to pass the entire field value as a URI, you should use the $! prefix to keep Splunk from
escaping the field value. See "Use the $! prefix to prevent escape of URL or HTTP form field values"
below for more information.

These arguments are key and value combinations that will be sent to a web resource that responds
to POST requests. On both the key and value sides of the argument, you can use field names
enclosed in dollar signs to identify the field value from your events that should be sent over to the
resource. You can define multiple key/value arguments in one POST workflow action.

Example - Allow an http error to create an entry in an issue tracking application

You've configured your Splunk app to extract HTTP status codes from a web service log as a field
called http_status. Along with the http_status field the events typically contain either a normal
single-line description request, or a multiline python stacktrace originating from the python process
that produced an error.

You want to design a workflow action that only appears for error events where http_status is in
the 500 range. You want the workflow action to send the associated python stacktrace and the HTTP
status code to an external issue management system to generate a new bug report. However, the
issue management system only accepts POST requests to a specific endpoint.

Here's how you might set up the POST workflow action that fits your requirements:
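
A sketch of those settings follows; the issue tracker URI is a hypothetical placeholder, and the title and description argument keys are the ones discussed below:

Action type:    link
Link method:    post
Label:          Submit error report
URI:            http://issuetracker.example.com/issues/new
Post argument:  title = server error $http_status$
Post argument:  description = $_raw$
Apply only to the following event types:  errors_in_500_range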

Note that the first POST argument sends server error $http_status$ to a title field in the
external issue tracking system. If you select this workflow action for an event with an http_status of
500, then it opens an issue with the title server error 500 in the issue tracking system.

The second POST argument uses the _raw field to include the multiline python stacktrace in the
description field of the new issue.

Finally, note that the workflow action has been set up so that it only applies to events belonging to the
errors_in_500_range event type. This is an event type that is only applied to events carrying
http_status values in the typical HTTP error range of 500 or greater. Events with HTTP error codes
below 500 do not display the submit error report workflow action in their event or field menus.

Set up a secondary search that is dynamically populated with field values from an event

To set up workflow actions that launch dynamically populated secondary searches, you start by
setting Action type to search on the Workflow actions detail page. This reveals a set of fields that
you use to define the specifics of the search.

In Search string enter a search string that includes one or more placeholders for field values,
bounded by dollar signs. For example, if you're setting up a workflow action that searches on client IP
values that turn up in events, you might simply enter clientip=$clientip$ in that field.

Identify the app that the search runs in. If you want it to run in a view other than the current one,
select that view. And as with all workflow actions, you can determine whether it opens in the current
window or a new one.

Be sure to set a time range for the search (or identify whether it should use the same time range as
the search that created the field listing) using the Earliest time and Latest time settings. If left blank, the search runs over all time by default.

Finally, as with other workflow action types, you can restrict the search workflow action to events
containing specific sets of fields and/or which belong to particular event types.

Example - Launch a secondary search that finds errors originating from a specific Ruby On Rails controller

Say your company uses a web infrastructure that is built on Ruby on Rails. You've set up an event
type to sort out errors related to Ruby controllers (titled controller_error), but sometimes you
just want to see all the errors related to a particular controller. Here's how you might set up a
workflow action that does this:

1. On the Workflow actions detail page, set up an action with the following Label: See other
errors for controller $controller$ over past 24h.

2. Set Action type to Search.

3. Enter the following Search string: sourcetype=rails controller=$controller$ error=*

4. Set an Earliest time of -24h. Leave Latest time blank.

5. Using the Apply only to the following... settings, arrange for the workflow action to only appear in
events that belong to the controller_error event type, and which contain the error and
controller fields.

Those are the basics. You can also determine which app or view the workflow action should run in
(for example, you might have a dedicated view for this information titled ruby_errors) and identify
whether the action works in the current window or opens a new one.
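
Pulled together, the settings from the steps above look roughly like this (ruby_errors is the hypothetical view mentioned above):

Label:          See other errors for controller $controller$ over past 24h
Action type:    search
Search string:  sourcetype=rails controller=$controller$ error=*
Earliest time:  -24h
Latest time:    (blank)
View:           ruby_errors
Apply only to the following fields:       error, controller
Apply only to the following event types:  controller_error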

Use special parameters in workflow actions

Splunk provides special parameters for workflow actions that begin with an "@" sign. Two of these
special parameters are for field menus only. They enable you to set up workflow actions that apply to
every field in a qualifying event.

• @field_name - Refers to the name of the field being clicked on.


• @field_value - Refers to the value of the field being clicked on.

The other special parameters are:

• @sid - Refers to the sid of the search job that returned the event
• @offset - Refers to the offset of the event in the search job
• @namespace - Refers to the namespace from which the search job was dispatched
• @latest_time - Refers to the latest time the event occurred. It is used to distinguish similar
events from one another. It is not always available for all fields.

Example - Create a workflow action that applies to all fields in an event

You can update the Google search example discussed above (in the GET link workflow action
section) so that it enables a search of the field name and field value for every field in an event to
which it applies. All you need to do is change the title to Google this field and value and
replace the URI of that action with
http://www.google.com/search?q=$@field_name$+$@field_value$.

This results in a workflow action that searches on whichever field/value combination you're viewing a
field menu for. If you're looking at the field menu for topic=WhatisSplunkknowledge and select
the Google this field and value field action, the resulting Google search is topic
WhatisSplunkknowledge.

Remember: Workflow actions using the @field_name and/or @field_value parameters are not
compatible with event-level menus.

Example - Show the source of an event

This workflow action uses the other special parameters to show the source of an event in your raw
search data.

The Action type is link and its Link method is get. Its Label is Show source. The URI is
/app/$@namespace$/show_source?sid=$@sid$&offset=$@offset$&latest_time=$@latest_time$.

It's targeted to events that have the following fields: _cd, source, host, index.

Try setting this workflow action up in your app (if it isn't installed already) and see how it works.

Use the $! prefix to prevent escape of URL or HTTP form field values

When you define fields to be used in workflow actions, it is often necessary to escape these fields so
they can be safely passed via HTTP to some other external endpoint. Sometimes this escaping may
be undesirable. In these cases, you can use the $! prefix to prevent Splunk from automatically
escaping the field value. In the case of GET workflow actions, it prevents URL escape. In the case of
POST workflow actions, it prevents HTTP form escape.

Example - Passing an HTTP address to a separate browser window

Say you have a GET workflow action that works with a field named http, which has fully formed
HTTP addresses as values. This workflow action is designed to simply open a new browser window
pointing at the HTTP address value of the http field. This won't work if the new window is opened
with an escaped HTTP address. So you use the $! prefix. Where you might normally set the URI
field to $http$ for this workflow action in Manager, you instead set it to $!http$ to keep the HTTP
address from being escaped.

Configure workflow actions through workflow_actions.conf

This topic is coming soon. In the meantime, learn how to set up and administrate workflow actions via
Manager.

Data normalization: Tags and aliases
About tags and aliases

In your data, you might have groups of events with related field values. To help you search more
efficiently for these particular groups of event data, you can assign tags to their field values. You can
assign one or more tags to any field/value combination (including event type, host, source, or source
type).

You can use tags to:

• Help you track abstract field values, like IP addresses or ID numbers. For example, you
could have an IP address related to your main office with the value 192.168.1.2. Tag that
IPaddress value as mainoffice, and then search on that tag to find events with that IP
address.
• Use one tag to group a set of field values together, so you can search on them with one
simple command. For example, you might find that you have two host names that relate to the
same computer. You could give both of those values the same tag. When you search on that
tag, Splunk returns events involving both host name values.
• Give specific extracted fields multiple tags that reflect different aspects of their identity,
which enable you to perform tag-based searches that help you quickly narrow down the results
you want. To understand how this could work, see the following example.

Example:

Let's say you have an extracted field called IPaddress, which refers to the IP addresses of the data
sources within your company intranet. You can make IPaddress useful by tagging each IP address
based on its functionality or location. You can tag all of your routers' IP addresses as router. You can
also tag each IP address based on its location, for example: SF or Building1. An IP address of a
router located in San Francisco inside Building 1 could have the tags router, SF, and Building1.

To search for all routers in San Francisco that are not in Building1, you'd search for the following:

tag=router tag=SF NOT (tag=Building1)


Define and manage tags

Splunk provides a set of methods for tag creation and management. Most users will go with the
simplest method--tagging field/value pairs directly in search results, a method discussed in detail in
"Tag and alias field values," in the User manual.

However, as a knowledge manager, you'll probably be using the Tags pages in Manager to curate
the various collections of tags created by users of your Splunk implementation. This topic explains
how to:

• Use the tags pages in Manager to manage tags for your Splunk implementation.
• Create new tags through Manager.
• Disable or delete tags with Manager.

Navigate to the Tags pages by selecting Manager > Tags.

Using the Tags pages in Manager

The Tags pages in Manager provide three views of the tags in your Splunk implementation:

• Tags by field value pair(s), which you access by clicking List by field value pair(s) on the
Tags page.
• List by tag name
• Tags by unique ID, which you access by clicking All tag objects on the tags page.

Each of these pages enables you to manage your tag collection in different ways. They enable you to
quickly get a picture of the associations that have been made between tags and field/value pairs over
time. They also allow you to create and remove these associations.

Managing tag sets associated with specific field value pairs

What if you want to see a list of all of the field/value pairs in your system that have tags associated
with them? Furthermore, what if you want to review and even update the set of tags that are
associated with a specific field/value pairing? Or define a set of tags for a particular field/value pair?

The Tags by field value pair(s) Manager page is the answer to these questions. It enables you to
review and edit the tag sets that have been associated with particular field/value pairs.

You can also use this page to manage the permissions around the ability to manage a particular
field/value combination with tags.

To see the list of tags for a specific field/value pair, locate that pairing and click on it in the
Field::Value column. This takes you to the detail page for the field/value pair.

Here's an example of a set of tags that have been defined for the eventtype=auditd_create
field/value pair:

You can add more tags, and delete them as well (if you have the permissions to do so).

When you click New on the Tags by field value pair(s) page, the system enables you to define a set
of tags for a new field/value pair.

When you create or update a tag list for a field/value pairing, keep in mind that you may be creating
new tags, or associating existing tags with a different kind of field/value pair than they were originally
designed to work with. As a knowledge manager you should consider sticking to a carefully designed
and maintained set of tags. This practice aids with data normalization, and can reduce confusion on
the part of your users. (For more information see the "Organize and administrate knowledge objects"
chapter of this manual.)

Note: You may want to verify the existence of a field/value pair that you add to the Tags by
field/value pair(s) page. The system will not prevent you from defining a list of tags for a nonexistent
field/value pair.

Reviewing and updating sets of field value pairs associated with specific tags

What if you want to see a list of all of the tags in your system that have one or more field/value pairs associated
with them? Furthermore, what if you want to review and even update the set of field/value pairings
that are associated with a specific tag? Or define a set of field/value pairings for a new tag?

These questions are answered by the List by tag name Manager page. It enables you to review and
edit the sets of field/value pairs that have been associated with specific tags.

This page does not allow you to manage permissions for the set of field/value pairs associated with a
tag, however.

To see the list of field/value pairings for a particular tag, locate the tag in the List by tag name, and
click on the tag name in the Tag column. This takes you to the detail page for the tag.

Here's an example displaying the various field/value pairings that the modify tag has been
associated with.

You can add field/value associations, and delete them as well (if you have the permissions to do so).

When you click New on the List by tag name page, the system enables you to define a set of
field/value pairings for a new tag.

When you create or update a set of field/value pairings for a tag, keep in mind that you may be
creating new field/value pairings. You may want to verify the existence of field/value pairs that you
associate with a tag. The system will not prevent you from adding nonexistent field/value
associations.

Be wary of creating new tags. Tags may already exist that serve the purpose you're trying to address.
As a knowledge manager you should consider sticking to a carefully designed and maintained set of
tags. This practice aids with data normalization, and can reduce confusion on the part of your users.
(For more information see the "Organize and administrate knowledge objects" chapter of this
manual.)

Reviewing all unique field/value pair and tag combinations

The Tags by unique ID page breaks out all of the unique tag name and field/value pairings in your
system. Unlike the previous two pages, this page only lets you edit one-to-one relationships between
tags and field/value pairs.

You can search on a particular tag to quickly see all of the field/value pairs with which it's associated,
or vice versa. This page is useful especially if you want to disable or clone a particular tag and
field/value association, or if you want to maintain permissions at that level of granularity.

Disabling and deleting tags

If you have a tag that you no longer want to use, or want to have associated with a particular
field/value pairing, you have the option of either disabling it or removing it. If you have the
permissions to do so, you can:

• Remove a tag association for a specific field/value pair in the search results.
• Bulk disable or delete a tag, even if it is associated with multiple field/value pairs, via the List by tag
name page.
• Bulk disable or delete the associations between a field/value pair and a set of tags via the
Tags by field value pair(s) page.

For information about deleting tag associations with specific field/value pairs in your search results,
see "Tag and alias field values" in the User manual.

Delete a tag with multiple field/value pair associations

You can use Splunk Manager to completely remove a tag from your system, even if it is associated
with dozens of field/value pairs. This method enables you to get rid of all of these associations in one
step.

Navigate to Manager > Tags > List by tag name. Delete the tag. If you don't see a delete link for the
tag, you don't have permission to delete it. When you delete tags, try to be aware of downstream
dependencies that may be adversely affected by their removal. For more information, see "Curate Splunk knowledge with Manager"
in this manual.

Note: You can also go into the edit view for a particular tag and delete a field/value pair association
directly.

Disable or delete the associations between a field/value pairing and a set of tags

Use this method to bulk-remove the set of tags that is associated to a field/value pair. This method
enables you to get rid of these associations in a single step. It does not remove the field/value pairing
from your data, however.

Navigate to Manager > Tags > Tags by field value pair(s). Delete the field/value pair. If you don't
see a delete link for the field/value pair, you don't have permission to delete it. When you delete these
associations, try to be aware of downstream dependencies that may be adversely affected by their
removal. For more information, see "Curate Splunk knowledge with Manager" in this manual.

Note: You can also go into the edit view for a particular field value and delete a tag association
directly.

Disable tags

Depending on your permissions to do so, you can also disable tag and field/value associations using
the three Tags pages in Manager. When an association between a tag and a field/value pair is
disabled, it stays in the system but is inactive until it is enabled again.

Create aliases for fields

You can create multiple aliases for a field. The original field is not removed. This process enables you
to search for the original field using any of its aliases.

Important: Field aliasing is performed after key/value extraction but before field lookups. Therefore,
you can specify a lookup table based on a field alias. This can be helpful if there are one or more
fields in the lookup table that are identical to fields in your data, but have been named differently. For
more information read "Look up fields from external data sources" in this manual.

You can define aliases for fields that are extracted at index time as well as those that are extracted at
search time.

You add your field aliases to props.conf, which you edit in $SPLUNK_HOME/etc/system/local/,
or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the
latter directory if you want to make it easy to transfer your data customizations to other index
servers.)

Note: Splunk's field aliasing functionality does not currently support multivalue fields.

To alias fields:

1. Add the following line to a stanza in props.conf:

FIELDALIAS-<class> = (<orig_field_name> AS <new_field_name>)+

• <orig_field_name> is the original name of the field.


• <new_field_name> is the alias to assign to the field.
• You can include multiple field alias renames in one stanza, as shown in the sketch below.

2. Restart Splunk for your changes to take effect.
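
For example, here's a minimal sketch of a stanza that assigns two aliases at once; the sourcetype and field names are hypothetical:

[vendor_log]
FIELDALIAS-normalize = cs_host AS host_name cs_ip AS ip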

Example of field alias additions for a lookup

Say you're creating a lookup for an external static table CSV file where the field you've extracted at
search time as "ip" is referred to as "ipaddress." In the props.conf file where you've defined the
extraction, you would add a line that defines "ipaddress" as an alias for "ip," as follows:

[accesslog]
EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
FIELDALIAS-extract_ip = ip AS ipaddress

When you set up the lookup in props.conf, you can just use ipaddress where you'd otherwise
have used ip:

[dns]
LOOKUP-dns = dnsLookup ipaddress OUTPUT host

For more information about search time field extraction, see "Add fields at search time" in this
manual.

For more information about field lookups, see "Create field lookups from external data sources" in this
manual.

Tag the host field

Tagging the host field is useful for knowledge capture and sharing, and for crafting more precise
searches. You can tag the host field with one or more words. Use this to group hosts by function or
type, to enable users to easily search for all activity on a group of similar servers. If you've changed
the value of the host field for a given input, you can also tag events that are already in the index with
the new host name to make it easier to search across your data set.

Add a tag to the host field with Splunk Web

To add a tag to a host field/value combination in Splunk Web:

1. Perform a search for data from the host you'd like to tag.

2. In the search results, use the drop-down menu next to the host field value that you'd like to tag and
choose Tag host=<current host value>.

3. The Tag This Field dialog box appears. Enter your tag or tags, separated by commas or spaces,
and click Ok.

Host names vs. tagging the host field

The value of the host field is set when an event is indexed. It can be set by default based on the
Splunk server hostname, set for a given input, or extracted from each event's data. Tagging the host
field with an alternate hostname doesn't change the actual value of the host field, but it lets you
search for the tag you specified instead of having to use the host field value. Each event can have
only one host name, but multiple host tags.

For example, if your Splunk server is receiving compliance data from a specific host, tagging that host
with compliance will help your compliance searches. With host tags, you can create a loose
grouping of data without masking or changing the underlying host name.
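
For instance, once the relevant hosts are tagged with compliance, a search along these lines (the sourcetype here is hypothetical) returns events from every host in that group:

tag::host=compliance sourcetype=audit_log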

You might also want to tag the host field with another host name if you indexed some data from a
particular input source and then decided to change the value of the host field for that input--all the
new data coming in from that input will have the new host field value, but the data that already exists
in your index will have the old value. Tagging the host field for the existing data lets you search for
the new host value without excluding all the existing data.

Tag event types

Tag event types to add information to your data. Any event type can have multiple tags. For example,
you can tag all firewall event types as firewall, tag a subset of firewall event types as deny and tag
another subset as allow. Once an event type is tagged, any event type matching the tagged pattern
will also be tagged.

Note: You can tag an event type when you create it in Splunk Web or configure it in eventtypes.conf.

Add tags to event types using Manager

Splunk Manager enables you to view and edit lists of event types.

• Click the Manager link in the upper right-hand corner.


• Select Event types.
• Locate the event type you want to tag and click on its name to go to its detail page.
♦ Note: Keep in mind that event types are often associated with specific Splunk apps.
They also have role-based permissions that can prevent you from seeing and/or editing
them.
• On the detail page for the event type, add or edit tags in the Tags field.
• Click Save to confirm your changes.

Once you have tagged an event type, you can search for it in the search bar with the syntax
tag::<field>=<tagname> or tag=<tagname>:

tag=foo
tag::host=*local*

Manage your search knowledge
Manage saved searches

Content coming soon

For a basic overview of saving searches and sharing them with others, see "Save searches and
share search results" in the User manual.

This topic will discuss saved searches from a knowledge management perspective, including the use
of the Saved search page in Manager.

Configure the priority of scheduled searches

This topic discusses the two options you can use to control the priority of concurrent scheduled
searches with the search scheduler. The options are real-time scheduling and continuous scheduling:

• Real-time scheduling ensures that scheduled searches are always run over the most recent
time range, even when a number of searches are scheduled to run at approximately the same
time and the scheduler can only run one search concurrently. Because of the way it works,
searches with real-time scheduling can end up skipping scheduled runs. However, they are
always given priority over searches with continuous scheduling.
• Continuous scheduling ensures that each scheduled run of a search is eventually
performed, even if the result is that those searches are delayed. These settings are managed
at the saved search level via savedsearches.conf. Splunk gives all scheduled searches
real-time scheduling by default, but when a scheduled search is enabled for summary
indexing, Splunk automatically changes its scheduling option to continuous.

To understand the necessity of these two scheduler options, you need to understand how the search
scheduler handles concurrent searches.

For more information about scheduling saved searches, see "Schedule saved searches" in the User
manual.

How the search scheduler handles concurrent searches

The Splunk search scheduler limits the number of scheduled searches that can be run concurrently.
The default, set by the max_searches_perc setting in limits.conf, sets the maximum number
of concurrent searches that can be handled by the scheduler to 25% of the
max_searches_per_cpu value. By default, max_searches_per_cpu is set to two searches for
every CPU in your system plus two. So if your system only has one CPU, the scheduler can only run
one search at a time (1 = 25% of 4).

Note: We strongly recommend that you avoid changing limits.conf settings unless you know
what you are doing.

So, if your scheduler can only run one search at a time, but you have multiple searches scheduled to
run on an hourly basis over the preceding hour's data, what happens? The scheduler lines the
searches up and runs them in consecutive order for the scheduled time period, but each search
returns information for the time frame over which it was scheduled to run.

Example of real-time scheduling versus continuous scheduling

So, given how the scheduler works, how is real-time scheduling different from continuous scheduling,
and under what conditions would you prefer one option over the other?

First, say you have two saved, scheduled searches that for the purpose of simplicity we'll call A and
B:

• Search A runs every minute and takes 30 seconds to complete


• Search B runs every 5 minutes and takes 2 minutes to complete

Let's also say that you have a Splunk configuration that enables the search scheduler to run only one
search at a time.

Both searches are scheduled to run at 1:05pm.

Time         Scheduler action

1:05:00pm    The scheduler runs A for the 1:04 to 1:05 period, and schedules it to run again at 1:06pm. It is 1:05:30pm when search A completes.

1:05:30pm    The scheduler runs search B. Because it takes 2 minutes to run, search B won't complete until 1:07:30.

1:06:00pm    The scheduler wakes up and attempts to run search A, but it cannot run because search B is still in process.

1:06:59pm    The scheduler continues to attempt to run search A until 1:06:59. At this point what happens next depends on whether search A is using real-time or continuous scheduling (see below).
If search A is configured to have:

• real-time scheduling, the scheduler skips the 1:05-1:06 run of the search and schedules the
next run of search A for 1:07:00pm (for the 1:06 to 1:07 period). The new search run time is
based on the current scheduled run time (1:06:00pm).
• continuous scheduling, the scheduler does not advance the schedule and attempts to run
the search for the 1:05 to 1:06pm period indefinitely, and whatever the eventual search run
time is, the next time period that search A would cover would be 1:06 to 1:07pm.

Real-time scheduling is the default for all scheduled searches. It's designed to ensure that the search
returns current data. It assumes there won't be any problems if some scheduled searches are
skipped, as long as it returns up-to-the-minute results in the most recent run of the search.

Continuous scheduling is used for situations where problems arise when there's any gap in the
collection of search data. In general this is only important for searches that populate summary
indexes, though you may find other uses for it. When a search is enabled for summary indexing,
Splunk changes its scheduling option to continuous automatically.

Note: For more information about summary index searches, see "Use summary indexing for
increased reporting efficiency" in the Knowledge Manager manual.

Configure the realtime_schedule option

The system uses the realtime_schedule option in savedsearches.conf to determine the next
run time of a scheduled search. This is set individually for each saved and scheduled search.

realtime_schedule= 0 | 1

• Set realtime_schedule to 1 to use real-time scheduling. With this setting the scheduler
makes sure that it is always running the search over the most recent time range. Because
searches can't always run concurrently with others, this means that it may skip some search
periods. This is the default value for a scheduled search.
• Set realtime_schedule to 0 to use continuous scheduling. This setting ensures that
scheduled search periods are never skipped. Splunk automatically sets this value to 0 for any
scheduled search that is enabled for summary indexing.

The scheduler is designed to give searches with real-time scheduling priority over those with
continuous scheduling; it always tries to run the real-time searches first.
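
As an illustration, here's a hedged sketch of a savedsearches.conf stanza that forces continuous scheduling for a scheduled search; the stanza name, search string, and hourly cron schedule are hypothetical:

[Hourly web error counts]
search = sourcetype=access_combined status>=500 | sistats count by host
cron_schedule = 0 * * * *
enableSched = 1
# 0 = continuous scheduling; 1 = real-time scheduling (the default)
realtime_schedule = 0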

Design macro searches

To simplify managing your searches, you can create saved searches that include macros, which are
parametrized chunks of a search. These search macros can be any part of a search, such as an eval
statement or search term, and do not need to be a complete command. With macros, you can reuse
chunks of a search in multiple places, whether it's a saved search or an ad hoc search. You can also
specify whether or not the search macros take arguments.

Note: Form searches also use search macros, but include a graphical user interface component.

Configure and manage search macros

You can view, edit, and create search macros using Splunk Web's Manager > Advanced Search >
Search macros page and macros.conf. For more information, see "Create and use search
macros" in the User Manual and the macros.conf reference in the Admin Manual.

Design form searches

Form searches are simplified search interfaces that help guide users in the creation of specific kinds
of searches. They can include things like:

• Open fields that take specific field values (such as user names or ID numbers) and can also
display default values.
• Dropdown lists containing dynamically defined collections of search terms.
• Radio buttons that force the choice of particular field values (such as error codes like "404,"
"500," or "503").
• Multiple result panels that take the values received from one form and plug them into various
hidden searches that in turn generate different kinds of charts and reports.

Form searches are created with XML code similar to that used for the construction of dashboards in
Splunk. For more information, see the "Forms: an introduction" chapter of the Developer manual.

Define navigation to saved searches and reports

As a knowledge manager you should ensure that your saved searches and reports appear in the
top-level navigation menus of your Splunk apps in a logical manner that facilitates ease of discovery.
To do this you need to customize the navigation menus for your apps. If you fail to attend to your
navigation menus, over time they may become overlong, and inefficient, as saved searches and
reports are added without subsequent categorization.

To manage the way your searches are saved and organized in the top-level navigation menu for an
app, you need to work with the code behind the nav menu. When you do this, keep in mind that the
nav code refers to lists of searches and reports as collections.

The following subtopics describe various things you can do to organize your saved search and
reports listings in the top-level navigation menu. For details on how to adjust the XML code for the
navigation menu, see "Build navigation for your app" in the Developer manual.

Set up a default collection

Each app should have a default collection set up for "unclassified" searches. Unclassified searches
are any searches that haven't been explicitly identified in the nav menu code. This is the collection in
which all newly saved searches appear. In the Search app, for example, the default collection is
Searches & Reports.

If you do not set up a default collection, you will have to manually add saved searches to the nav
code to see them in your app's top-level navigation menu.

Note: A default collection should also be set up for unclassified views and dashboards.

Organize saved searches in nested collections

As the number of saved searches and reports that are created for an app grows, you're going to want
to find ways to organize those searches in a logical manner. You can manually construct collections
that group lists together by function. Going further, you can set up nested collections that subdivide
large collections into groups of smaller ones.

In the Search app, nested collections are used to group similar types of searches together:

Dynamically group together saved searches

Collections can be set up to dynamically group together saved searches that have matching
substrings in their names. For example, in the Search app example above, a nested collection groups
together all uncategorized searches with the string "admin" in their titles.

There are two ways that saved searches can be dynamically grouped together with matching
substrings:

• As a collection of uncategorized substring-matching searches, which means that the collection


only displays searches that haven't been manually added to another collection.
• As a collection of all substring-matching searches, which means that the collection displays all
searches with the matching substring whether or not they appear elsewhere in the navigation
menu.

Note: In both cases, only saved searches and reports that are available to the app with which the
navigation menu is associated are displayed.

Set up and use summary indexes
Use summary indexing for increased reporting efficiency

Splunk is capable of generating reports of massive amounts of data (100 million events and
counting). However, the amount of time it takes to compute such reports is directly proportional to the
number of events summarized. Plainly put, it can take a lot of time to search through very large data
sets. If you only have to do this on an occasional basis, it may not be an issue. But running such
reports on a regular schedule can be impractical--and this impracticality only increases exponentially
as more and more users in your organization use Splunk to run similar reports.

Use summary indexing to efficiently report on large volumes of data. With summary indexing, you
set up a search that extracts the precise information you want, on a frequent basis. Each time Splunk
runs this search it saves the results into a summary index that you designate. You can then run
searches and reports on this significantly smaller (and thus seemingly "faster") summary index. And
what's more, these reports will be statistically accurate because of the frequency of the
index-populating search (for example, if you want to manually run searches that cover the past seven
days, you might run them on a summary index that is updated on an hourly basis).

Summary indexing allows the cost of a computationally expensive report to be spread over time. In
the example we've been discussing, the hourly search to populate the summary index with the
previous hour's worth of data would take a fraction of a minute. Generating the complete report
without the benefit of summary indexing would take approximately 168 (7 days * 24 hrs/day) times
longer.

Perhaps an even more important advantage of summary indexing is its ability to amortize costs over
different reports, as well as for the same report over a different but overlapping time range. The same
summary data generated on a Tuesday can be used for a report of the previous 7 days done on the
Wednesday, Thursday, or the following Monday. It could also be used for a monthly report that
needed the average response size per day.

Summary indexing use cases

Example #1 - Run reports over long time ranges for large datasets more efficiently: Imagine
you're using Splunk at a company that indexes tens of millions of events per day. You want to set up
a dashboard for your employees that, among other things, displays a report that shows the number of
page views and visitors each of your Web sites had over the past 30 days, broken out by site.

You could run this report on your primary data volume, but its runtime would be quite long, because
Splunk has to sort through a huge number of events that are totally unrelated to web traffic in order to
extract the desired data. But that's not all--the fact that the report is included in a popular dashboard
means it'll be run frequently, and this could significantly extend its average runtime, leading to a lot of
frustrated users.

But if you use summary indexing, you can set up a saved search that collects website page view and
visitor information into a designated summary index on a weekly, daily, or even hourly basis. You'll
then run your month-end report on this smaller summary index, and the report should complete far
faster than it would otherwise because it is searching on a smaller and better-focused dataset.

Example #2 - Building rolling reports: Say you want to run a report that shows a running count of
an aggregated statistic over a long period of time--a running count of downloads of a file from a Web
site you manage, for example.

First, schedule a saved search to return the total number of downloads over a specified slice of time.
Then, use summary indexing to have Splunk save the results of that search into a summary index.
You can then run a report any time you want on the data in the summary index to obtain the latest
count of the total number of downloads.

For another view, you can watch this Splunk developer video about the theory and practice of
summary indexing.

Using the summary indexing reporting commands

If you are new to summary indexing, use the summary indexing reporting commands (sichart,
sitimechart, sistats, sitop, and sirare) when you define the search that will populate the summary
index. If you use these commands you can use the same search string that you use for the search
that you eventually run on the summary index, with the exception that you use regular reporting
commands in the latter search.

Note: You do not have to use the si- summary index search commands if you are proficient with the
"old-school" way of creating summary-index-populating searches. If you create summary indexes
using those methods and they work for you there's no need to update them. In fact, they may be
more efficient: there are performance impacts related to the use of the si- commands, because they
create slightly larger indexes than the "manual" method does.

In most cases the impact is insignificant, but you may notice a difference if the summary indexes you
are creating are themselves fairly large. You may also notice performance issues if you're setting up
several searches to report against an index populated by an si- command search.

Defining index-populating searches without the special commands

In previous versions of Splunk you had to be very careful about how you designed the searches that
you used to populate your summary index, especially if the search you wanted to run on the finished
summary index involved aggregate statistics, because it meant that you had to carefully set up the
"index-populating" search in a way that did not provide incorrect results. For example, if you wanted
to run a search on the finished summary index that gave you average response times broken out by
server, you'd want to set up a summary-index-populating search that:

• is scheduled to run on a more frequent basis than the search you plan to run against the
summary index
• samples a larger amount of data than the search you plan to run against the summary index.

• contains additional search commands that ensure that the index-populating search is
generating a weighted average.

The summary index reporting commands take care of the last two points for you--they automatically
determine the adjustments that need to be made so that your summary index is populated with data
that does not produce statistically inaccurate results. However, you still should arrange for the
summary-index-populating search to run on a more frequent basis than the search that you later run
against the summary index.

If you would like more information about setting up summary-index-populating searches that do not
use the special summary index reporting commands, see "Configure summary indexes" in the
Knowledge Management manual.

Summary indexing reporting command usage example

Let's say you've been running the following search, with a time range of the past year:

eventtype=firewall | top src_ip


This search gives you the top source ips for the past year, but it takes forever to run because it scans
across your entire index each time.

What you need to do is create a summary index that is composed of the top source IPs from the
"firewall" event type. You can use the following search to build that summary index. You would
schedule it to run on a daily basis, collecting the top src_ip values for only the previous 24 hours
each time. The results of each daily search are added to an index named "summary":

eventtype=firewall | sitop src_ip


Note: Summary-index-populating searches are statistically more accurate if you schedule them to run
and sample information on a more frequent basis than the searches you plan to run against the
finished summary index. So in this example, because we plan to run searches that cover a timespan
of a year, we set up a summary-index-populating search that samples information on a daily basis.

Important: When you define summary-index-populating searches, do not pipe other search
operators after the main summary indexing reporting command. In other words, don't include
additional | eval commands and the like. Save the extra search operators for the searches you run
against the summary indexes, not the search you use to populate it.

Important: The results from a summary-indexing optimized search are stored in a special format that
cannot be modified before the final transformation is performed. This means that if you populate a
summary index with ... | sistats <args>, the only valid retrieval of the data is:
index=<summary> source=<saved search name> | stats <args>. The search against
the summary index cannot create or modify fields before the | stats <args> command.

Now, let's say you save this search with the name "Summary - firewall top src_ip" (all saved
summary-index-populating searches should have names that identify them as such). After your
summary index is populated with results, search and report against that summary index using a
search that specifies the summary index and the name of the search that you used to populate it. For
example, this is the search you would use to get the top source_ips over the past year:

index=summary search_name="summary - firewall top src_ip" | top src_ip

Because this search specifies the search name, it filters out other data that have been placed in the
summary index by other summary indexing searches. This search should run fairly quickly, even if
the time range is a year or more.

Note: If you are running a search against a summary index that queries for events with a specific
sourcetype value, be aware that you need to use orig_sourcetype instead. So instead of
running a search against a summary index like ... | timechart avg(ip) by
sourcetype, use ... | timechart avg(ip) by orig_sourcetype.

Why do you have to do this? When events are gathered into a summary index, Splunk changes their
sourcetype values to "stash" and moves the original sourcetype values to orig_sourcetype.

Setting up summary index searches in Splunk Web

You can set up summary index searches through the Splunk Web interface. Summary indexing is an
alert option for saved, scheduled searches. Once you determine the search that you want to use to
populate a summary index, follow these steps:

1. Go to the Search details page for the search, either by clicking Save search in the Search or
Report Builder interface, or through the Searches and Reports page in Manager by selecting the
name of a previously saved search or clicking New.

2. Select Schedule this search if the search isn't already scheduled. Schedule the search to run on
an appropriate interval. Remember that searches that populate summary indexes should run on a
fairly frequent basis in order to create statistically accurate final reports. If the search you're running
against the summary index is gathering information for the past week, you should have the summary
search run on an hourly basis, collecting information for each hour. If you're running searches over
the past year's worth of data, you might have the summary index collect data on a daily basis for the
past day.

Note: Be sure to schedule the search so that there are no data gaps or overlaps. For more on this,
see the subtopic on scheduling below.

3. Under Alert conditions, select a Perform actions value of always.

4. Under Alert actions, select Enable summary indexing.

5. Enter the name of the summary index that the search will populate. The index named summary is
the default summary index. You may need to create additional summary indexes if you plan to run a
variety of summary index searches. For information about creating new indexes, see "Set up multiple
indexes" in the Admin manual. It's a good idea to create indexes that are dedicated to the collection
of summary data.

Note: If you enter the name of an index that does not exist, Splunk will run the search on the
schedule you've defined, but its data will not get saved to a summary index.

6. (Optional) Under Add fields, you can add field/value pairs to the summary index definition. These
field/value pairs are added to each event that gets summary indexed, making it easier to find those
events with later searches. For example, you could add the name of the saved search populating the
summary index (report=summary_firewall_top_src_ip) or the name of the index that the search
populates (index=summary), and then search on those terms later.

Note: You can also add field/value pairs to the summary index configuration in
savedsearches.conf. For more information, see "Configure summary indexes" in the Knowledge
Manager manual.

For more information about saving, scheduling, and setting up alerts for searches, see "Save
searches and share search results", "Schedule saved searches", and "Set alert conditions for
scheduled searches", in this manual.

Schedule the populating search to avoid data gaps and overlaps

To minimize data gaps and overlaps, be sure to set appropriate intervals and delays in the
schedules of the searches you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can
occur if:

• the summary-index-populating search takes too long to run and runs past the next scheduled
run time. For example, if you schedule the search that populates the summary index to run
every 5 minutes when that search typically takes around 7 minutes to complete, you will have
gaps, because Splunk won't start the search again while a preceding run is still in progress.
• splunkd goes down.

Overlaps are events in a summary index (from the same search) that share the same timestamp.
Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if
you set the time range of a saved search to be longer than the frequency of the schedule of the
search. In other words, don't arrange for an hourly search to gather data for the past 90 minutes.

Note: If you think you have gaps or overlaps in your summary index data, Splunk provides methods
of detecting them and either backfilling them (in the case of gaps) or deleting the overlapping events.
For more information, see "Manage summary index gaps and overlaps" in the Knowledge Manager
manual.

How summary indexing works

In Splunk Web, summary indexing is an alert option for scheduled saved searches. When you run a
saved search with summary indexing turned on, its search results are temporarily stored in a file
($SPLUNK_HOME/var/spool/splunk/<savedsearch_name>_<random-number>.stash).
From the file, Splunk uses the addinfo command to add general information about the current search
and the fields you specify during configuration to each result. Splunk then indexes the resulting event
data in the summary index that you've designated for it (index=summary by default).

Note: Use the addinfo command to add fields containing general information about the current
search to the search results going into a summary index. General information added about the search
helps you run reports on results you place in a summary index.
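
For example, you can preview this added information by appending | addinfo to a search like the one
used earlier in this topic (a sketch; fields such as info_min_time and info_max_time appear in the
results):

eventtype=firewall | top src_ip | addinfo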

Summary indexing of data without timestamps

To set the time for summary index events, Splunk uses the following information, in this order of
precedence:

1. The _time value of the event being summarized

2. The earliest (or minimum) time of the search

3. The current system time (in the case of an "all time" search, where no "earliest" value is specified)

In the majority of cases, your events will have timestamps, so the first method of discerning the
summary index timestamp holds. But if you are summarizing data that doesn't contain an _time field
(such as data from a lookup), the resulting events will have the timestamp of the earliest time of the
search.

For example, if you summarize the lookup "asset_table" every night at midnight, and the asset table
does not contain an _time column, tonight's summary will have an _time value equal to the earliest
time of the search. If you have set the time range of the search to be between -24h and +0s, each
summarized event will have an _time value of now()-86400 (that is, the time the search runs
minus 86,400 seconds, or 24 hours). This means that every event without an _time field value that is
found by this summary-index-populating search will be given the exact same _time value: the
search's earliest time.

The best practice for summarizing data without a timestamp is to manually create an _time value as
part of your search. Following on from the example above:

|inputlookup asset_table | eval _time=now()


Manage summary index gaps and overlaps

The accuracy of your summary index searches can be compromised if the summary indexes involved
have gaps or overlaps in their collected data.

Gaps in summary index data can come about for a number of reasons:

• A summary index initially only contains events from the point that you start data
collection: Don't lose sight of the fact that summary indexes won't have data from before the
summary index collection start date--unless you arrange to put it in there yourself with the
backfill script.
• splunkd outages: If splunkd goes down for a significant amount of time, there's a good
chance you'll get gaps in your summary index data, depending on when the searches that
populate the index are scheduled to run.
• Searches that run longer than their scheduled intervals: If the search you're using to
populate the summary index runs longer than the interval that you've scheduled it to run on,
then you're likely to end up with gaps, because Splunk won't run a scheduled search again
while a preceding run of that search is still in progress. For example, if you schedule the
index-populating search to run every five minutes, you'll have a gap in the index data collection
whenever the search takes more than five minutes to run.

Overlaps are events in a summary index (from the same index-populating search) that share the
same timestamp. Overlapping events skew reports and statistics created from summary indexes.
Overlaps can occur if you set the time range of a saved search to be longer than the scheduled
search interval. In other words, don't arrange for an hourly search to gather data for the past 90
minutes.

Note: For general information about creating and maintaining summary indexes, see "Use summary
indexing for increased reporting efficiency" in the Knowledge Manager manual.

Use the backfill script to add other data or fill summary index gaps

The fill_summary_index.py script backfills gaps in summary index collection by running the
saved searches that populate the summary index as they would have been executed at their regularly
scheduled times for a given time range. In other words, even though your new summary index only
started collecting data at the start of this week, if necessary you can use fill_summary_index.py
to fill the summary index with data from the past month.

In addition, when you run fill_summary_index.py you can specify an App and schedule backfill
actions for a list of summary index searches associated with that App, or simply choose to backfill all
saved searches associated with the App.

When you enter the fill_summary_index.py commands through the CLI, you must provide the
backfill time range by indicating an "earliest time" and "latest time" for the backfill operation. You can
indicate the precise times either by using relative time identifiers (such as -3d@d for "3 days ago at
midnight") or by using UTC epoch numbers. The script automatically computes the times during this
range when the summary index search would have been run.

Note: To ensure that the fill_summary_index.py script only executes summary index searches
at times that correspond to missing data, you must use -dedup true when you invoke it.

The fill_summary_index.py script requires that you provide necessary authentication
(username and password). If you know the valid Splunk key when you invoke the script, you can pass
it in via the -sk option.

The script is designed to prompt you for any required information that you fail to provide in the
command line, including the names of the summary index searches, the authentication information,
and the time range.

Examples of fill_summary_index.py invocation

If this is your situation:

You need to backfill all of the summary index searches for the splunkdotcom App for the past
month--but you also need to skip any searches that already have data in the summary index:

Then you'd enter this into the CLI:

./splunk cmd python fill_summary_index.py -app splunkdotcom -name "*" -et -mon@mon -lt @mon -dedup true -auth admin:changeme

If this is your situation:

You need to backfill the my_daily_search summary index search for the past year, running no
more than 8 concurrent searches at any given time (to reduce impact on Splunk performance while
the system collects the backfill data). You do not want the script to skip searches that already have
data in the summary index. The my_daily_search summary index search is owned by the "admin"
role.

Then you'd enter this into the CLI:

./splunk cmd python fill_summary_index.py -app search -name my_daily_search -et -y -lt now -j 8 -owner admin -auth admin:changeme

Note: You need to specify the -owner option for searches that are owned by a specific user or role.

What to do if fill_summary_index.py is interrupted while running

In the app that you are invoking fill_summary_index.py from (default: 'search'), there will be a 'log'
directory. In this directory, there will be an empty temp file named 'fsidx*lock'.

Delete the 'fsidx*lock' file and you will be able to restart fill_summary_index.py.
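
For example, assuming the default search app and that the app's log directory lives under
$SPLUNK_HOME/etc/apps/search/log (an assumption; adjust the path for your app and installation),
you could clear the lock from the command line like this:

rm $SPLUNK_HOME/etc/apps/search/log/fsidx*lock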

fill_summary_index.py usage and commands

In the CLI, start by entering:

./splunk cmd python fill_summary_index.py

...and add the required and optional fields described below.

Note: <boolean> options accept the values 1, t, true, or yes for "true" and 0, f, false, or no for
"false."

• -et <string>: Earliest time (required). Either a UTC time or a relative time string.
• -lt <string>: Latest time (required). Either a UTC time or a relative time string.
• -app <string>: The application context to use (defaults to None).
• -name <string>: Specify a single saved search name. Can be specified multiple times to provide
multiple names. Use the wildcard symbol ("*") to specify all enabled, scheduled saved searches
that have a summary index action.
• -names <string>: Specify a comma-separated list of saved search names.
• -namefile <filename>: Specify a file with a list of saved search names, one per line. Lines
beginning with a # are considered comments and ignored.
• -owner <string>: The user context to use (defaults to None).
• -index <string>: Identifies the summary index that the saved search populates. If the index is
not provided, the backfill script tries to determine it automatically. If this attempt at auto index
detection fails, the index defaults to "summary".
• -auth <string>: The authentication string expects either <username> or
<username>:<password>. If only a username is provided, the script requests the password
interactively.
• -sleep <float>: Number of seconds to sleep between each search. Default is 5 seconds.
• -j <int>: Maximum number of concurrent searches to run (default is 1).
• -dedup <boolean>: When this option is set to true, the script doesn't run saved searches for a
scheduled time if data already exists in the summary index. If this option is not used, it defaults
to false.
• -showprogress <boolean>: When this option is set to true, the script periodically shows the
progress of each currently running search that it spawns. If this option is unused, it defaults to
false.

Advanced options (these are rarely needed):

• -trigger <boolean>: When this option is set to false, the script runs each search but does not
trigger the summary indexing action. If this option is unused, it defaults to true.
• -dedupsearch <string>: Indicates the search to be used to determine whether data
corresponding to a particular saved search at a specific scheduled time is already present.
• -namefield <string>: Indicates the field in the summary index data that contains the name of
the saved search that generated that data.
• -timefield <string>: Indicates the field in the summary index data that contains the scheduled
time of the saved search that generated that data.

Use the overlap command to identify summary index gaps and overlaps

To identify gaps and overlaps in your data, run a search against the summary index that uses the
overlap command. This command identifies ranges of time in the index that include gaps or overlaps.
If you suspect that a particular time range might include gaps and/or overlaps, you can identify it in
the search by specifying a start time and end time or a period and a saved search name, following
the | overlap command in the search string.

Use these two arguments to define a specific calendar time range:

• StartTime: Time to start searching for missing entries, starttime=mm/dd/yyyy:hh:mm:ss
(for example: 05/20/2008:00:00:00).
• EndTime: Time to stop searching for missing entries, endtime=mm/dd/yyyy:hh:mm:ss
(for example: 05/22/2008:00:00:00).

Or use these two arguments to define a period of time and the saved search to search for missing
events with:

• Period: Specify the length of the time period to search, period=<integer>[smhd] (for
example: 5m).
• SavedSearchName: Specify the name of the saved search to search for missing events with,
savedsearchname=string (NO wildcards).

If you identify a gap, you can run your scheduled saved search over the period of the gap and
summary index the results with the backfill script (described above in this topic).

If you identify overlapping events, you can manually delete the overlaps from the summary index by
using the search language.
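
For example, to check the summary data generated by the firewall search used elsewhere in this
manual (a sketch; substitute your own summary index and saved search name), you could run:

index=summary search_name="summary - firewall top src_ip" | overlap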

Configure summary indexes

For a general overview of summary indexing and instructions for setting up summary indexing
through Splunk Web, see the topic "Use summary indexing for increased reporting efficiency" in the
Knowledge Manager manual.

You can't manually configure a summary index for a search in savedsearches.conf until the
search is saved, scheduled, and has the Enable summary indexing alert option selected.

In addition, you need to enter the name of the summary index that the search will populate. You do
this through the saved search dialog after selecting Enable summary indexing. The index named
summary is the default summary index (the index that Splunk uses if you do not indicate another
one).

If you plan to run a variety of summary index searches you may need to create additional summary
indexes. For information about creating new indexes, see "Set up multiple indexes" in the Admin
manual. It's a good idea to create indexes that are dedicated to the collection of summary data.

Note: If you enter the name of an index that does not exist, Splunk will run the search on the
schedule you've defined, but its data will not get saved to a summary index.

For more information about saving, scheduling, and setting up alerts for searches, see "Save
searches and share search results," "Schedule saved searches", and "Set alert conditions for
scheduled searches", in the User manual.

Note: When you define the search that you'll use to build your index, most of the time you should use
the summary indexing reporting commands in the search that you use to build your summary index.
These commands are prefixed with "si-": sichart, sitimechart, sistats, sitop, and sirare. The searches
you create with them should be versions of the search that you'll eventually use to query the
completed summary index.

The summary index reporting commands automatically take into account the issues that are covered
in "Considerations for summary index search definition" below, such as scheduling shorter time
ranges for the populating search, and setting the populating search to take a larger sample. You only
have to worry about these issues if the search that you are using to build your index does not include
summary index reporting commands.

If you do not use the summary index reporting commands, you can use the addinfo and collect
search commands to create a search that Splunk saves and schedules, and which populates a
pre-created summary index. For more information about that method, see "Manually populate the
summary index" in this topic.

Customize summary indexing for a saved, scheduled search

When you use Splunk Web to enable summary indexing for a saved, scheduled,
summary-index-enabled search, Splunk automatically generates a stanza in
$SPLUNK_HOME/etc/system/local/savedsearches.conf. You can customize summary
indexing for the search by editing this stanza.

If you've used Splunk Web to save and schedule a search, but haven't used Splunk Web to enable
the summary index for the search, you can easily enable summary indexing for the saved search
through savedsearches.conf as long as you have a new index for it to populate. For more
information about manual index configuration, see the topic "About managing indexes" in the
Admin manual.

[ <name> ]
action.summary_index = 0 | 1
action.summary_index._name = <index>
action.summary_index.<field> = <value>

• [<name>]: Splunk names the stanza based on the name of the saved and scheduled search
that you enabled for summary indexing.
• action.summary_index = 0 | 1: Set to 1 to enable summary indexing. Set to 0 to
disable summary indexing.
• action.summary_index._name = <index>: The name of the summary index populated by
the search. If you've created a specific summary index for this search, enter its name in
<index>. Defaults to summary, the summary index that is delivered with Splunk.
• action.summary_index.<field> = <value>: Specify a field/value pair to add to every
event that gets summary indexed by this search. You can define multiple field/value pairs for a
single summary index search.

This field/value pair acts as a "tag" of sorts that makes it easier for you to identify the events that go
into the summary index when you are performing searches amongst the greater population of event
data. This key is optional but we recommend that you never set up a summary index without at least
one field/value pair.

For example, add the name of the saved search that is populating the summary index
(action.summary_index.report = summary_firewall_top_src_ip), or the name of the
index that the search populates (action.summary_index.index = summary).
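
Put together, the summary indexing portion of such a stanza might look like the following sketch (the
stanza name and field value are assumptions based on the firewall example used elsewhere in this
manual):

[Summary - firewall top src_ip]
action.summary_index = 1
action.summary_index._name = summary
action.summary_index.report = summary_firewall_top_src_ip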

Search commands useful to summary indexing

Summary indexing makes use of a set of specialized search commands that you need to use if you
are manually creating your summary indexes without the help of the Splunk Web interface or the
summary indexing reporting commands.

• addinfo: Summary indexing uses addinfo to add fields containing general information
about the current search to the search results going into a summary index. Add | addinfo to
any search to see what the results will look like if they are indexed into a summary index.
• collect: Summary indexing uses collect to index search results into the summary index. Use
| collect to index any search results into another index (using collect command
options).
• overlap: Use overlap to identify gaps and overlaps in a summary index. overlap finds events
of the same query_id in a summary index with overlapping timestamp values or identifies
periods of time where there are missing events.

Manually configure a search to populate a summary index

If you want to configure summary indexing without using the search options dialog in Splunk Web and
the summary indexing reporting commands, you must first configure a summary index just like you
would any other index via indexes.conf. For more information about manual index configuration,
see the topic "About managing indexes" in the Admin manual.

Important: You must restart Splunk for changes in indexes.conf to take effect.

1. Run a search that you want to summarize results from in the Splunk Web search bar.

• Be sure to limit the time range of your search. The number of results that your search
generates needs to fit within the maximum search result limits you have set for searching.
• Make sure to choose a time interval that works for your data, such as 10 minutes, 2 hours, or 1
day. (For more information about setting intervals in Splunk Web, see "Scheduling saved
searches" in the User Manual.)

2. Use the addinfo search command. Append | addinfo to the end of your search.

• This command adds information about the search to events that the collect command requires
in order to place them into a summary index.
• You can always add | addinfo to any search to preview what the results of a search will
look like in a summary index.

3. Add the collect search command. Append |collect index=<index_name> addtime=t
marker="info_search_name=\"<summary_search_name>\"" to the end of the search.

• Replace index_name with the name of the summary index.
• Replace summary_search_name with a key to find the results of this search in the index.
• A summary_search_name *must* be set if you wish to use the overlap search command on
the generated events.
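
For example, putting steps 1 through 3 together for the firewall example used elsewhere in this
manual (a sketch; the time range, index name, and search name are assumptions):

eventtype=firewall earliest=-1h@h latest=@h | top limit=100 src_ip | addinfo | collect index=summary addtime=t marker="info_search_name=\"Summary - firewall top src_ip\""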

Note: For the general case we recommend that you use the provided summary_index alert action.
Configuring via addinfo and collect requires some redundant steps that are not needed when you
generate summary index events from scheduled searches. Manual configuration remains necessary
when you backfill a summary index for time ranges that have already transpired.

Considerations for summary index search definition

If for some reason you're going to set up a summary-index-populating search that does not use the
summary indexing reporting commands, you should take a few moments to plan out your approach.
With summary indexing, the egg comes before the chicken. Use the search that you actually want to
report on to help define the search you use to populate the summary index.

Many summary searches involve aggregated statistics--for example, a report where you are
searching for the top 10 IP addresses associated with firewall offenses over the past day, when the
main index accrues millions of events per day.

If you populate the summary index with the results of the same search that you run on the summary
index, you'll likely get results that are statistically inaccurate. You should follow these rules when
defining the search that populates your summary index to improve the accuracy of aggregated
statistics generated from summary index searches.

Schedule a shorter time range for the populating search

The search that populates your summary index should be scheduled on a shorter (and therefore
more frequent) interval than that of the search that you eventually run against the index. You should
go for the smallest time range possible. For example, if you need to generate a daily "top" report,
then the report populating the summary index should take its sample on an hourly basis.

Set the populating search to take a larger sample

The search populating the summary index should seek out a significantly larger sample than the
search that you want to run on the summary index. So, for example, if you plan to search the
summary index for the daily top 10 offending IP addresses, you would set up a search to
populate the summary index with the hourly top 100 offending IP addresses.

This approach has two benefits--it ensures greater statistical accuracy for the top 10 report (due to
the larger and more frequently taken overall sample) and it gives you a bit of wiggle room if you
decide you'd rather report on the top 20 or 30 offending IPs.

The summary indexing reporting commands automatically take a sample that is larger than the
search that you'll run to query the completed summary index, thus creating summary indexes with
event data that is not incorrectly skewed. If you do not use those commands, you can use the head
command to select a larger sample for the summary-index-populating search than for the search that
you run on the summary index. In other words, you would use | head 100 for the hourly summary
index populating search, and | head 10 for the daily search of the completed summary index.
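
As a sketch (the searches and field names below carry over assumptions from the firewall example),
the hourly populating search might end with | head 100:

eventtype=firewall | stats count by src_ip | sort -count | head 100 | addinfo | collect index=summary addtime=t marker="info_search_name=\"Summary - firewall top src_ip\""

while the daily search of the completed summary index keeps only the top 10:

index=summary search_name="Summary - firewall top src_ip" | stats sum(count) as count by src_ip | sort -count | head 10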

Set up your search to get a weighted average

If your summary-index-populating search involves averages, and you are not using the summary
indexing reporting commands, you need to set that search up to get a weighted average.

For example, say you want to build hourly, daily, or weekly reports of average response times. To do
this, you'd generate the "daily average" by averaging the "hourly averages" together. Unfortunately,
the daily average becomes skewed if there aren't the same number of events in each "hourly
average". You can get the correct "daily average" by using a weighted average function.

The following expression calculates the daily average response time correctly with a weighted
average by using the stats and eval commands in conjunction with the sum statistical aggregator.
In this example, the eval command creates a daily_average field, which is the result of dividing
the total response time (resp_time_sum) by the total response count (resp_time_count).

| stats sum(hourly_resp_time_sum) as resp_time_sum,
sum(hourly_resp_time_count) as resp_time_count | eval daily_average=
resp_time_sum/resp_time_count | .....
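
For context, an hourly populating search that produces the hourly_resp_time_sum and
hourly_resp_time_count fields used above might look like the following sketch (the sourcetype and
the response_time field are assumptions):

sourcetype=web_access | stats sum(response_time) as hourly_resp_time_sum, count(response_time) as hourly_resp_time_count | addinfo | collect index=summary addtime=t marker="info_search_name=\"Summary - hourly response times\""
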
Schedule the populating search to avoid data gaps and overlaps

Along with the above rules, to minimize data gaps and overlaps you should also be sure to set
appropriate intervals and delays in the schedules of searches you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can
occur if:

• splunkd goes down.
• the scheduled saved search (the one being summary indexed) takes too long to run and runs
past the next scheduled run time. For example, if you were to schedule the search that
populates the summary index to run every 5 minutes when that search typically takes around 7
minutes to run, you would have gaps, because Splunk won't start the search again while a
preceding run is still in progress.

Overlaps are events in a summary index (from the same search) that share the same timestamp.
Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if
you set the time range of a saved search to be longer than the frequency of the schedule of the
search, or if you manually run summary indexing using the collect command.

Example of a summary index configuration

This example shows a configuration for a summary index of Web statistics as it might appear in
savedsearches.conf. The keys listed below enable summary indexing for the saved search
"Apache Method Summary", and append the field report with a value of "count by method" to
every event going into the summary index.

# name of the saved search = Apache Method Summary
[Apache Method Summary]
# sets the search to run at each search interval
counttype = always
# enable the search schedule
enableSched = 1
# search interval in cron notation (this means "every 12 minutes")
schedule = */12 * * * *
# id of user for saved search
userid = jsmith
# search string for summary index
search = index=apache_raw startminutesago=30 endminutesago=25 | extract auto=false | stats count by method
# enable summary indexing
action.summary_index = 1
# name of summary index to which search results are added
action.summary_index._name = summary
# add these keys to each event
action.summary_index.report = "count by method"
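
Once this summary index is populated, a search against it can use the appended report field to
isolate these events. A sketch (the aggregation shown is an assumption about what you would report
on):

index=summary report="count by method" | stats sum(count) by method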

Other configuration files affected by summary indexing

In addition to the settings you configure in savedsearches.conf, there are also settings for summary
indexing in indexes.conf and alert_actions.conf.

The indexes.conf file specifies index configuration for the summary index. The alert_actions.conf
file controls the alert actions (including summary indexing) associated with saved searches.

Caution: Do not edit settings in alert_actions.conf without explicit instructions from Splunk
staff.

