
NORMALIZE

Purpose
Normalize generates multiple output data records from each of its input records. You can
directly specify the number of output records for each input record, or the number of
output records can depend on some calculation.
Parameters
transform (filename or string, required)
Either the name of the file containing the types and transform functions, or a transform
string.
reject-threshold (choice, required)
The component's tolerance for reject events.
For more information, see the AGGREGATE component.
logging (boolean, optional)
Specifies whether you want certain component events logged.
For more information, see the AGGREGATE component.
Runtime Behavior of Normalize
Normalize generates a series of output data records for each input data record as follows.
1. Input selection
If input_select is defined, the input records are filtered as follows:
If input_select returns 0 for a particular record, Normalize does not process the record
(that is, the record does not appear on any output port).
If input_select returns NULL for a particular record, Normalize writes the record to
the reject port and writes a descriptive error message to the error port. If you do
not connect flows to the reject or error ports, Normalize discards the information.
If input_select returns anything other than 0 or NULL for a particular record,
Normalize processes the record.
If you have not defined input_select, Normalize processes all records.
2. Number of iterations
Normalize executes the length transform function. The length function returns a
value, n, that specifies the number of times to call the normalize transform function
for each input record.
3. Temporary initialization
If temporary_type is defined, Normalize executes the initialize transform function to
assign an initial value to the temporary variable(s).
Defining temporary_type means that you are declaring one or more temporary
variables. Typically, Normalize does not need temporary variables. If you have not
defined temporary_type, Normalize does not call the initialize transform function.

4. Computation
If you defined temporary_type, Normalize calls the normalize transform function n
times with three arguments: the temporary record, the input record, and an index
value increasing from 0 to n-1. Each time it calls the transform function it produces
a new temporary record.
If you have not defined temporary_type, Normalize calls the normalize transform
function n times with only two arguments: the input record and the index value.
Each time it calls the transform function it produces an output record.
5. Finalization
If you have defined temporary_type, Normalize calls the finalize transform function
with the temporary record and the input record as arguments every time it calls the
normalize transform function.
The finalize transform function produces an output record.
If you have not defined temporary_type, Normalize does not call the finalize
transform function; instead, the normalize function produces the output record
directly.
6. Output selection
If you have defined the output_select transform function, it is now called to filter the
output records, as follows:
Normalize ignores records for which output_select returns 0; it writes all
others to the out port.
If you have not defined the output_select transform function, Normalize
writes all output records to the out port.
If any of the transform functions returns NULL, Normalize writes:
The current input record to the reject port (if connected).
A descriptive error message to the error port (if connected).
If you do not connect flows to the reject or error ports, Normalize discards the
information.
The component stops the execution of the graph when the number of reject events
exceeds the result of the following formula:
limit + (ramp * number_of_records_processed_so_far)
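The six steps above can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not Ab Initio code: the names run_normalize and NullResult are hypothetical, NULL is modeled as None, and the presence of initialize and finalize stands in for "temporary_type is defined". The sketch also assumes that output records produced before a NULL result are kept, which the text above does not specify, and it leaves out the reject threshold.

```python
class NullResult(Exception):
    """Raised in this model when a transform function returns NULL (None)."""


def _non_null(value, name):
    # Any transform function returning NULL sends the input record to reject.
    if value is None:
        raise NullResult(f"{name} returned NULL")
    return value


def run_normalize(records, length, normalize, input_select=None,
                  initialize=None, finalize=None, output_select=None):
    """Model of steps 1-6. Returns (out, rejects, errors)."""
    out, rejects, errors = [], [], []
    # Assumption: supplying initialize and finalize models "temporary_type is defined".
    use_temp = initialize is not None and finalize is not None
    for rec in records:
        try:
            # 1. Input selection: 0 means the record is not processed at all.
            if input_select is not None:
                if _non_null(input_select(rec), "input_select") == 0:
                    continue
            # 2. Number of iterations.
            n = _non_null(length(rec), "length")
            # 3. Temporary initialization (only when temporaries are used).
            temp = _non_null(initialize(rec), "initialize") if use_temp else None
            for index in range(n):  # index runs from 0 to n-1
                if use_temp:
                    # 4. Computation produces a new temporary record...
                    temp = _non_null(normalize(temp, rec, index), "normalize")
                    # 5. ...and finalization turns it into an output record.
                    result = _non_null(finalize(temp, rec), "finalize")
                else:
                    # 4. Without temporaries, normalize produces the output directly.
                    result = _non_null(normalize(rec, index), "normalize")
                # 6. Output selection: 0 means the output record is ignored.
                if output_select is None or _non_null(
                        output_select(result), "output_select") != 0:
                    out.append(result)
        except NullResult as exc:
            rejects.append(rec)       # reject port
            errors.append(str(exc))   # error port
    return out, rejects, errors


# Made-up sample data: one output record per element of "items".
records = [
    {"id": 1, "items": ["a", "b"]},
    {"id": 2, "items": []},
    {"id": 3, "items": None},  # length returns NULL, so this record is rejected
]
out, rejects, errors = run_normalize(
    records,
    length=lambda r: None if r["items"] is None else len(r["items"]),
    normalize=lambda r, i: {"id": r["id"], "item": r["items"][i]},
)
```

In this run the first record yields two output records, the second yields none because its length is 0, and the third goes to the reject list with one error message.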
About Normalize Transform Functions
What Normalize specifically does is determined by the functions, types and variables you
define in its transform parameter.
There are six permanent functions: input_select, length, initialize, normalize, finalize,
and output_select. Of these, only length and normalize are required. Examples of most of
these functions can be found in Simple Normalize Example with Vectors.
There is also an optional temporary_type, which you can define if you need to use
temporary variables. For an example of this, see Normalize Example with More
Elaborate Transform.

Optional Normalize Transform Functions and Types


There are four optional transform functions and an optional type you can use with
Normalize:
input_select: If you define the input_select transform function, it performs selection of
input records:
out :: input_select(in) =
begin
  out :: in.n == 1;
end;

The input_select transform function takes a single argument, the input record, and returns
a value of 0 (false) to ignore a record or non-0 (true) to accept a record.
initialize: The initialize transform function initializes temporary storage. This transform
function takes a single argument, the input record, and returns a single record with type
temporary_type:
temp :: initialize(in) =
begin
  temp.count :: 0;
  temp.sum :: 0;
end;
finalize: The finalize transform function performs the last step in a multistage transform:
out :: finalize(temp, in) =
begin
  out.key :: in.key;
  out.count :: temp.count;
  out.average :: temp.sum / temp.count;
end;
The finalize transform function takes the temporary storage record and the input record
as arguments, and produces a record that has the record format of the out port.
output_select: If you define the output_select transform function, it performs selection
of output records:
out :: output_select(final) =
begin
  out :: final.average > 5;
end;
The output_select transform function takes a single argument, the record produced by
finalization, and returns a value of 0 (false) to ignore a record or non-0 (true) to output a
record.
temporary_type: If you want Normalize to use temporary storage, define this storage as
a record with a type named temporary_type:
type temporary_type =
record
  int count;
  int sum;
end;
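To see how the temporary type and the initialize, finalize, and output_select samples fit together, here is a rough Python analogue. It is a sketch under stated assumptions, not DML: the three-argument normalize function and the input field named values are hypothetical, since this section shows no normalize function that uses temporaries, and Python's / always produces a float, which may differ from DML arithmetic.

```python
from dataclasses import dataclass


@dataclass
class TemporaryType:
    """Python stand-in for the temporary_type record (count and sum)."""
    count: int
    sum: int


def initialize(in_rec):
    # Mirrors the initialize sample: both temporaries start at 0.
    return TemporaryType(count=0, sum=0)


def normalize(temp, in_rec, index):
    # Hypothetical: accumulate one element of an assumed "values" field per call
    # and return a new temporary record.
    return TemporaryType(count=temp.count + 1,
                         sum=temp.sum + in_rec["values"][index])


def finalize(temp, in_rec):
    # Mirrors the finalize sample: build the output record from temp and in.
    return {"key": in_rec["key"],
            "count": temp.count,
            "average": temp.sum / temp.count}


def output_select(final):
    # Mirrors the output_select sample: keep records whose average exceeds 5.
    return 1 if final["average"] > 5 else 0


in_rec = {"key": "k1", "values": [4, 10, 7]}  # made-up input record
temp = initialize(in_rec)
outputs = []
for index in range(len(in_rec["values"])):
    temp = normalize(temp, in_rec, index)   # new temporary record
    final = finalize(temp, in_rec)          # one candidate output per call
    if output_select(final):
        outputs.append(final)
```

With this input the running averages are 4.0, 7.0, and 7.0, so the first candidate record is filtered out and the other two are kept.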
Normalize Example Without Vectors
Normalize can also be used to split up non-vector records, similarly to the Meta Pivot
component. An advantage Normalize has over Meta Pivot in such situations is that it can
handle nested records, which Meta Pivot cannot.

In this example, we begin with a dataset whose contents are as follows:

The file's record format is as follows:


record
  string(",") product_cd;
  string(",") product_name;
  decimal("\n") my_price;
end;
We want to use Normalize to split each input record into two output records: one record
that contains the first input field name and value, and another record that contains the
name and value of the second input field. Both records also contain the my_price value.
The output dataset record format looks like this:
record
  string(",") field_name;
  string(",") field_value;
  decimal("\n") my_price;
end;
As in the previous example, we must define the Normalize component's length and
normalize functions.
i. The length function is set up as follows:

Each input record has two fields that Normalize must separately split off and
generate output records for, so the normalize function needs to be executed twice
for each record. We thus simply hard-code the value 2 for the length function's
output.
ii. The normalize function is set up as follows:

The internal index variable, as always, is initialized by Normalize at 0 and is
incremented each time the normalize function is executed until it reaches the value
returned by length, minus 1.
In this example, index is used to determine which name/value pair from the current
input record needs to be processed. On the first iteration of normalize (when index
is 0), the product_cd field value is written to the output record's field_value field,
and the string "product_cd" is written to the field_name output field. On the second
iteration (when index is 1), the product_name field value and the string "product_name"
are written in the same way.
The my_price value is written to every output record.
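The logic of these two functions can be sketched in Python as follows. This models the behavior described above and is not the DML from the original graph; the sample record is made up, because the dataset contents are not reproduced here.

```python
def length(in_rec):
    # Two name/value pairs per input record, so the count is hard-coded to 2.
    return 2


def normalize(in_rec, index):
    # index selects which name/value pair of the input record to emit.
    if index == 0:
        name, value = "product_cd", in_rec["product_cd"]
    else:
        name, value = "product_name", in_rec["product_name"]
    return {
        "field_name": name,
        "field_value": value,
        "my_price": in_rec["my_price"],  # copied to every output record
    }


# Made-up sample record; the values are illustrative only.
sample = {"product_cd": "P001", "product_name": "Widget", "my_price": "9.99"}
outputs = [normalize(sample, index) for index in range(length(sample))]
```

Each input record therefore produces exactly two output records that differ in field_name and field_value and share the same my_price.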
The complete graph looks like this:

After the graph is run, the contents of the Output File are as follows:

FILTER BY EXPRESSION
Purpose
Filter by Expression filters data records according to a DML expression.
Parameters
select_expr (expression, required)
Filter for data records.
reject-threshold (choice, required)
The component's tolerance for reject events.
For more information, see "AGGREGATE" component.
logging (boolean, optional)
Specifies whether you want certain component events logged.
For more information, see "AGGREGATE" component.
Runtime behavior of FILTER BY EXPRESSION
Many components have a select parameter or input_select transform function that filters
records. If you do not need the deselected records, you can usually get better performance
by eliminating Filter by Expression and using the downstream component's select
parameter or input_select transform function.
Filter by Expression:
Reads data records from the in port.
Applies the expression in the select_expr parameter to each record. If the
expression returns:
o Non-0 value: Filter by Expression writes the record to the out port.
o 0: Filter by Expression writes the record to the deselect port. If you do
not connect a flow to the deselect port, Filter by Expression discards
the records.
o NULL: Filter by Expression writes the record to the reject port and a
descriptive error message to the error port.
Filter by Expression stops execution of the graph when the number of reject events
exceeds the result of the following formula:
limit + (ramp * number_of_records_processed_so_far)
For more information, see "AGGREGATE" component.
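The routing rules and the reject threshold can be illustrated with a small Python model. This is a sketch under assumptions, not Ab Initio behavior verified against the product: filter_by_expression is a hypothetical name, NULL is modeled as None, limit and ramp are passed as plain arguments taken from the formula above, and the record currently being processed is assumed to count toward number_of_records_processed_so_far.

```python
def filter_by_expression(records, select_expr, limit=0, ramp=0.0):
    """Route each record to out, deselect, or reject, as described above."""
    out, deselect, reject, error = [], [], [], []
    for processed, rec in enumerate(records, start=1):
        result = select_expr(rec)
        if result is None:
            # NULL: record goes to reject, a message goes to error.
            reject.append(rec)
            error.append(f"select_expr returned NULL for record {processed}")
            # Stop when reject events exceed limit + ramp * records processed.
            if len(reject) > limit + ramp * processed:
                raise RuntimeError("reject threshold exceeded")
        elif result == 0:
            deselect.append(rec)   # 0: deselect port
        else:
            out.append(rec)        # non-0: out port
    return out, deselect, reject, error


# Made-up sample data and expression: keep amounts greater than 5.
records = [{"amount": 10}, {"amount": 0}, {"amount": None}]


def select_expr(rec):
    return None if rec["amount"] is None else int(rec["amount"] > 5)


out, deselect, reject, error = filter_by_expression(records, select_expr, limit=1)
```

With limit=1 and ramp=0 the single reject event is tolerated; with limit=0 the same input would stop the run at the third record.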

FUSE
Purpose
Fuse combines multiple input flows into a single output flow by applying a transform
function to corresponding records of each flow.
NOTE: In conjunction with JOIN, Fuse supersedes MATCH SORTED.
NOTE: We recommend keeping Automatic Flow Buffering, the default, turned on for
Fuse. This component reads input from its flows in a specific order, so turning off
Automatic Flow Buffering could cause deadlock.
Parameters
count (integer, required)
An integer from 1 to 20 that sets the number of all of the following:
Input arguments to the transform functions
in ports
reject ports
error ports
The n in the port name in<n> gives each in port a unique number. Each in<n> port has a
corresponding reject<n> port and error<n> port.
Default is 2.
transform (filename or string, required)
Either the name of the file containing the name of a transform package or a transform
string. The transform must contain a function named fuse and optionally a function
named select.
reject-threshold (choice, required)
The component's tolerance for reject events.
For more information, see "AGGREGATE" component.
logging (boolean, optional)
Specifies whether or not you want certain component events logged.
For more information, see "AGGREGATE" component.
Runtime behavior of FUSE
Fuse applies a transform function to corresponding records of each input flow.
The first time the transform function executes, it uses the first record of each flow. The
second time the transform function executes, it uses the second record of each flow, and
so on. Fuse sends the result of the transform function to the out port.
Fuse works as follows:
Fuse tries to read from each of its input flows:
o If all of its input flows are finished, Fuse exits.
o Otherwise, Fuse reads one record from each still-unfinished input port
and a NULL from each finished input port.
If Fuse reads a record from at least one flow, Fuse uses the records as
arguments to the select function if the select function is present.
o If the select function is not present, Fuse uses the records as arguments
to the fuse function.
o If the select function is present, Fuse discards the records if select
returns zero and uses the records as arguments to the fuse function if
select returns non-zero.
Fuse sends to the out port the record returned by the fuse function.
The fuse and select functions take arguments whose number and record formats are
determined by the number and record formats of the input ports. The fuse function returns
a record whose record format is that of the out port, and the select function returns an
integer(4).

If an input record is malformed such that it is impossible to determine the boundaries of
the record, Fuse emits an error message and exits. If an error occurs within either fuse or
select, Fuse sends the input records to their corresponding reject ports and sends an error
message to the corresponding error ports.
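The record-by-record pairing described in this section can be modeled with Python's itertools.zip_longest, which keeps going until every input is exhausted and supplies a fill value for inputs that finish early. This is a sketch only: run_fuse is a hypothetical name, NULL is modeled as None, and the reject and error handling is left out.

```python
from itertools import zip_longest


def run_fuse(flows, fuse, select=None):
    """Apply fuse to corresponding records of each flow."""
    out = []
    # zip_longest stops when all flows are finished and yields None (the
    # model's NULL) for any flow that has already run out of records.
    for records in zip_longest(*flows, fillvalue=None):
        # If select is present and returns zero, the records are discarded.
        if select is not None and select(*records) == 0:
            continue
        out.append(fuse(*records))
    return out


# Made-up sample flows of different lengths.
names = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
prices = [{"price": 10}, {"price": 20}]


def fuse(name_rec, price_rec):
    # The second flow may be finished, in which case price_rec is None.
    price = price_rec["price"] if price_rec is not None else None
    return {"name": name_rec["name"], "price": price}


def select(name_rec, price_rec):
    # Keep only record sets in which the first flow still has a record.
    return 1 if name_rec is not None else 0


fused = run_fuse([names, prices], fuse, select)
```

Because the second flow ends one record early, the third call to fuse receives None for its second argument, which is why the function has to check for it.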
