Statistics

SAS Statistics

Contents
What is statistics
Univariate Analysis
Proc Univariate
Proc Means
Normal Distribution



Statistics

Statistics is the branch of mathematics concerned with the methods of
collecting and analyzing data.

Descriptive statistics are used to describe the basic features of the data in
a study. They provide simple summaries about the sample and the
measures.

Inferential statistics is the next step above descriptive statistics. With
inferential statistics, you are trying to reach conclusions that extend
beyond the immediate data alone.




Design Principle
What is a partitioned data file?

A partitioned data file is one split into many smaller files by one or more class
variables, typically transaction month or year.

Example:-

ACB MIS Motor PT Private Car Vehicle - Partitioned by transaction month.

Original File     Partitioned Files
MOPT.vehpcar      mopt.vehpcar105
                  mopt.vehpcar106
                  mopt.vehpcar107
                  ...


Note:-
Each individual file has the same name as the original file but suffixed by an
identifying number.

Design Principle contd.
How is the Identifying number calculated?

The number is a count of the months elapsed since a set point in history. In this
example, January 1980 has been used.

For example:-

Transactions for each month would be stored as follows:

January 1980      mopt.vehpcar1
February 1980     mopt.vehpcar1
January 1981      mopt.vehpcar12
January 2006      mopt.vehpcar312

Any transactions with a date prior to January 1980 will be included within the first
partition, i.e. the one suffixed with 1.

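The suffix, or month identifier, for a given month is calculated as:

intck('month', '01JAN1980'd, partition_date)

where partition_date is the date of the partition wanted.

If a value < 1 is returned then the transaction will be found in the first partition.
January 1980 itself returns 0, which is why it shares partition 1 with February 1980
(which returns 1). For the current reporting month this value can be found by running
%datecard.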
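
The calculation above can be tried out with a short data step. This is only a sketch: the dataset name work.suffix_demo and the variable names trans_date, months and suffix are made up for illustration, and the max(1, ...) step is one way of expressing the "first partition" rule, not necessarily how the in-house macros do it.

```sas
/* Sketch: derive the partition suffix for a few example dates.   */
/* Dataset and variable names are illustrative assumptions only.  */
data work.suffix_demo;
    format trans_date date9.;
    do trans_date = '15DEC1979'd, '10JAN1980'd, '20FEB1980'd,
                    '01JAN1981'd, '01JAN2006'd;
        /* Month boundaries crossed since the January 1980 base point */
        months = intck('month', '01JAN1980'd, trans_date);
        /* Any value below 1 belongs to the first partition */
        suffix = max(1, months);
        output;
    end;
run;

proc print data=work.suffix_demo noobs;
run;
```

The first three dates (December 1979, January 1980 and February 1980) all resolve to suffix 1, while January 1981 gives 12 and January 2006 gives 312, matching the examples above.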




Design Principle
How do I access all of the data?

A view will always be supplied which will recombine the individual files to give a
view of the original non-partitioned file.
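
As a rough idea of what such a view looks like, a data step view can concatenate the partitions. This is a minimal sketch, not the supplied view: it assumes the MOPT libref is assigned, that only three partitions exist, and that each is sorted by a hypothetical key variable policy_no. The view name vehpcar_v is also an assumption, chosen so that it cannot clash with an existing dataset.

```sas
/* Sketch only: the real view is generated for you.                 */
/* POLICY_NO and VEHPCAR_V are hypothetical names for illustration. */
data mopt.vehpcar_v / view=mopt.vehpcar_v;
    set mopt.vehpcar105
        mopt.vehpcar106
        mopt.vehpcar107;
    by policy_no;   /* interleaves the partitions in key order */
run;
```

Without the by statement the partitions would simply be read one after another; with it, each partition must already be sorted by the by variable.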


Which macros can be used against partitioned datasets?

Most current macros can still be used against the partitioned datasets.

Some have, however, been modified specifically for partitioned datasets. These have
been created with the same name as the original macro but prefixed with a P.


Standard Macros used
%PARTFILE
This is a macro which will take a dataset and split it by the specified factors.

%PQMERGE
This macro works like %QMERGE but over partitioned datasets: feed in a small
key file and it returns all records for those keys.

%PVIEWGEN
This macro generates a view that sits over all the partitions of a dataset to give a
view that looks like the original file. This can include a by statement so that the
resulting view interleaves all the partitioned files into the correct order.

%PQUERY
This macro enables a where clause to be applied to a partitioned dataset view.

%PNOBS
This macro will be based on the %NOBS macro and counts the number of
observations across all the partitions of a dataset.

%DATECARD
This macro will be updated to contain the identifying number for the current
reporting month. This can be used during the incremental builds to identify the
appropriate partition from the input datasets.
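
The macros above are in-house, so their parameters are not shown here. To give a feel for the kind of work %PNOBS does, an observation count across partitions can be approximated in plain SAS from the dictionary tables. This is a sketch under assumptions: the MOPT libref is assigned, and every dataset whose name starts with VEHPCAR is a partition of the same file.

```sas
/* Sketch: total observations across all VEHPCAR partitions.          */
/* Assumes no unrelated datasets in MOPT share the VEHPCAR prefix.    */
proc sql;
    select sum(nobs) as total_obs
        from dictionary.tables
        where libname = 'MOPT'          /* librefs are stored in upper case */
          and memtype = 'DATA'          /* datasets only, not views         */
          and memname like 'VEHPCAR%';
quit;
```

Note that nobs counts physical observations, so observations marked as deleted are included in the total.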

Working With Partitioned Datasets
How to build a partitioned data file

Initially, the current files will be split into monthly files using the %PARTFILE
macro. Then, on an ongoing basis, the monthly extracts will be processed individually
and the new files will be added to the directory.

When partitioning datasets the macro allows for either a character or a numeric date
field.

The %PVIEWGEN macro will then be run to create/update the view definition.

Before you generate the view make sure of the following:-

Rename the original dataset: the view and the original dataset cannot have the
same name. Once the view and the original dataset have been reconciled, the
renamed original dataset can be removed.

Check that the view compiles using the same sort order as your original dataset.
If the sort order differs, an error is written to the log but processing does not
stop, so you will get unexpected results.

Working With Partitioned Datasets
How to read a partitioned data file

A partitioned dataset can be read in exactly the same way as you would read an
existing file.

The views over the split data will resolve when you use them in processing, but will
take longer.

It may be more efficient to use one of the special partition macros.

Reading Multiple Views:-

If you currently use sashelp.vmember to list all the datasets in a library, use
sashelp.vview instead to list all the views in the library.

If combining multiple views, you will need to change your code to process each view
separately and then combine the results, due to memory limitations.
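
A minimal sketch of such a listing, assuming the views live in a library assigned as MOPT:

```sas
/* Sketch: list the views in one library via SASHELP.VVIEW. */
proc sql;
    select memname
        from sashelp.vview
        where libname = 'MOPT';   /* librefs are stored in upper case */
quit;
```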

Benefits of Data Partitioning
How partitioned data can reduce data volumes

Cycling files are no longer required.

How partitioned data can help with back ups and restores
Each month we can back up the new month's data separately without having to
back up the entire file.

With very large datasets such as the ACP & ACC MIS datasets this enables
backups to be taken which are not currently feasible against the full files.

How partitioned data can help improve dataset build times

Processing no longer requires a read through the entire file. With partitioned data
you only need to read in the file containing that month's data.
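
For instance, reading just the January 2006 partition might look like the sketch below. The macro variable name part and the output dataset work.jan2006 are illustrative assumptions; in practice the suffix for the current reporting month would come from %datecard rather than being calculated by hand.

```sas
/* Sketch: work out the suffix for January 2006 and read only that partition. */
data _null_;
    /* max(1, ...) applies the rule that values below 1 map to partition 1 */
    call symputx('part', max(1, intck('month', '01JAN1980'd, '01JAN2006'd)));
run;

data work.jan2006;
    set mopt.vehpcar&part;   /* resolves to mopt.vehpcar312 */
run;
```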

Issues to take into account
When partitioning a file make sure to consider the following aspects:-

Reading through an entire partitioned dataset is likely to take longer than
against a non-partitioned dataset.

All suites that use partitioned datasets will need to be amended.

Take extra care when modifying macros that are also going to be used
against non-partitioned data.

All partitions should be stored in the same area as the original file.

Make sure that the view compiles using the same sort order as your original
dataset.

Once the view and the original dataset have been reconciled, remove the
original dataset.

Process each view separately.

Questions?