
Data Requirements

A. Transaction Summary
Time Period: Latest available 13 quarters
Sort Criteria: CUSTOMER IDENTIFIER, PERIOD, CATEGORY_CD3 IN ASCENDING ORDER
Filter Criteria:
1. All transactions of a 10% random sample of customers active in the last year
2. All CATEGORY_CD3 values that together account for 95% of sales; the remaining values are
grouped together as 999
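
The two filters above can be sketched in Python as follows. This is a minimal illustration rather than the actual extraction logic. The function names, the fixed random seed, and the reading of "account for 95% of sales" as the top-selling categories whose cumulative share reaches 95% are all assumptions.

```python
import random


def sample_customers(active_ids, fraction=0.10, seed=0):
    """Pick a random `fraction` of the active customers (hypothetical helper)."""
    ids = sorted(set(active_ids))  # sort first so the draw is reproducible
    k = round(len(ids) * fraction)
    return set(random.Random(seed).sample(ids, k))


def category_mapping(sales_by_category, coverage=0.95, other_code=999):
    """Map each CATEGORY_CD3 to itself or to `other_code`.

    Categories are ranked by sales and kept until their cumulative
    sales reach `coverage` of the total; the rest map to `other_code`.
    """
    total = sum(sales_by_category.values())
    ranked = sorted(sales_by_category.items(), key=lambda kv: kv[1], reverse=True)
    mapping, running = {}, 0.0
    for code, sales in ranked:
        if running < coverage * total:
            mapping[code] = code
            running += sales
        else:
            mapping[code] = other_code
    return mapping
```

For example, with sales of 600, 300, 60 and 40 for four categories, the first three are kept (cumulative share 96%) and the fourth is mapped to 999.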
# COLUMN TYPE
1 CUSTOMER IDENTIFIER (CID or BILL_ACCT_SRC_NBR?) LONG INTEGER
2 PERIOD INTEGER
3 QUARTER INTEGER
4 CATEGORY_CD3 INTEGER
5 AVG_UNIT_RETAIL_IN_CATEGORY FLOAT
6 AVG_MARKDOWN_IN_CATEGORY FLOAT
7 AVG_UNIT_RETAIL_IN_CATEGORY_NORM_CATEGORY_3 FLOAT
8 AVG_UNIT_RETAIL_IN_CATEGORY_NORM_CATEGORY_2 FLOAT
9 NUMBER_ITEMS_FROM_CATEGORY INTEGER
10 RETURN_COUNT_CATEGORY INTEGER
11 AVG_MONTH_ALL FLOAT
12 AVG_DOW_ALL FLOAT
13 NUMBER_ITEMS_ALL INTEGER
14 AVG_UNIT_RETAIL_ALL FLOAT
15 AVG_UNIT_RETAIL_ALL_NORM_CATEGORY_3 FLOAT
16 AVG_UNIT_RETAIL_ALL_NORM_CATEGORY_2 FLOAT
17 AVG_MARKDOWN_ALL FLOAT
18 RETURN_COUNT_ALL INTEGER
19 VISIT_COUNT_ALL INTEGER
20 %_TXN_ONLINE_ALL FLOAT

Additional Information:
1. NET_AMOUNT should be calculated as follows:
CASE
  WHEN TXN_CHANNEL_CD='D' THEN ((SOLD_PRICE_AMT*QTY)-OFFER_PRICE_AMT)
  WHEN TXN_CHANNEL_CD='R' THEN (SOLD_PRICE_AMT*QTY)
END AS NET_AMOUNT

2. Include ONLY transactions with TXN_TYPE_CD == 1 (i.e. sales) in all calculations except the
count of returns
3. PERIOD is an integer that starts at 1 and increases by 1 for each quarter
4. Average unit retail is the total NET_AMOUNT divided by total QUANTITY
5. Average unit retail columns with a ‘*_NORM_*’ suffix are normalized by the average unit
retail (AUR) for that product category
6. Average Month/DOW is a floating-point average of the month numbers or day-of-week (DOW)
numbers across all transactions
7. ‘*_IN_CATEGORY’ suffix indicates that this is for items in the product category
8. ‘*_ALL’ suffix indicates that this is for items across all product categories
9. MARKDOWN is (LIST_PRICE - NET_AMOUNT)/LIST_PRICE
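
Items 1, 3, 4, 6 and 9 above can be sketched as small Python functions. This is a rough illustration under stated assumptions, not the definitive calculation. The function and parameter names are hypothetical. The day-of-week numbering (1 = Monday to 7 = Sunday) and the choice of the earliest quarter in scope as period 1 are assumptions that the rules above do not fix.

```python
from datetime import date


def net_amount(txn_channel_cd, sold_price_amt, qty, offer_price_amt=0.0):
    """Item 1: NET_AMOUNT by transaction channel."""
    gross = sold_price_amt * qty
    if txn_channel_cd == "D":
        return gross - offer_price_amt
    if txn_channel_cd == "R":
        return gross
    return None  # other channel codes are not covered by the rules above


def period_index(year, quarter, first_year, first_quarter):
    """Item 3: 1 for the first quarter in scope, +1 per later quarter (assumed origin)."""
    return (year - first_year) * 4 + (quarter - first_quarter) + 1


def average_unit_retail(amounts_and_quantities):
    """Item 4: total NET_AMOUNT divided by total quantity."""
    pairs = list(amounts_and_quantities)
    return sum(a for a, _ in pairs) / sum(q for _, q in pairs)


def avg_month_and_dow(txn_dates):
    """Item 6: mean month number and mean day-of-week number.

    Assumption: day of week is numbered 1 (Monday) to 7 (Sunday).
    """
    dates = list(txn_dates)
    n = len(dates)
    return sum(d.month for d in dates) / n, sum(d.isoweekday() for d in dates) / n


def markdown(list_price, net_amt):
    """Item 9: (LIST_PRICE - NET_AMOUNT) / LIST_PRICE."""
    return (list_price - net_amt) / list_price
```

As written, a zero total quantity or a zero list price raises ZeroDivisionError. The rules above do not say how such rows should be treated, so the sketch leaves that decision open.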

File Format:
1. Tab-separated ‘csv’ file
2. Broken into 5 separate files, each with all transactions of a 20% random subset of all
selected customers
3. Header row should be present
4. No missing or non-numeric values expected
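
One possible way to produce the five files is sketched below in Python. The helper names, the fixed seed, and the assumption that the customer identifier is the first field of each row are illustrative only. Customers are shuffled and dealt round-robin into five groups, so each file holds all rows for roughly 20% of the selected customers.

```python
import csv
import random


def assign_files(customer_ids, n_files=5, seed=0):
    """Assign each customer to one of `n_files` groups of near-equal size."""
    ids = sorted(set(customer_ids))
    random.Random(seed).shuffle(ids)
    return {cid: i % n_files for i, cid in enumerate(ids)}


def split_rows(rows, assignment, n_files=5):
    """Group rows by file index; the customer id is assumed to be field 0."""
    buckets = [[] for _ in range(n_files)]
    for row in rows:
        buckets[assignment[row[0]]].append(row)
    return buckets


def write_tsv(rows, header, stream):
    """Write a header row followed by tab-separated data rows."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

Each bucket returned by `split_rows` can then be passed to `write_tsv` with a file opened via `open(path, "w", newline="")`. Sorting each bucket by customer, period and category before writing is left out of this sketch.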
