SUBSET functions

Omniscope SUBSET Functions (2.6+)

This page describes the latest functionality available in Omniscope 2.6 b607 and later.  If you are using 2.5 or an earlier version of 2.6, please see the older guide

 

Defining formulae involving evaluated data subsets 

SUBSET functions are unique to Omniscope and are broken down into two sub-classes of functions which work together:

  1. SUBSETINTERSECTIONUNION and INVERSE... are 'clauses' used to define specific subsets by "querying" the data by matching field values, defining a subset of records for use in another function, or for aggregating and inverting other subsets.
     For example: "All records where Country is UK" or "All records with the same Country as the record being evaluated".
  2. SUBSET_MEAN, SUBSET_SUMRANKRECORDCOUNT etc. are used to calculate a statistical function upon a specified field within a given subset defined by a SUBSET 'clause'.
    For example: "Sum of Sales Volume in subset X" where X is a subset as above.

Note:  these replace the previous, now deprecated DATASET_... functions as they are more flexible and efficient.  Other functions other than SUBSET_MEAN/SUM/etc. also use SUBSET 'clauses' such as RANK. 

 

Warning: experimental functions

The new INTERSECTION, UNION and INVERSE functions are classed as experimental because they have not been optimised for performance.  If you find they perform too poorly for your data size, please continue to use the older SUBSET2 etc. functions documented here.  Performance improvements are scheduled for 2.7.

 

Example: Combining SUBSET 'clause' with SUBSET_SUM to roll-up data

If you wanted to find:

The sum of Sales Volume for the current country 

which could be rewritten as:

Sum of Sales Volume in subset: [all records with the same Country as the record being evaluated]

you would use the formula:

SUBSET_SUM([Sales Volume], SUBSET([Country]))


Examples of SUBSET 'clause' constructions:

SUBSET 'clauses' results in a data subset for use in other functions (such as SUBSET_MEAN or RANK). You would not normally use the result of SUBSET 'clause' as the end result of the entire formula, since you cannot display the definition of the subset defined as a data table inside a cell.

SUBSET 'clauses' are used to identify records by one constraint (e.g. Month is "February").  To identify records by multiple constraints, or use more complex logic, combine SUBSET with INTERSECTIONUNION and INVERSE (e.g. All of records in [Month is "February"] and [Client is "Acme Bank"]).

SUBSET()

The entire dataset

SUBSET([Month])

Data subset with the same Month field value as the record being evaluated

SUBSET([Month]"February")

Data subset with the Month field value 'February'

SUBSET([Quantity]5)

Data subset where the field/column Quantity value equals 5
SUBSET([Quantity]5"<")
SUBSET([Quantity]5"<=")
SUBSET([Quantity]5">")
SUBSET([Quantity]5">=")
SUBSET([Quantity]5"=")
SUBSET(
[Quantity]5"<>")
Data subsets where the field/column Quantity value is less than, less than/equal, greater than, greater than/equal, equal, not equal to the value 5 (respectively).  Use of the "operator" inequality options requires Omniscope 2.6+.

SUBSET([Surname], "Mac""starts-with")
SUBSET([Surname]"-""contains")
SUBSET([Surname]"son""ends-with")
SUBSET([EmailBody]"Manhattan""does-not-contain")

Data subsets where the field Surname begins with "Mac" such as "MacDonald", or contains "-" such as a double-barrelled name, etc. (2.8+) 

INTERSECTION(

  SUBSET([Month])

  SUBSET([Client])
)

Data subset with the same Month and Client field values as the record being evaluated

INTERSECTION(

  SUBSET([Month], 

"February"

),

 

  SUBSET([Client]"Acme Bank"

)

)

 

 

Data subset with the Month and Client field values 'February' and 'Acme Bank'

INTERSECTION(

  SUBSET([Month]

)

 

 

  SUBSET([

Client

]

)

 

 

  SUBSET([Country]

)
)

 

Data subset with the same Month, Client and Country field values as the record being evaluated

 

 

INTERSECTION(

  SUBSET([Month], 

 

)

"February"

 

  SUBSET([

Client

], 

Acme Bank

"

"

)

 

 

  SUBSET([Country]"UK"

)
)

 

Data subset with the Month, Client and Country field values 'February', 'Acme Bank' and 'UK'

INTERSECTION(

  SUBSET([Month], 

"February", "="

)

 

 

  SUBSET([

Client

], 

Acme Bank

"

"

"=

"

)

 

 

  SUBSET([Quantity], 5, ">="

)
)

Data subset where the Month and Client field values are 'February' and 'Acme Bank', and the Quantity field values are greater than or equal to 5.

Examples of how you would use SUBSET: functions

SUBSET_... functions (such as SUBSET_MEAN) accept a field for the statistical function and an optional data subset 'clause' to restrict the record set.

SUBSET_MEAN([Price])
or
SUBSET_MEAN([Price], SUBSET())

The mean of the Price field, for all records
SUBSET_SUM([Price], SUBSET([Month]))
The sum of the Price field, for the data subset with the same Month field value as the record being evaluated
SUBSET_MODE([Currency],
   SUBSET(
[Month]"February"))
The most common Currency field value, for the data subset with the Month field value 'February'

Worked example - single field subset

This example shows a table of data containing Traders and their Sales Volume by Country.  We want to find out the total Sales Volume for each Trader.

Country

Trader

Sales Volume

Formula field

UKJohn Swires10,00060,000
FranceJohn Swires15,00060,000
USAJohn Swires35,00060,000
UKAnne Campbell7,00021,000
JapanAnne Campbell14,00021,000

 

The field "Formula field" above is defined with the formula:

SUBSET_SUM([Sales Volume], SUBSET([Trader])) 

Broken down, this means:

SUBSET([Trader])   The records with the same Trader value as the record being evaluated

SUBSET_SUM([Sales Volume], subset)   The sum of Sales Volume for those records

When Omniscope evaluates the first cell (60,000) in "Formula field":

SUBSET([Trader]) evaluates to all records with Trader "John Swires" (i.e. the first 3 records).

SUBSET_SUM([Sales Volume], subset) evaluates to 60,000 (10,000 + 15,000 + 35,000).

Worked example - multi-field subsets

This example involves a table of data containing multiple quotes for multiple parts to multiple clients in multiple currencies. We want to find the lowest Quote in "USD" for each combination of Part and Customer.

Customer

Currency

Part

Quote

  Min USD Quote

Alpha

GBP

widget

8

 

 

12

Beta

USD

gromett

10

 

 

10

Gamma

Yen

nubbin

50

 

 

6

Alpha

USD

widget

12

 

 

12

Gamma

USD

widget

9

 

 

9

Beta

Euro

 gromett

11

 

 

10

AlphaYen

  nubbin

52  6
BetaUSD

  nubbin

5  5
GammaUSD

 gromett

7  7
AlphaUSD

  nubbin

6  6
GammaUSD

  nubbin

6  6

The formula field [Min USD Quote] above can be defined with the following formula:

SUBSET_MIN([Quote], INTERSECTION(SUBSET([Customer]), SUBSET([Currency], "USD"), 

SUBSET([Part]))

Broken down, this formula means:

SUBSET_MIN( [Quote], find the lowest quotation in the column for the following subsets:

INTERSECTION(SUBSET([Customer]), SUBSET([Currency], "USD"), SUBSET([Part])= all combinations of 'Customer' and 'Part' where Currency="USD"
 

RANK function example

The RANK function works much like the Excel RANK function, and allows you to infer the RANK of a value within a set of values.  In Omniscope, the value can be any literal value or the value of a referenced field.  The set of values must be all values within a named field, either for all records, or for a data subset.

Examples for the above Customer/Currency/Part/Quote/Min USD Quote data, where the Quote column has ordered values as follows:

Quote Rank 
12 
10 
10 
11 

 

So, when evaluating the formula for a record where Quote is 8: 

  • RANK(Quote)   ->   5
  • RANK(Quote, Quote)   ->   5   (same meaning as RANK(Quote))
The following is true for all record evaluations, since it uses a literal value:
  • RANK(8, Quote)   ->   5
  • RANK(12, Quote)   ->   1
  • RANK(6, Quote)   ->   7
  • RANK(5, Quote)   ->   11
Customisations
By default, RANK uses descending order, so larger values rank higher.  To reverse this and use ascending order, provide a 3rd argument 'true'.  For example:
  • RANK(8, Quote, true)   ->   7
  • RANK(12, Quote, true)   ->   11
  • RANK(6, Quote, true)   ->   2
  • RANK(5, Quote, true)   ->   1
RANK also allows you to choose whether to count nulls when generating the rank.  By default nulls are ignored.
RANK also allows you to provide a data subset, so you can customise the set of values considered.