Data Analytics using DB2 Relational Database
Part Three - Challenge #15

Background:

Data Analytics will increase in importance during the next couple of years using technologies such as z/OS Apache Spark. SQL provides analytic functions today where z/OS Apache Spark will provide SQLContext capability for read-only data in memory, dataframes, coupled with machine learning algorithms in the near future. Therefore, your SQL skills will enable you to quickly become skilled with technologies such as z/OS Apache Spark.

Your challenge:

This challenge has neither a right or wrong answer. This challenge requires innovative business insights about data using SQL.

This challenge assumes your ability to use interactive DB2 SQL from Part 2 Challenge 11 and 12. You will not be given detailed "how to" instructions in this challenge.

The data for this challenge is two DB2 tables, a client profile table (CLIENT_INFO) with approximately 6000 clients and a payment table (SPPAYTB) with approximately 1.5 million card transactions from the 6000 clients over a 1 year period.

DB2 table CARDUSR.CLIENT_INFO

Column Names - ( Description )
  • CONT_ID - ( Client contact ID )
  • GENDER - ( Client gender where 0 or 1 only )
  • AGE_YEARS - ( Client age )
  • HIGHEST_EDU - ( Client education where 1 is lowest and 5 is highest )
  • ANNUAL_INVEST - ( Client savings )
  • ANNUAL_INCOME - ( Client income )
  • ACTIVITY_LEVEL - ( Client physical activity where 0 is lowest 5 is highest)
  • CHURN - ( Client retention where 0 is retained and 1 is cancelled card )

DB2 table CARDUSR.SPPAYTB

Column Names - ( Description )
  • ACAUREQ_HDR_CREDTT - ( Transaction data and time )
  • ACAUREQ_AUREQ_ENV_A_ID_ID - ( Card Issuer )
  • ACAUREQ_AUREQ_ENV_M_ID_ID - ( MDM + MC + # )
  • ACAUREQ_AUREQ_ENV_M_CMONNM - ( Common Merchant Name )
  • ACAUREQ_AUREQ_ENV_CPL_PAN - ( Card Primary Account Number)
  • ACAUREQ_AUREQ_ENV_C_CARDBRND - ( Card Brand )
  • ACAUREQ_AUREQ_TX_MRCHNTCTGYCD - ( Merchant Codes, MC )
  • ACAUREQ_AUREQ_TX_DT_TTLAMT - ( Amount of purchase )
  • CONT_ID - ( Client contact ID )
  • MDM_POSTAL_CODE_ID - ( Master Data Management, MDM )
  • AGE - ( Grace period in days to pay with zero interest )
  • AUTHORRESULT_RSPNT - ( Authorization Request/Response )
  • FRAUD_VER_RESULT - ( Transaction Fraud Y/N )

Notes:

  • Merchants Code - no additional information available
  • Master Data Management - no additional information available
  • Authorization Request/Response - all transactions are approved

Given:

  • CC####.SQL(TBLVIEWS) - SQL provided for reviewing table data
  • CC####.SQL(EXAMPLES) - Examples of gathering insights from data
  • CC####.SQL(ANALYSIS) - Initially contains comments
  • CC####.SQL(ANALYSIS) - Modify to include SQL SELECT statement per instructions

CC####.SQL(ANALYSIS) can contain at least 1 and no more than 5 SELECT SQL statements that provide business insights from the client and payment tables. Each SELECT statements may contain subselects, built-in functions, case expressions, search conditions, joins, etc., etc. A SELECT SQL statement is anything that begins with SELECT and ends with semicolom (;), the SQL statement terminator.

Suggested sources of information (hyperlinks) for this challenge:

Your SELECT statement(s) will be judged according to the business insights in the result set. This challenge will help judges select the top place finishers for all those that have successfully completed all the Part 3 challenges. Completing this challenge along with the other Part 3 challenges will most likely earn you at least an Honorable Mention on the Part 3 Wall of Fame.

Once done, then:

  • edit cc#####.p3.output(#15)
  • copy sql(analysis)
  • F3 - save and exit

Next: Challenge #16