Build a strong foundation in SAS data warehousing by understanding data transformation code and policy, data stewardship and management, interconnectivity between SAS and other warehousing products, and print and web reporting.
▶Book Description
SAS is used for various functions in the development and maintenance of data warehouses, thanks to its reputation for being able to handle 'big data'.
This book will help you learn the pros and cons of storing data in SAS. As you progress, you'll understand how to document and design extract-transform-load (ETL) protocols for SAS processes. Later, you'll focus on how the use of SAS arrays and macros can help standardize ETL. The book will also help you examine approaches for serving up data using SAS and explore how connecting SAS to other systems can enhance the data warehouse user's experience.
By the end of this data management book, you will have a fundamental understanding of the roles SAS can play in a warehouse environment, and be able to choose wisely when designing your data warehousing processes involving SAS.
▶What You Will Learn
⦁Develop efficient ways to manage data input/output (I/O) in SAS
⦁Create and manage extract, transform, and load (ETL) code in SAS
⦁Standardize ETL through macro variables, macros, and arrays
⦁Identify data warehouse users and ensure their needs are met
⦁Design crosswalk and other variables to serve analyst needs
⦁Maintain data curation files to improve communication and management
⦁Use the output delivery system (ODS) for print and web reporting
⦁Connect other products to SAS to optimize storage and reporting
▶Key Features
⦁Understand how to use SAS macros for standardizing extract, transform, and load (ETL) protocols
⦁Develop and use data curation files for effective warehouse management
⦁Learn how to develop and manage ETL, policies, and print and web reports that meet user needs
▶Who This Book Is For
This book is for data architects, managers leading data projects, and programmers or developers using SAS who want to effectively maintain a data lake, data mart, or data warehouse.
▶What This Book Covers
⦁ Chapter 1, Using SAS in a Data Mart, Data Lake, or Data Warehouse, explains the origins of SAS and how data input/output (I/O) is managed in SAS. It also provides context for how SAS products are used today in modern data warehouses.
⦁ Chapter 2, Reading Big Data into SAS, covers how to read data in different formats into SAS. It also talks about SAS data formats, and packaging data for import and export in SAS.
⦁ Chapter 3, Helpful PROCs for Managing Data, provides an introduction to PROC CONTENTS, PROC SQL, and PROC PRINT, and describes how to deal with SAS formats and labels. It also provides different strategies for viewing data in SAS.
⦁ Chapter 4, Managing ETL in SAS, explains how to prepare an analytic environment, including developing naming conventions, and SAS format and label policies. It also describes the designation of data storage and user groups.
⦁ Chapter 5, Managing Data Reporting in SAS, introduces you to the output delivery system (ODS), and explains how the ODS is used for outputting graphics files from SAS. This chapter also covers how to use PROCs that were developed specifically for the ODS, including PROC TABULATE and PROC SGPLOT.
⦁ Chapter 6, Standardizing Coding Using SAS Arrays, explains how to do array processing in a SAS data warehouse, how to add conditions to arrays, and how to deal with naming conventions in arrays. In SAS, because of I/O limitations, the use of arrays is usually necessary in ETL code.
⦁ Chapter 7, Designing and Developing ETL Code in SAS, goes over how to plan ETL code, using PROC UNIVARIATE and PROC FREQ to study the data and plan how to serve up variables. The second part of the chapter focuses on how to develop optimal ETL code based on those plans.
⦁ Chapter 8, Using Macros to Automate ETL in SAS, describes how to convert DATA step code used in ETL to SAS macro language in order to automate the process. It also covers how to store and call macros, and how to use them to load transformed data.
⦁ Chapter 9, Debugging and Troubleshooting in SAS, covers debugging approaches in SAS. It offers advice on writing and formatting code, with special attention to debugging DO loop code and macros.
⦁ Chapter 10, Considering the User Needs of SAS Data Warehouses, describes a method for classifying users and then applying data stewardship policies that help ensure their needs are met. For analyst users, it describes providing data access, foreign keys, and crosswalk variables. For developer users, it covers providing data curation and other support.
⦁ Chapter 11, Connecting the SAS Data Warehouse to Other Systems, talks about serving SAS to other data systems, which is typically done asynchronously. Next, it describes connecting SAS to other data systems, which is typically done synchronously through an open database connectivity (ODBC) protocol using SAS/ACCESS.
⦁ Chapter 12, Using the ODS for Visualization in SAS, describes how using the ODS for visualization in SAS differs between print and the web. Next, it explains ways to serve SAS data to the web using SAS Enterprise Guide aided by SAS Viya, and describes how to visualize SAS data in other programs, such as R and Tableau.