SAS ETL Studio CaseStudy – Defining a New Job
Using the Process Designer
The Process Designer invokes the New Job wizard to create metadata about a job. That metadata is used to build a process flow diagram for the job.
A job is a metadata object that specifies processes that create output. SAS ETL Studio organizes sources, targets, and transformations into jobs that can be displayed in a process flow diagram.
SAS ETL Studio uses each job to generate and/or retrieve SAS code that reads sources and creates targets on a file system.
The New Job wizard can also be used to create an empty job. After you have an empty job, you can create a process flow diagram by dragging and dropping tables and transformations into the Process Designer window.
The New Job wizard prompts for information that is used to build a template in the Process Designer.
SAS ETL Studio Case Study – Define Jobs
Defining a New Job
At this point, source tables have been defined and the data in those tables can be viewed. Target table definitions have also been specified, but no data exists in any of the target tables. The next task is to define a job, where metadata can be specified that will allow the extraction, transformation and loading of the source table data to the target tables.
- Select Process Designer from the Shortcut bar of the SAS ETL Studio desktop. This activates the New Job wizard.
- Type Populate the OrderFact Table in the Name field.
3. Select OrderFact from the Select Tables list.
Tables can have similar names, make sure to select OrderFact and not Order_Fact. Select
4. Verify that the information is correct, then select
This completes the steps for the New Job wizard, writes the job metadata to the repository and opens the Process Designer window.
In the Process Designer window, there should be an icon for the OrderFact target table, the Loader icon, and an empty icon labeled Place table or transform here.
The Place table or transform here icon is simply a placeholder for the source data you intend to load into the target table. This could be used for a single table, multiple tables, or a transformation.
The Loader icon represents the physical loading of data into the target table. The load process metadata can be accessed via the properties of the Loader icon.
The target table icon should reflect the table that will be loaded when the job is run and is added automatically if you specified a target table in the New Job wizard.
In this demonstration, you want to load the OrderFact table by using an SQL Join to the Orders and Order_Item source tables.
5. Select the Process Library tab in the tree view area of SAS ETL Studio.
6. Expand the Data Transforms folder and locate the SQL Join
7. Drag the SQL Join transformation to the Process Designer window and drop it in the area labeled Place table or transform here.
The Process Designer view will update to show the SQL Join transformation and two new placeholders for the tables to be joined.
If you want to simply load a single source data table into the target data table, then you can drag and drop the table from the Source Tables group into the icon labeled Place table or transform here. If a more complex data transformation is required, you can drag and drop one of the supplied data transformations (from the Process Library tab) into the empty icon.
8. From the Custom tab of the tree view area, locate the Source Tables group element and expand.
9. Drag the table ORDERS onto one of the Place table or transform here boxes.
10. Drag the table ORDER_ITEM onto the other Place table or transform here box.
To add additional tables to an SQL Join, drag and drop the additional tables onto the SQL Join icon.
11. Right-click on the SQL Join icon and select Properties.
12. Select the Mapping tab.
Notice the name of the target table. This is a temporary table that will be used as input for the next step of the process. At this point, this is not the final target table that you will load.
The temporary target table does not have the new columns that you created in the OrderFact table. You can add them here by right-clicking in the background of the target table variable list, selecting New Column and manually entering the column attributes. Because you have already defined those columns, you can simply import that metadata here without having to enter the information manually, again. Importing the column metadata also ensures consistency.
13. Right-click in the background of the Target table variable listing and select Import Columns from the pop-up menu.
14. Expand the repository Foundation, the group Target Tables , and the table OrderFact and move the variables Order_Year, Order_Qtr, Order_Month and Delivery_Time to the Selected Columns list.
Select . These additional variables now show up in the Target table list on the Mapping tab.
You now need to derive mappings for each of the new columns. There are no columns in the source table that can be mapped directly to the new target table columns, but you can derive a value from the columns that do exist in the source table.
15. Right-click Order_Year and select Expression from the pop-up menu.
16. In the Expression Builder window, select Date and Time to display a listing of all date/time functions. Double-click the Year(date) function (or select it and click ).
The Expression Text field is populated with the function syntax and the placeholder for the argument is highlighted.
An expression can be typed directly into the Expression Text window, or it can be built by using the functions, data sources, and buttons within the Expression Builder.
17. To fill the high-lighted place holder (<numValue>) with a valid column, select the Data Sources tab. Expand the ORDERS table and select Order_Date. Select . The Expression Text field updates to: YEAR(ORDERS.Order_Date).
18. Select to return to the SQL Join Properties window. A warning message may appear:
Select to close the message window and update the mappings.
Loading the Target Table
The final step to be performed is the loading of the target tables.
- Open the Process Designer window for the Populate the OrderFact Table job by expanding the Jobs group (on the Custom tab of the tree view). Right-click on the Populate the OrderFact Table job and select View Job (or double-click the job).
- Right-click on the Loader icon and select Properties.
- Select the Load Technique tab and specify a Load Technique of Refresh.
- Activate the Drop Table radio button.
The valid options for Load Technique are as follows:
|Refresh||removes all the rows from the target and copies all the rows from the source into the target.|
|Append||appends the rows from the source to the end of the target|
|Update||inserts any rows that are in the source but not in the target|
The Before Loading options can affect performance when loading data into the target table:
|Drop Table||deletes and re-creates the entire target. Selecting Drop Table minimizes the physical size of the target|
|Truncate Table||marks existing rows as deleted and retains those rows in physical storage. New rows are appended to the existing rows|
|Drop Indexes||specifies that any indexes on the table are to be deleted before loading. The appropriate indexes are then re-created after loading based on target metadata|
For information on additional options, see the online help for the Loader Properties window.
5. Select the Mapping tab. To map the new columns created earlier, right-click in the Target table area and select Quick Map.
Quick Map works only if the columns specified for both the source and target have the same metadata attributes. If this is not the case, 1-1 mappings can be created by selecting the column in the source area, selecting the column in the target area, then right-clicking and selecting New Mapping from the pop-up menu.
6. Select to return to the Process Designer window.
7. To load the target table, submit the job by right-clicking in the background of the Process Designer window, then select Submit from the pop-up menu (a status window displays while processing is occurring).
8. If the Source Code Execution window closes without displaying any error message, the job ran. Select the Log tab at the bottom of the Process Designer to view the log information from this job.
Warning messages from the APPEND procedure (the last step in the code) are expected.
9. Close the Process Designer window by selecting File Ð Close and save the changes if prompted.
10. To view the data, right-click on the OrderFact table in the Target Tables group (on the Custom tab from the tree view). Select View Data from the pop-up menu.
Scroll to the right when viewing the data to make sure that the derived expressions for the new columns resulted in valid data.
11. To close the View Data window, select File Ð Close.