
Introduction

Because a transformation can be referenced as an operation step in multiple operations at the same time, it is important to understand the implications and complexity of reusing transformations.

The process involved in referencing a transformation depends on whether the schema is defined in a transformation ("transformation-owned") or inherited from an activity ("activity-owned"):

  • Transformation-owned Schemas: When a schema is defined directly in a transformation, it is owned by the transformation and referencing the transformation is straightforward, as described in Component Reuse.
  • Activity-owned Schemas: Unlike transformation-owned schemas, when you reference a transformation that inherits at least one schema from an activity, the order in which an operation's steps are configured controls how the schemas are propagated across transformation references and determines what messages are presented in the UI. Those messages help you choose which schemas to use and facilitate the creation of a new transformation when appropriate. After referencing a transformation, you may need to refresh schemas in the original or referenced transformation to resolve validation errors, as described on this page.

As this page is supplemental to Component Reuse, it does not repeat the information contained on that page. See Component Reuse for the definitions of component reuse terminology and how to copy, cut, and paste components (including transformations).

When you reference a transformation, keep in mind that the original transformation and the newly referenced transformation refer to the same component. Any change to either one, including a change to how its schemas are defined, affects all instances of the referenced transformation. If a transformation inherits a schema from an adjacent activity and you then reference that transformation in another operation, the newly referenced transformation still contains a reference to the original schema, even though that activity is not adjacent to the referenced transformation.

When this happens, you must reconcile the changes by refreshing schemas and resolving any validation errors.

Use Case

The intended use case for reusing a transformation that inherits activity-owned schemas is when you have multiple sources that you want to map to the same target using similar mappings.

For example, you may have source data provided through activity-owned schemas in both Endpoint A and Endpoint B. If the structure of the source data is similar, you may want to reuse the same transformation mappings to the target in Endpoint C. In this case, you can first create the operation using Endpoint A as the source, and then copy its transformation to reuse in another operation using Endpoint B as the source.

While it is also possible to reuse a transformation inheriting an activity-owned target schema, any original transformation mappings will be removed from a new copy of the transformation that is created when the target schema is refreshed.

Best Practices

We recommend following these best practices when reusing transformations that inherit at least one schema from an activity.

Adding Activities Prior to Adding a Transformation

When you intend to reuse a transformation with activity-owned schemas in an operation, always add the activities to the operation first, prior to adding the transformation to the operation. The recommended order is demonstrated by Scenarios A and B later on this page.

If you do not follow this best practice and instead add a referenced transformation to an operation first and then add an adjacent activity that provides a schema, all instances of the referenced transformation are automatically updated with the new schema. As a result, previously valid referenced transformations may become invalid, and their mappings may no longer be displayed.

To recover from this situation, on each affected referenced transformation, open the transformation and use the link that appears in the transformation header to refresh the schema and create an independent copy of the transformation with its original schemas. The mappings will reappear when the original schemas have been replaced.

Resolving Mapping Errors

After you refresh a mismatched activity-owned schema in a referenced transformation, the independent copy of the transformation that is created may retain mappings from the prior referenced transformation that are now invalid because the fields they reference no longer exist. Existing mappings that reference a source or target node or field that no longer exists are not visible in the independent copy of the transformation, as they are no longer valid.

These errors are not displayed in an open transformation but can be identified from the project pane.

To resolve these mapping errors, we recommend removing all invalid mappings using the target root node's actions menu option Remove All Invalid Mappings.

Scenarios

These scenarios describe and demonstrate the process for referencing a transformation that inherits a schema from an activity. Transformations can be referenced in an operation by dragging and dropping or by pasting a transformation that has been copied or cut (see Creating a Component Reference in Component Reuse).

Each scenario refers to the placement of transformations and activities on the design canvas as steps of an operation.

All of these scenarios are based on an original operation that uses a transformation whose request and response schemas are both inherited from initially adjacent activities.

NOTE: The behavior in these scenarios also applies to the relevant side of the transformation if only one schema is inherited by the transformation.

In the original operation (Original Operation), an Amazon Redshift Query activity (Query Accounts) provides a request schema and an Amazon Redshift Upsert activity (Upsert Accounts) provides a response schema for the transformation (Transformation).

The order in which an operation's steps are added controls how the schemas are propagated across transformation references and determines what messages are presented in the UI. The following summary lists the possible scenarios and the order of steps in which an operation with a transformation reference could be configured. Each scenario is described in its own section below.

Scenario A: A referenced transformation is added next to an activity with a defined schema.

  1. Add an activity to an operation where the activity provides a schema (such as a configured server-based activity or a file-based activity configured with a schema).
  2. Add a referenced transformation to the same operation next to that activity.

Scenario B: A referenced transformation is added next to an activity without a defined schema.

  1. Add an activity to an operation where the activity does not provide a schema (such as a file-based activity that has been configured to provide no schema). Server-based activities that have not yet been configured with a data schema are not considered here, as they are invalid.
  2. Add a referenced transformation to the same operation next to that activity.

Scenario C: An activity with a defined schema is added next to a referenced transformation.

  1. Add a referenced transformation to an operation.
  2. Add an activity to the same operation where the activity provides a schema (such as a configured server-based activity or a file-based activity configured with a schema).

Scenario D: An activity without a defined schema is added next to a referenced transformation.

  1. Add a referenced transformation to an operation.
  2. Add an activity to the same operation where the activity does not provide a schema (such as a file-based activity that has been configured to provide no schema). Server-based activities that have not yet been configured with a data schema are not considered here, as they are invalid.
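The four scenarios summarized above reduce to two questions: which step is added first, and whether the activity provides a schema. The following minimal sketch encodes the outcomes described on this page as a decision function. It is a hypothetical model for reasoning about the scenarios, not a Jitterbit API. The names `Order`, `Outcome`, and `predict_outcome` are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Order(Enum):
    """Which step is added to the new operation first (hypothetical model)."""
    ACTIVITY_FIRST = "activity first"
    TRANSFORMATION_FIRST = "transformation first"


@dataclass(frozen=True)
class Outcome:
    scenario: str
    # True if operations that already reference the transformation are changed.
    other_references_affected: bool
    # True if the transformation header in the new operation shows a
    # schema-mismatch link to refresh the schema.
    mismatch_link_in_new_operation: bool


def predict_outcome(order: Order, activity_provides_schema: bool) -> Outcome:
    """Summarize Scenarios A-D from this page; not a Jitterbit API."""
    if not activity_provides_schema:
        # Scenarios B and D: the transformation keeps inheriting its schemas
        # from the activities of Original Operation, so nothing changes.
        return Outcome("B" if order is Order.ACTIVITY_FIRST else "D", False, False)
    if order is Order.ACTIVITY_FIRST:
        # Scenario A: the new operation reports a mismatch, resolved by
        # refreshing the schema and creating an independent copy.
        return Outcome("A", False, True)
    # Scenario C: configuring the activity's data schema updates every
    # reference, so other operations (such as Original Operation) may
    # become invalid.
    return Outcome("C", True, False)
```

For example, `predict_outcome(Order.TRANSFORMATION_FIRST, True)` corresponds to Scenario C, the only case in which other operations are affected. This is why the best practice is to add activities before the transformation.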

Scenario A: Referenced Transformation Added Next to an Activity with a Defined Schema

In Scenario A, two new Amazon Redshift activities are first added to a new operation and fully configured (through completion of their data schema steps): Query Companies and Upsert Persons. Then the transformation from Original Operation is added as a reference between the two activities. 

The newly added transformation reference is valid and has no validation errors. However, the operation itself is invalid.

The operation validation error indicates that the transformation's schemas do not match the schema structures provided by the transformation's adjacent activities.

Fixing Validation Errors

To fix the operation validation issue, open the transformation. A message indicating a mismatch is displayed in the transformation header. Click the link within the message to refresh the relevant schema.

NOTE: Only one message is displayed at a time. If both schemas have a mismatch, use the link in the message to refresh one of the schemas first and create a copy of the transformation (covered below). After creating a copy of the transformation, another message is then displayed, which you can use to refresh the other schema side.

When you click the link to refresh the schema, a dialog prompts you to create a copy of the transformation as a new, independent component.

Clicking Continue separates this transformation from the other locations where it is referenced, and creates and opens a new transformation component that uses schemas defined according to the standard precedence. That is, the new transformation first uses a schema inherited from an activity adjacent to the new transformation. If no adjacent activity is present, or if the activity does not define a schema, the transformation uses a schema defined in the new transformation itself. The new transformation is no longer connected by reference to the prior transformation, nor to any schemas inherited from the prior transformation's adjacent activities.
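The standard precedence described above can be expressed as a small decision function that is applied to one side (source or target) of the new transformation at a time. This is a hypothetical sketch: `resolve_schema` is an illustrative name, and schemas are modeled as plain strings, with `None` meaning "no schema on this side."

```python
from typing import Optional


def resolve_schema(adjacent_activity_schema: Optional[str],
                   transformation_schema: Optional[str]) -> Optional[str]:
    """Pick the schema for one side of a new, independent transformation copy.

    Hypothetical model of the standard precedence; not a Jitterbit API.
    """
    if adjacent_activity_schema is not None:
        # 1. A schema provided by an activity adjacent to the new
        #    transformation takes precedence.
        return adjacent_activity_schema
    # 2. Otherwise (no adjacent activity, or the activity defines no schema),
    #    fall back to the schema defined in the transformation itself.
    #    None models "no schema on this side".
    return transformation_schema
```

Calling the function once per side mirrors the UI, where each mismatched side is refreshed through its own header message.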

If both schemas have a mismatch, or if the transformation is no longer referenced by any other operations, the newly created transformation displays another message indicating a schema mismatch on the other side of the transformation. Again, click the link within the message to refresh the relevant schema.

A dialog indicates that the current schema will be removed from the transformation and that the transformation will then inherit the schema from an activity adjacent to it.

Clicking Continue refreshes the schema in the current transformation.

The transformation may then be invalid due to the schema changes on the activity side (for example, if fields are renamed or no longer present).

Map target fields as desired to configure the transformation.

NOTE: If transformation validation errors remain, there may be existing mappings from the prior referenced transformation that are retained by the transformation copy, causing the transformation to be invalid. To resolve, see the section Transformation Mappings later on this page.

Scenario B: Referenced Transformation Added Next to an Activity without a Defined Schema

In Scenario B, two new Variable activities are first added to a new operation: Read from Variable and Write to Variable. Each Variable activity is fully configured with the option not to provide a data schema. Then the transformation from Original Operation is added as a reference between the two activities.

All operation steps and the operation itself are valid.

In this scenario, the original transformation (Transformation) can be referenced without any issues. The transformation continues to use the schemas defined in the activities of Original Operation.

When you open the transformation, you can see where the schemas come from, and you can open each Amazon Redshift activity by selecting Edit Activity from the schema's actions menu. Using Refresh Schema refreshes the schema from the activities of Original Operation.

Scenario C: Activity with a Defined Schema Added Next to a Referenced Transformation

In Scenario C, the original transformation (Transformation) is first added to a new operation as a reference.

Two new Amazon Redshift activities are then added to the operation on either side of the transformation: Query Employees and Upsert User Data.

When you configure each activity's data schema, the adjacent transformation, which was formerly inheriting its schemas from the activities in the Original Operation, now inherits its schemas from the activities directly adjacent to the transformation. This affects all locations where the transformation is referenced, and may cause other operations to become invalid.

When you open each activity to configure it, a dialog lists all other operations that reference the transformation and will be affected.

This dialog is informational, as clicking Continue will not yet update the transformation. Instead, clicking Continue returns you to the activity configuration screen. Canceling out of this dialog using the Esc key has the same result as clicking Continue. If you close out of the activity configuration screen without configuring the data schema, the adjacent transformation will be unaffected, as the activity will remain unconfigured without a data schema.

Once you configure the activity with a data schema, the listed operations are affected.

WARNING: The action of adding the activities with defined schemas on either side of the referenced transformation affects all locations where the transformation is referenced, and may cause other operations to become invalid.

The newly added transformation reference is invalid.

The operation whose activities now provide schemas to the transformation is invalid because the transformation is invalid. However, the transformation itself does not display any validation errors.

Fixing Validation Errors

In this scenario, the transformation is invalid due to schema changes on the activity side (for example, if fields are renamed or no longer present). To resolve, see the section Transformation Mappings later on this page.

In addition, other operations that use the transformation may have become invalid. For example, Original Operation automatically becomes invalid, because its transformation has been automatically altered to reference the schemas of the activities in Scenario C.

To correct errors with other operations, open the transformation in each operation and click the link to refresh the schema that appears in the message in the transformation header (as described in Scenario A, above).

Scenario D: Activity without a Defined Schema Added Next to a Referenced Transformation

In Scenario D, the original transformation (Transformation) is first added to a new operation as a reference.

Two new Temporary Storage activities are then added to the operation on either side of the transformation: Read from Temp Storage and Write to Temp Storage.

When you open each activity to configure it, a dialog incorrectly implies that configuring the activity will affect other operations.

In fact, if you do not provide a schema in the activity configuration, the adjacent transformation will be unaffected. The adjacent transformation will continue to inherit its schemas from the activities in the Original Operation.

WARNING: If you later reconfigure the activity to provide a schema, the adjacent transformation and any other operations where it is referenced will be affected, even though this dialog does not appear again.

After each activity is fully configured with the option not to provide a data schema, all operation steps and the operation itself are valid.

In this scenario, the original transformation (Transformation) can be referenced without any issues. The transformation continues to use the schemas defined in the activities of Original Operation.

When you open the transformation, you can see where the schemas come from, and you can open each Amazon Redshift activity by selecting Edit Activity from the schema's actions menu. Using Refresh Schema refreshes the schema from the activities of Original Operation.

Transformation Mappings

Regardless of where a transformation's schemas are defined, transformation mappings are owned by the transformation. If the target schema changes so that mapped target fields are no longer present, and those fields are later re-added, the transformation mappings reappear.

Mappings that remain in a transformation after its target schema has changed cause transformation validation errors if the mapped fields are no longer present in a schema.

This can occur when a transformation with mappings that once inherited its schemas from adjacent activities is later referenced in another operation whose adjacent activities provide different schemas, as described in Scenario A and Scenario C above.

In this case, a transformation validation error describes the missing fields, though no validation errors are displayed in the transformation itself.

To resolve, you can remove all invalid mappings using the target root node's actions menu option Remove All Invalid Mappings (see Target Nodes in Mapping Mode).
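The mapping behavior described in this section can be illustrated with a simple model. The mappings are kept by the transformation regardless of the schema, only mappings whose target field exists are shown, and removing invalid mappings discards the rest. This sketch is hypothetical and simplified. It models mappings as a dictionary from target field path to mapping script and considers only target fields. The field names in the example are illustrative.

```python
from typing import Dict, Set

# Hypothetical representation: target field path -> mapping script text.
Mappings = Dict[str, str]


def visible_mappings(mappings: Mappings, target_fields: Set[str]) -> Mappings:
    """Mappings whose target field exists in the current target schema."""
    return {path: script for path, script in mappings.items() if path in target_fields}


def invalid_mappings(mappings: Mappings, target_fields: Set[str]) -> Mappings:
    """Mappings retained by the transformation whose target field is missing."""
    return {path: script for path, script in mappings.items() if path not in target_fields}


def remove_all_invalid_mappings(mappings: Mappings, target_fields: Set[str]) -> Mappings:
    """Rough analogue of the Remove All Invalid Mappings option: keep only
    mappings whose target field still exists."""
    return visible_mappings(mappings, target_fields)
```

Because `visible_mappings` never modifies the stored dictionary, adding a missing field back to `target_fields` makes its mapping visible again, as described above. Only replacing the stored mappings with the result of `remove_all_invalid_mappings` discards them.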
