These pages show you how to define the metadata schemas used for importing custom data into Fuse. The system uses this metadata to understand how to represent and store your data. Several different pieces of metadata are used depending on what you are trying to achieve, but the central piece required for any metadata to be usable is the data definition metadata.
All metadata files are represented in JSON format. The actual data that this metadata describes is normally in CSV format.
Data Schema
This page discusses how to create a data definition metadata file. This JSON schema defines the columns, data types, display names, and other data-centric properties within your dataset.
Data Definition Metadata Syntax
To get started, here is a quick example we can use to discuss the various fields and what they mean.
{ "name": "$Human Readable Name", "collection": "$unique_collection_name", "cacheHashes": true, "separator": ",", "columns": { "$logical_column_name1": { "fieldName": "$solrFieldName", "dataColumn": "$Display For Logical Column", "display": "Display Name", "dataType": "STRING" }, "$logical_column_name2": { "display": "$Display For Logical Column", "dataType": "STRING" } }, "transform": [ { "command": "trimAll" }, { "command": "emptyToNull" }, { "command": "reject", "columns": { "$logical_column_name1": "00/00/0000", "$logical_column_name2": [ "0", "99999" ] } }, { "command": "replace", "value": "00/00/0000", "with": "", "columns": [ "$logical_column_name1", "$logical_column_name2" ] } ] }
Field Definition
- name – A human-readable name just for recognition
- collection – This is the unique ID used to identify the metadata and data instances. These must be alphanumeric strings with underscores (‘_’) for spaces.
- separator – The separator character that divides values in CSV-format data files
- cacheHashes – Optimization to cache data in memory for quicker duplicate checks. Defaults to true.
- columns – An object where each key is a logical column name and the value is that column's definition within the data file being imported.
- transform – An array of modifications (i.e. Transform definitions) performed on the data during import.
Column Definition
- fieldName – The field name to use within SOLR. If not specified, a dynamic field name will be used based on the logical name of the column and data type to produce a SOLR-friendly dynamic field. For example, if a logical column is called age which is an INTEGER, the fieldName would be age_i in SOLR.
- dataColumn – The column name within the data file being imported. This allows you to rename columns from the data file to the logical column name. If not specified, it uses the logical column name as the column in the data file.
- dataType – The data type of the field to use when importing data. The following data types are supported:
- STRING
- TEXT
- INTEGER
- LONG
- FLOAT
- DOUBLE
- BOOLEAN
- COORDINATE
- DATE
- TIMESTAMP
- CURRENCY
- ENUM
- values – Used to specify the acceptable values for an ENUM data type. These values are checked by the system during import; any unexpected value is rejected.
- currencyColumn – Used to specify the logical column name that holds the currency code for a CURRENCY data type field (e.g., a currency code such as USD, EUR, JPY, etc.).
- referenceDateColumn – Used to specify the logical column that holds the date the currency amount applies to. This date is used to calculate the equivalent amount in another currency using the historical conversion rate for that date.
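To tie these properties together, here is a minimal sketch of a columns block that uses most of them. The column names (age, status, salary, salary_currency, pay_date) and the SOLR field name are hypothetical placeholders chosen for illustration, not names required by the system.

"columns": {
  "age": { "dataType": "INTEGER" },
  "status": {
    "dataColumn": "Employment Status",
    "dataType": "ENUM",
    "values": [ "ACTIVE", "ON_LEAVE", "TERMINATED" ]
  },
  "salary": {
    "fieldName": "salary_amount",
    "dataType": "CURRENCY",
    "currencyColumn": "salary_currency",
    "referenceDateColumn": "pay_date"
  },
  "salary_currency": { "dataType": "STRING" },
  "pay_date": { "dataType": "DATE" }
}

Because age specifies no fieldName, it would map to the dynamic SOLR field age_i as described above, and the status values are validated against the ENUM list during import.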
Transform Commands
Transform definitions support different properties based on which command you are using. Here is the list of possible commands and their properties. Properties are specified as siblings of the command property.
- replace – Replaces a value with another value in a series of columns.
- value – A fixed string for matching which values should be replaced.
- values – An array of fixed values to use for matching. If any of these values match the value within the given column, then it is replaced with the given with property.
- regEx – A regular expression used for matching which values should be replaced.
- with – A string value that should be used to replace the matched portion of the string.
- columns – Either a single logical column or an array of logical columns to search for values to replace. Any matches are written back to their respective columns.
- trimAll – This trims away any whitespace across all columns.
- emptyToNull – Any column with an empty value will be replaced with a null value.
- parseDataTypes – Parses all logical columns according to their specified data types, turning plain string values into typed values. This should be the last command in the list because, once it executes, all columns change from String to their respective data types. (A combined example follows this list.)
- dateFormat – Date format to use when parsing DATE data types.
- timestampFormat – Date and time format used when parsing TIMESTAMP data types.
- reject – Rejects a row if that row’s data matches any of the given column expressions.
- columns – An object that contains the logical column name as a key and the value or regular expression to use for matching against. If any of the columns are matched, the row is rejected.
- filter – A series of columns and regular expressions used to match the data that should be allowed through. Any row that fails to match all columns’ expressions is rejected.
- columns – An object where the key is the logical column name, and the value is a regular expression used to match which rows are accepted. Anything that doesn’t match all expressions for all columns specified is rejected.
- default – Sets one or more columns to a predefined value when the column is empty. If the value starts with a '$', it refers to another logical column and uses the value within that column.
- columns – An object where the key is the logical column name, and the value is the value to set into that column when an empty value is provided. If the value starts with a ‘$’ then that refers to another logical column’s values as the default.
- format – Formats a new field using format templates. This can be used to create new fields from existing fields by referencing the logical columns and formatting those using format specifiers.
- fields – An object keyed by field name, where each value is a format template string. Format templates are strings in which values within {} are formatted from logical column names. The output of each format template is stored in the field name (i.e., the key). For example:

  "fields": {
    "salutation": "Hello {first_name}",
    "display_check_in": "{check_in,date,MM/dd/yyyy} at {check_in,time,short}",
    "mfa_report": "MFA < require_mfa ? enabled : disabled >"
  }
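As referenced above, here is a sketch of a transform array that combines several of these commands. The column names (status, department, end_date, hire_date) and the date format strings are hypothetical assumptions; the commands and their properties are the ones documented in this list.

"transform": [
  { "command": "trimAll" },
  { "command": "emptyToNull" },
  {
    "command": "filter",
    "columns": { "status": "ACTIVE|ON_LEAVE" }
  },
  {
    "command": "default",
    "columns": {
      "department": "Unassigned",
      "end_date": "$hire_date"
    }
  },
  {
    "command": "parseDataTypes",
    "dateFormat": "MM/dd/yyyy",
    "timestampFormat": "MM/dd/yyyy HH:mm:ss"
  }
]

The filter command keeps only rows whose status matches the regular expression, default fills empty columns (using the hire_date column's value for end_date), and parseDataTypes runs last so that every prior command still operates on string values.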
Dynamic Field Linking
Within a data schema, you can automatically populate certain attributes about an employee into your data without resolving them manually in an ETL process. This makes it easy to link custom data with data already in the system.
This requires that you have an employee_source_id logical column, and that column must define a referenceDateColumn for the linking to work.
When you are creating your schema, if you include any of the logical column names below, they will be joined and filled in with values from the linked employee record when your data is loaded. The list of columns is:
[ "employee_type", "employee_subtype", "employee_status", "contract_type", "employee_pay_type", "employee_pay_frequency", "person_source_id", "person_fuse_id", "person_first_name", "person_last_name", "job_description", "company_name", "work_address_id", "work_address", "work_address_street1", "work_address_city", "work_address_state", "work_address_postal_code", "organization_id", "org_level_1", "org_level_2", "org_level_3", "org_level_4", "residence_street1", "residence_street2", "residence_street3", "residence_city", "residence_state", "residence_postal_code", "residence_country" ]
Enforcing Security
You must include organization_id and work_address_id as logical columns in your dataset schema for security to be enforceable.
In order for your data to enforce and conform to the security policies defined on a user's roles, you must include the organization_id and work_address_id logical columns in your dataset schema. These columns can be populated by the dynamic field linking process described above, so your underlying dataset doesn't need to include them directly. Without these columns defined, however, the data can't be partitioned properly to enforce the security configured on the Role. For example, if a user has a role that limits their data to employees who belong to USA > Sales, then the dataset needs to be linked to an organization_id so that the role definition can be enforced. The same goes for work_address_id.
How to autogenerate a Metadata data schema from data
When importing custom data, you have to create a metadata data schema file which can be tedious for “fat” data files with lots of columns. Fortunately, we’ve created a tool that you can run from your dev environment that will analyze the file and write out a JSON metadata schema file for you.
Instructions
This tool is only intended to get you started on generating a metadata schema. It recognizes only a limited number of types from the input: DOUBLE, LONG, DATE, CURRENCY, BOOLEAN, and STRING. It also defaults the collection to the name of the file. The output will require further edits to make it usable.
There is a top-level Tool class that acts as a central entry point to all of the tools available. You’ll need to replace the ${version} with the version of the code you are using (see top-level build.gradle file for that). To run it, you can do the following:
> java -jar fusearchiver-tools-${version}.jar analyze -h
This will make sure you can run the tool and print out the command line help. The following command line options are supported:
- -s, --separator – Defines the separator used by the data file while parsing. Defaults to a comma.
- -v, --verbose – Turns on verbose output for debugging purposes.
- <input file> – This is the filename of the CSV file you’d like to analyze. This is required.
Here is an example of running it:
> java -jar fusearchiver-tools-${version}.jar analyze -s , adp-payroll.csv
This will analyze the file and write out an adp-payroll-data.json file that is the metadata schema file for it. You don’t have to run it from the command line. You can run it from your IDE by navigating to the Tools class and setting up a run configuration for it.
The analyze tool reads the entire file, so it could take a while for large files. You may want to create a smaller version of the file in order to analyze it more quickly. A future enhancement could add a command line option to parse only the first X lines.
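For reference, the output is an ordinary data definition schema like the ones described earlier on this page. Assuming the documented defaults (collection derived from the file name, types limited to DOUBLE, LONG, DATE, CURRENCY, BOOLEAN, and STRING), the generated adp-payroll-data.json might look roughly like the sketch below; the column names and inferred types shown here are hypothetical, and the exact output of the tool may differ.

{
  "name": "adp-payroll",
  "collection": "adp-payroll",
  "separator": ",",
  "columns": {
    "employee_id": { "dataType": "STRING" },
    "pay_date": { "dataType": "DATE" },
    "gross_pay": { "dataType": "DOUBLE" }
  }
}

Note that a collection derived from a hyphenated file name would still need to be edited to an alphanumeric-and-underscore name (e.g., adp_payroll) before the schema is usable.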
Layout Schema
Layout Definition Metadata Syntax
Layout definition files define the UI elements used to display data. They refer to collection data, and then define what fields are displayed and how. Here is an example:
{ "name": "$LayoutName", "identifier": "$uniqueIdentifer", "placement": "employee_personal" | "employee_job" | "employee_payroll" | "employee_time" | "employee_misc" | "employee_benefits" | "report", "sortOrder": 1, "sections": { "sectionName": { "type": "table" | "popup" | "dialog" | "grid" | "reportButton" | "fieldList" | "tab" | "grid" | "search", "identifier": "$uniqueSectionId", "classes": ["$cssClass1", "$cssClass2", "$cssClassN"], "styles": { "$cssProperty1": "$cssValue1", "$cssProperty2": "$cssValue2", "$cssPropertyN": "$cssValueN" }, "additionalData": { "$dataSourceName": { "dataCollection": "$destCollection", "keys": { "$sourceCollectionKey1": "$destCollectionKey1", "$sourceCollectionKey2": "$destCollectionKey2", "$sourceCollectionKeyN": "$destCollectionKeyN" } } }, "pageSize": 0-99+, "hidden": true | false, "when": { "$eventName": { "dataCollection": "$srcDataCollection", "filter": { "dataCollection": "$destDataCollection", "staticFilter": { "$srcColumn1": "$destColumn1", "$srcColumnN": "$destColumnN" } } } } } }, "template": "<htmlTag>....</htmlTag>", "templateUrl": "filename.html" }
Field Definitions
- name – This is the name of the layout. This should be a displayable name to the user. (required)
- identifier – This is a unique identifier used to identify this layout. This must be unique across all other layouts. (required)
- placement – An enumerated string that tells where this layout belongs. The possible values are:
- employee_personal
- employee_job
- employee_payroll
- employee_time
- employee_compensation
- employee_performance
- employee_benefits
- employee_misc
- report
- sortOrder – Establishes precedence for ordering multiple layout files within the same placement. The lower the number, the closer to the left; the higher the number, the further to the right. When multiple layouts share a placement, a tabbed UI is rendered with one tab per layout.
- sections – An object where each property name is a component name and the value is that component's definition. Each component has some common properties, which are discussed in the sublist below.
- identifier – Unique identifier to assign to this component
- classes – CSS classes to assign to the root element of this component
- styles – CSS styles to apply to the root element of this component
- type – An enumerated value that identifies the type of the component
- table – This is a table component for displaying multiple objects at once across a series of columns. The table is filterable, sortable, supports pagination, and allows for displayed column customization. It can also link out to other components providing navigation to other components defined within the layout as well as to other links within the UI.
- popup – A modal popup that displays a set of fields in a single column: one field per row. It supports the template / templateUrl pattern but requires HTML tags to define the layout of the popup. This is DEPRECATED because of the complexity and security issues of using template / templateUrl. Use dialog instead.
- dialog – A modal dialog that displays any set of components. This component replaces popup in favor of a recursive component that can display more complex UIs without the need for external HTML, so the security issues are much more limited.
- tab – Displays a set of tabs that display a single component at once. Each tab can nest another component inside it.
- fieldList – This is a single-column display of a set of fields. This is DEPRECATED in favor of grid, since grid provides a much more flexible layout.
- grid – This displays a set of fields across multiple columns. It supports much more complex layouts without the need for HTML, is simpler for building UIs than HTML, and doesn't suffer from the security issues of using template / templateUrl.
- search – This displays a search-related screen to perform searches across objects. It supports faceted search as well as text-based search (e.g., Google searches).
- reportButton – This provides a call-to-action button to create a report based on data selected. How the data is selected is defined outside of this component. Typically, you can place this on a dialog’s button tray to allow someone to export or print a report configured on the button. This requires creating a custom report and defining a report metadata schema.
- hidden – Hide the component from being displayed (default false).
- pageSize – The number of items requested and displayed per page when paginating a list of results. (Default 25.)
- additionalData – Data structure describing additional data sources to load, and how to link each record from the defined data collection to the additional data collection using one or more keys.
- when – Specifies an action to take when a certain event (specified by $eventName) occurs. In the example above, when a load happens on the source data collection, the UI loads the destination data collection, filtering it by setting $destColumn1 through $destColumnN to the values of $srcColumn1 through $srcColumnN. The source columns come from the source data collection specified above. (A concrete sketch follows this list.)
- template – An HTML fragment used to display the sections. This string references AngularJS components to display the configuration defined in sections. DEPRECATED – THIS SHOULD NOT BE USED IN THE FUTURE.
- templateUrl – A filename of an external HTML file that displays the sections configured above. This references AngularJS components for the sections. DEPRECATED – THIS SHOULD NOT BE USED IN THE FUTURE.
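As referenced in the additionalData and when descriptions, here is a concrete sketch of a single section using both. The collection names (benefit_enrollments, dependents), the column names, and the load event name are hypothetical placeholders following the pattern of the syntax example above; the dataCollection property on the component itself follows the table component example later on this page.

"sections": {
  "enrollmentTable": {
    "type": "table",
    "identifier": "enrollmentTable",
    "dataCollection": "benefit_enrollments",
    "additionalData": {
      "dependentData": {
        "dataCollection": "dependents",
        "keys": { "employee_source_id": "employee_source_id" }
      }
    },
    "when": {
      "load": {
        "dataCollection": "benefit_enrollments",
        "filter": {
          "dataCollection": "dependents",
          "staticFilter": { "employee_source_id": "employee_source_id" }
        }
      }
    }
  }
}

Here, each enrollment row is linked to its dependent records through matching employee_source_id values, and when the load event fires on benefit_enrollments, the dependents collection is loaded with the corresponding filter applied.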
Layout Components
These are the components you can use to display data within your custom UIs.
- Table Component
- Grid Component
- Dialog Component
- Tab Component
- Search Component
- ReportButton Component
- PaycheckTable Component
Table Component
The table component is a complex component that offers user-controlled filtering, sorting, configurable columns, data exporting, pagination, and customizable navigation. Here is a quick example of a table:
{ "dataTable": { "type": "table", "name": "Dependents", "identifier": "dataTable", "pageSize": 35, "main": true, "dataCollection": "dependents", "displayedColumns": [ "dependent_first_name", "dependent_last_name", "relationship", "type", "status", "child_classification", "eligibility_start_date", "eligibility_end_date", "date_medical_coverage_began" ], "settings": { "start_time": { "format": "HH:mm" } }, "sortOrder": [ { "column": "eligibility_start_date", "order": "descending" | "ascending" } ], "links": { "_row_": { "from": "dataTable", "to": "dependentsDetailsDialog", "keyFields": ["id"] } }, "filters": [ { "column": "work_date", "ui": "DATE_RANGE" | "FIXED_DATE" | "SINGLE_DATE" } ] } }