Data migration is the process of transferring data between storage types, formats, or computer systems.
When is data migration performed
Challenges in migrating historical data into DataSight
To get the maximum benefit from DataSight, you need to be certain that the information in your database is reliable and complete, and that it uses a consistent set of definitions. This may not be easily achieved for historical data sets. Because you need to standardise your data definitions so that each piece of information has the same meaning, combining multiple data sources and migrating data across systems (data migration and database migration) requires sufficient time and effort.
The following may pose challenges during the migration of your data into DataSight:
You need to be aware of the limitations of your historical data sets and be realistic about the time that may be required to move this set into DataSight. Please refer to the Seveno website and the Knowledge Base for help with known data migration issues.
Migrate your historical data into DataSight
To successfully migrate data into DataSight, you need to design a process for data extraction and data loading that relates your old data formats to DataSight's formats and requirements. The migration process consists of first auditing your files, then developing the rules to standardise data definitions.
You may wish to ensure your data fields line up with your DataSight levels and variables, particularly if data is taken from spreadsheets, and that the variables are uniform, with any value ranges standardised across all records. This prevents data misinterpretation, enables more accurate selection from lists and helps identify gaps in your data. You may need to remove duplication within the database. Data migration phases (design, extraction, cleansing, load, verification) are commonly repeated several times before you can be confident in the integrity of your historical data set.
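The standardisation and de-duplication steps described above can be sketched outside DataSight before loading. The following is a minimal illustration only, assuming the legacy data has already been read into Python dictionaries. The field names in `FIELD_MAP` and the sample records are hypothetical and are not DataSight names.

```python
# Hypothetical mapping from legacy spreadsheet headers to the standard
# variable names chosen for the target database (illustrative only).
FIELD_MAP = {
    "Temp (C)": "water_temperature",
    "temp": "water_temperature",
    "pH value": "ph",
}


def standardise(record):
    """Rename legacy fields to their standard variable names."""
    return {FIELD_MAP.get(key.strip(), key.strip()): value
            for key, value in record.items()}


def remove_duplicates(records):
    """Drop records that are exact repeats, keeping the first occurrence."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up legacy records: the first two differ only in their header spelling.
legacy = [
    {"site": "A1", "timestamp": "2020-01-01 19:50:00", "Temp (C)": "14.2"},
    {"site": "A1", "timestamp": "2020-01-01 19:50:00", "temp": "14.2"},
    {"site": "A1", "timestamp": "2020-01-01 19:51:00", "temp": "14.0"},
]
cleaned = remove_duplicates([standardise(r) for r in legacy])
```

Standardising the names before comparing records matters here: the first two records only become recognisable as duplicates once both use the same variable name.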
Certain functions in DataSight can and should be used to help streamline data migration. The importation process allows for a pre-load 'data validation' step, where you interrogate the data to be transferred to ensure that it fully complies with your database structure. Any issues with the data importation occurring at the point of loading are automatically reported in the import log.
After loading into DataSight, your results can be subjected to data verification procedures such as flagging to determine whether data was accurately translated and is complete.
For time series data, DataSight offers a time resolution as fine as one second. Learn more about Time Limitation, Raw Data File Formats, Depth in Water Bodies and Duplicates.
In DataSight Version 3, the minimum time increment has been set to 1 second.
While Microsoft SQL Server can currently store data captured less than 3 milliseconds apart, DataSight is not currently designed to accept this data. If you require sub-second data to be stored in DataSight, please contact us.
At present, DataSight cannot import multiple data records that are stored as one continuous row of data. You may need to preprocess the data using Excel to break it into rows.
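As an alternative to Excel, the same preprocessing can be scripted. The sketch below is illustrative only: it assumes a comma-separated file in which every record has the same fixed number of fields, here three (timestamp, variable, value). The field layout and sample data are assumptions, not a DataSight format.

```python
import csv
import io

FIELDS_PER_RECORD = 3  # assumption: each record is timestamp, variable, value


def split_row(row, width=FIELDS_PER_RECORD):
    """Break one long row into consecutive records of `width` fields."""
    if len(row) % width != 0:
        raise ValueError(
            f"row has {len(row)} fields, not a multiple of {width}")
    return [row[i:i + width] for i in range(0, len(row), width)]


# One continuous row holding two records back to back (made-up data).
raw = "2020-01-01 19:50:00,temp,14.2,2020-01-01 19:51:00,temp,14.0\n"

records = []
for row in csv.reader(io.StringIO(raw)):
    records.extend(split_row(row))

# Write one record per row, ready for a row-oriented import.
out = io.StringIO()
csv.writer(out).writerows(records)
```

Raising an error when the field count does not divide evenly is a deliberate choice: it surfaces truncated or malformed rows before import rather than silently misaligning the values.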
Certain types of environmental data may have records for seemingly the same date and time at one specific location. For example, if you are monitoring changes in water temperature at one site, but at various depths, the location and date will remain constant. This is fine, but each record must have a unique timestamp, so it is imperative that the time changes for each of the depth entries. This can be as simple as varying each entry (depth) by one minute, such as 19:50:00, 19:51:00, 19:52:00, and so on.
There are two options when you are faced with data such as this:
Option 1. Ensure time values are assigned correctly before importing. This may involve adjusting equipment settings to record separate timestamps for each sample, or editing the data manually before importing into DataSight.
Option 2. DataSight can assign time values during import (see Map Levels).
Note: DataSight is designed for each measurement to have a unique timestamp, as we believe that even when replicate measurements are taken for a sample, each measurement has been conducted at a different time and can be differentiated by this. Think about how you are recording your measurements with respect to time, to help resolve issues such as those described above.
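If you choose to assign time values yourself before importing (Option 1), the one-minute offsets for a depth profile can be generated with a short script. This is a minimal sketch under stated assumptions: the function name, base time and depths are hypothetical, and it does not represent how DataSight assigns times during import.

```python
from datetime import datetime, timedelta


def stagger_by_depth(base_time, depths, step=timedelta(minutes=1)):
    """Pair each depth with its own timestamp, `step` apart from the last."""
    return [(base_time + i * step, depth) for i, depth in enumerate(depths)]


base = datetime(2020, 1, 1, 19, 50, 0)  # made-up sampling time
profile = stagger_by_depth(base, [0.5, 1.0, 2.0])  # made-up depths in metres

for stamp, depth in profile:
    print(stamp.strftime("%H:%M:%S"), depth)
# prints:
# 19:50:00 0.5
# 19:51:00 1.0
# 19:52:00 2.0
```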
In DataSight, data for the same variable at a given site are stored with unique date and time stamps. When capturing environmental data, however, you may take duplicate measurements from a given locality, or analyse a sample multiple times, to obtain a statistically representative value for your measurement. At present, to save these records in DataSight for scientific interrogation, enter such duplicate data with unique time stamps. We recommend using a small time increment between each entry (e.g. one second). This can be done during Import in Step 4, Map Fields. We also recommend entering the sample name or code against the timestamps in a sample number variable, so that you can filter and identify data for a given sample number or event.
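If you prefer to prepare replicate data before import, the one-second increment can be applied with a short script. The sketch below is illustrative only and assumes each row is a tuple of site, variable, timestamp, value and sample code. This row layout and the function name are hypothetical, not a DataSight format.

```python
from datetime import datetime, timedelta


def separate_replicates(rows, step=timedelta(seconds=1)):
    """Shift repeated (site, variable, timestamp) keys until each is unique.

    The sample code is carried through unchanged, so replicates can still
    be filtered and grouped by sample after their timestamps diverge.
    """
    used = set()
    result = []
    for site, variable, stamp, value, sample_code in rows:
        while (site, variable, stamp) in used:
            stamp += step
        used.add((site, variable, stamp))
        result.append((site, variable, stamp, value, sample_code))
    return result


taken = datetime(2020, 1, 1, 9, 0, 0)  # made-up sampling time
replicates = [
    ("A1", "ph", taken, 7.1, "S-001"),
    ("A1", "ph", taken, 7.2, "S-001"),
    ("A1", "ph", taken, 7.0, "S-001"),
]
unique_rows = separate_replicates(replicates)
```

In this example the three replicates end up at 09:00:00, 09:00:01 and 09:00:02, all still tagged with the same sample code. The loop keeps stepping forward if a shifted time is already taken, so a replicate is never moved onto another measurement's timestamp.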