Detect Primary Keys In a Dataset

This example shows how to detect primary keys within datasets. It simplifies tasks such as database schema design, data integration, quality assurance, and migration by accurately determining primary keys. 

 

Java Code Listing

package com.northconcepts.datapipeline.foundations.examples.schema;

import com.northconcepts.datapipeline.core.*;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.memory.MemoryReader;

import java.util.List;

import static com.northconcepts.datapipeline.foundations.tools.DetectPrimaryKeysInDataset.detectPrimaryKeys;

public class DetectPrimaryKeysInADataset {
    public static void main(String[] args) {
        String[][] data = new String[][]{
                {"1", "1", "1", "1", "2", "1"},
                {"1", "2", "1", "2", "2", "1"},
                {"1", "3", "1", "3", "2", "2"},
                {"2", "1", "1", "4", "2", "2"},
                {"2", "2", "1", "5", "2", "3"},
                {"2", "3", "1", "6", "2", "4"},
        };

        RecordList records = new RecordList();
        for (int i = 0; i < data.length; i++) {
            String[] row = data[i];
            Record record = new Record();
            for (int j = 0; j < row.length; j++) {
                record.addField().setValue(row[j]);
            }
            records.add(record);
        }

        DataReader reader = new MemoryReader(records);
        DataWriter writer = new StreamWriter(System.out);

        Job.run(reader, writer);

        List primaryKeys = detectPrimaryKeys(records);
        if (primaryKeys.isEmpty()) {
            System.out.println("No primary keys found");
        } else {
            System.out.println("=================== Primary key column(s) ========================");
            primaryKeys.forEach(System.out::println);
        }
    }
}

 

Code Walkthrough

  1. data variable is introduced as a multi-dimensional array of strings.
  2. RecordList instance is created and filled with data inside the for loop. Each row in the input array is created as a separate Record.
  3. MemoryReader is used to obtain records from an in-memory RecordList.
  4. Job.run() is used to transfer the data from reader to StreamWriter(System.out). See how to compile and run data pipeline jobs. 
  5. Primary keys are detected using detectPrimaryKeys() method. In our example, the D column contains unique values. Hence, it is returned as the primary key column. If one column does not meet the primary key constraint, the combination of two or more columns can be returned.

 

Console Output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[A]:STRING=[1]:String
    1:[B]:STRING=[1]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[1]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[1]:String
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[A]:STRING=[1]:String
    1:[B]:STRING=[2]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[2]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[1]:String
}

-----------------------------------------------
2 - Record (MODIFIED) {
    0:[A]:STRING=[1]:String
    1:[B]:STRING=[3]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[3]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[2]:String
}

-----------------------------------------------
3 - Record (MODIFIED) {
    0:[A]:STRING=[2]:String
    1:[B]:STRING=[1]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[4]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[2]:String
}

-----------------------------------------------
4 - Record (MODIFIED) {
    0:[A]:STRING=[2]:String
    1:[B]:STRING=[2]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[5]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[3]:String
}

-----------------------------------------------
5 - Record (MODIFIED) {
    0:[A]:STRING=[2]:String
    1:[B]:STRING=[3]:String
    2:[C]:STRING=[1]:String
    3:[D]:STRING=[6]:String
    4:[E]:STRING=[2]:String
    5:[F]:STRING=[4]:String
}

-----------------------------------------------
6 records
=================== Primary key column(s) ========================
D
Mobile Analytics