Brian Vander Plaats

Developer

Blog

12 Oct 2015

Creating the Bike Store Database – Part I – Product Schema

Over the next few months I plan on doing a bunch of research on Entity Framework, WebAPI, MVC, and AngularJS.   My focus will be on how these technologies can be used to build enterprise level applications.   To accomplish this I need a sample database that properly models typical enterprise entities.   To this end, I’ve created a database for a fictional “Bike Store”.   At the bike store, we will need to track a product catalog, inventory, customers, sales, employees, and so on.  Below is a  high level overview of the data needs:

BikesStoreDatabaseArchitecture

At this time I’m only implementing the Product area of the database schema, but you can see how it fits into the larger theoretical schema.  This represents a fairly standard view of any enterprise. Notably we are not modeling any type of manufacturing or research entities as this is just a retail store.   The detailed product schema is as follows:

 

BikeStoreProductSchema

Model

A model represents a product, and may have one or more part numbers.  Coke and Coke Zero are two different models. Coke – 20 oz Bottle and Coke 12 oz can are two different part numbers of the Model – Coke.  Some systems do not contain an entity for model, but represent everything at the part number level. This of course, breaks First Normal Form.   In the bike store there are products that have variations on size / color, tied to detailed product information.  Using the Model entity allows a common profile of product information to be shared by one or more part numbers.

A key design need is to provide a structure for storing hundreds of disparate attributes for different types of products.  Simplistic systems create dozens of columns for an entity, which leads to highly sparse records, which are not efficient for storage, and requires schema changes when new attributes are needed.   The next best thing is to create a key/value table relationship in the form of Model->ModelDataFieldValue->ModelDataFieldtype.   This works well and I’ve done this in a number of scenarios.  However for this project I’m exploring the use of JSON for this information.  For each Model entity, I simply need to have a field to store the JSON data.  Here I’m actually using two.  One for category-specific fields, and one for model-specific fields.  For example, for bicycles, I want to record the groupset, frame type, frame fit, wheelset, tires, crank dimensions, etc.   This would apply to all bicycle categories.   However for Trek I might want to also record if the bike is eligible for ProjectOne, or a special rebate, etc.  

Ultimately the difference looks like this:

TraditionalTableRelationships

vs.

[
    {"Crucial BX100 250GB SSD":{
        "Capacity":  "250 GB",
        "Sequential Read":  "535 MBps",
        "Sequential Write":  "370 MBps"
		}
    },
    {"Samsung 850 EVO 250GB SSD":{
        "Capacity":  "250 GB",
        "Sequential Read":  "540 MBps",
        "Sequential Write":  "520 MBps"
		}
    }
]

(thanks to @kitroed for sending me SSD deal links while writing this post…)

Advantages to storing in JSON

  • Faster querying, removing additional joins
  • Many front-end UI components work directly with JSON, don’t need to write code to process table results
  • Disadvantages to storing in JSON

  • Searching for keys matching a specific value cannot be done in a simple where clause
  • Higher storage requirements (store JSON string name along with value)
  • Part Number

    Part Number or SKU represents the item that goes in inventory and sold to the customer.  A part number is identified by the manufacturer part number or the bike store inventory part number.  Most inventory systems will want to use a common / global part number scheme, since we have no control over what that manufacturer assigns in their inventory system. Note that a serial number is an instance of PartNumber at the inventory level for large items like bicycles.  

    ex.

    Model:  Gel Cork Tape

    Invoice Description:  Gel Cork Tape – Red

    Inventory Part Number: 10500

    Manufacturer Part Number:  402500

    Manufacturer, Category, Status

    Manufacturer (or “make”) is essentially the label on the product e.g. Trek.  We don’t care if Trek actually manufactures their frame, or that Bontrager is simply a label that Trek owns – at the product level these are both manufacturer’s.  At the purchasing level, we may end up buying both bontrager and trek products from the same vendor/distributor.  

    Category includes Road Bikes, Saddles, Grip Tape, etc.   Both Manufacturer and Category have JSON fields.  These fields store blank sets of string/value pairs.  Will be used by product entry screens to populate the related category / manufacturer specific fields that need to be filled out.

    Status is also standard, but note it doesn’t contain “out-of-stock”.  This is a dynamic status that should be determined by the inventory module, and could fluctuate on a daily basis, while the product is “active” the entire time.

    Github, Visual Studio Database Projects

    I’ve implemented the database using Visual Studio 2015 database projects.  It includes initial sample data, and can be deployed repeatedly to the same database endpoint.  For the sample data, I’m using a merge approach, loading data from tab-delimited text files. This is the first time I’ve worked with database projects and overall I’m pretty happy with the tool.

    I’ve deployed the current build to GitHub as brianvp/bikestore-database.   As I work through my research areas I will update as needed.

    08 Oct 2015

    Generating JSON from CSV Using Powershell

    Here is a technique I picked up while researching ways to generate JSON output from a CSV file.  My main use case is generating sample data quickly.  While JSON is easy to read and parse, I don’t feel like hand typing dozens of string/value pairs, I want to use excel like a sane person.  I also do not want to upload my JSON to some random internet site.  Thankfully Powershell (v3) has a built-in cmdlet that allows us to convert to JSON.  

    Step 1 – Generate Test Data using preferred spreadsheet

    JsonCSVExcelInput

    Step 2 – Save File as either CSV or Tab-Delimited text file

    JsonCSVSaveDialog

    JsonCSVInputFile

    Step 3 – Parse the ouput file using the ConvertTo-Json powershell cmdlet, and output to a json file

    import-csv "SampleInput.csv" | ConvertTo-Json | Add-Content -Path "output.json"
    

    JsonCSVPowershellStep1

    which produces the following output:

    JsonCSVJsonOutput

    variation #1 – Convert from tab-delimited file

    import-csv -Delimiter "`t" "SampleInput.txt"  | ConvertTo-Json | Add-Content -Path "output.json"
    

    variation #2 – Remove whitespace / carriage returns from output file

    import-csv "SampleInput.csv" | ConvertTo-Json -Compress | Add-Content -Path "output.json"
    

    which produces the following output:

    JsonCSVJsonOutputCompressed

    06 Oct 2015

    SQL Server Tip #1: Basic Table Template

    A common SQL task is to set up new tables for an application, and while you can use the GUI for this, I prefer using scripts. It’s faster, and leads to better deployment practices. (The one time I do find the designer useful is when making a column size change to a table.  In this case, I will make the change in the designer, and generate a script.  It looks ugly, but saves a lot of manual coding)

    SqlTip1_TableDesignerSqlTip1_TableDesignerChangeScript

    Since I don’t typically create tables every day, even as a dba, I like using a basic template. You can script out a template using an existing SQL Server table, which looks like this:

    /****** Object: Table [product].[Category] Script Date: 10/6/2015 8:38:18 PM ******/
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE TABLE [product].[Category](
    [CategoryId] [int] IDENTITY(100,1) NOT NULL,
    [Name] [nvarchar](50) NULL,
    [CustomDataFields] [nvarchar](max) NULL,
    [DateModified] [datetime2](7) NULL,
    [DateCreated] [datetime2](7) NULL DEFAULT (getdate()),
    PRIMARY KEY CLUSTERED
    (
    [CategoryId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    
    GO
    

    Or if you are using SQL 2012 or later you can use code snippets:

    CREATE TABLE dbo.Sample_Table
    (
    column_1 int NOT NULL,
    column_2 int NULL
    );
    

    I don’t care for these much (too verbose or too simplistic), so I created my own:

    CREATE TABLE tempdb.dbo.temp1
    (
     Temp1Id INT IDENTITY(1,1) NOT NULL
     CONSTRAINT temp1PrimaryKey PRIMARY KEY CLUSTERED,
     VALUE VARCHAR(25),
     DateModified datetime2,
     DateCreated datetime2 default getdate()
    )
    GO
    
    CREATE TABLE tempdb.dbo.temp2
    (
     Temp2Id INT IDENTITY(1,1) NOT NULL
     CONSTRAINT temp2PrimaryKey PRIMARY KEY CLUSTERED,
     Temp1Id INT NOT NULL
     CONSTRAINT Temp2Temp1ForeignKey FOREIGN KEY (Temp1Id)
     REFERENCES dbo.temp1(Temp1Id),
     VALUE VARCHAR(25) NULL,
     DateModified datetime2,
     DateCreated datetime2 default getdate()
    )
    GO
    

    Gist

    At my current job, I allow developers to create their own tables (following standards of course!), and a very common code review item is that they forget to make the foreign key constraint! It’s simpler to add this at the time of creation. Sometimes this makes data insertion a problem when people are “testing” – but it’s far worse to have this go uncaught when deploying to production.

    Additionally, unless you are using database projects, it’s a good idea to keep all your table creation logic in a single script. This will save a lot of time when deploying to staging or production.