[Pre-W3] Relational Database Management System

YC Tech Academy Backend Career Project

Overview

In today's blog, we will go over relational database management systems, or RDBMS. Understanding them is crucial, because the database sits at the core of most applications. We will cover what a relational database is, how its concepts are implemented in real databases, and how a database can be connected to Spring Boot apps via JPA and Hibernate.

What is RDB

A relational database, as the name suggests, is a database organized around relations. It stores data in structured tables, and records in one table can reference records in another, which keeps the data accessible, consistent, and accurate. Strictly speaking, "relation" is the mathematical term for a table, but in everyday use the "relational" part of the name is also understood to mean that data within these tables can be related or linked based on specific criteria, allowing for meaningful connections between data points.

Key Characteristics:

  1. Tables (or Relations): Data in a relational database is stored in tables (sometimes referred to as relations). Each table contains rows and columns, where rows represent individual records and columns represent the attributes or fields of the records.

  2. Primary Key: Each table typically has a column or combination of columns known as the primary key. This key uniquely identifies each record in the table, ensuring that there are no duplicates.

  3. Foreign Key: To establish relationships between tables, a table may contain a column called a foreign key. This column links records in one table to records in another, based on common values.

  4. Schema: The design or blueprint of a relational database, known as its schema, defines the tables, fields, relationships, indexes, and other elements. The schema is crucial for understanding the organization of data within the database.

  5. Data Integrity and Normalization: Relational databases often employ normalization techniques. This is a systematic approach to decompose tables to eliminate data redundancy and improve data integrity. In simple terms, normalization ensures that each piece of data is stored in its most logical place.

  6. SQL (Structured Query Language): This is the standard language used to query and manipulate relational databases. With SQL, you can perform a variety of tasks, such as inserting, updating, deleting, and retrieving data from a database.

Advantages:

  • Data Accuracy and Integrity: Due to the strict relationships and normalization techniques, relational databases minimize data redundancy and maintain data accuracy.

  • Flexibility in Queries: The relational model allows users to generate a wide variety of queries to extract the needed data in numerous ways.

  • Data Security: Relational databases typically offer strong data security features. Access to data can be controlled at both the table and field levels.

  • Scalability: They can efficiently manage large amounts of data and can be scaled up or out as required.

Relationship Management

The heart of any relational database is the connection between its tables. In an RDBMS, tables are intertwined in a sophisticated manner, allowing for efficient data retrieval and storage. This connection isn't just arbitrary; it's defined by specific keys that ensure data integrity and avoid redundancy.

Unique Key:

At its core, a unique key is a rule ensuring that all values in a column, or a combination of columns, are distinct. It prevents duplicate entries, so no two rows can share the same value for that key. In practice, once you declare a column as a unique key, any insert or update that would produce an already existing value is rejected with an error.

CREATE TABLE Users (
    UserID INT,
    Username VARCHAR(255) NOT NULL,
    Email VARCHAR(255) NOT NULL UNIQUE,  -- Unique key constraint on Email column
    Password VARCHAR(255) NOT NULL
);
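
To see the constraint in action, here is a small sketch against the Users table above. The row values are made up for illustration, and the exact error message depends on the database.

```sql
-- First insert succeeds: no row uses this email yet
INSERT INTO Users (UserID, Username, Email, Password)
VALUES (1, 'alice', 'alice@example.com', 'hashed-password-1');

-- Second insert is rejected: the Email value already exists,
-- so the UNIQUE constraint is violated
INSERT INTO Users (UserID, Username, Email, Password)
VALUES (2, 'alice2', 'alice@example.com', 'hashed-password-2');
```

Note that the second insert fails even though UserID and Username are different; only the Email column is covered by the constraint.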

Primary Key:

One step further is the primary key, a specialized unique key. Beyond ensuring uniqueness, it is the primary means of identifying records within a table. A primary key cannot contain NULL values, and a table can have only one primary key, so every record can be unequivocally identified.

CREATE TABLE Students (
    StudentID INT NOT NULL PRIMARY KEY,  -- Primary key constraint on StudentID column
    FirstName VARCHAR(255) NOT NULL,
    LastName VARCHAR(255) NOT NULL,
    Age INT
);
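
A primary key can also span several columns, as mentioned in the key characteristics earlier. The sketch below uses a hypothetical Enrollments table (not part of our application) where neither column is unique on its own, but the combination is.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE Enrollments (
    StudentID INT NOT NULL,
    CourseCode VARCHAR(20) NOT NULL,
    EnrolledOn DATE,
    PRIMARY KEY (StudentID, CourseCode)  -- Composite primary key
);

-- Both rows are accepted: the (StudentID, CourseCode) pairs differ
INSERT INTO Enrollments (StudentID, CourseCode, EnrolledOn) VALUES (1, 'CS101', '2024-03-01');
INSERT INTO Enrollments (StudentID, CourseCode, EnrolledOn) VALUES (1, 'CS102', '2024-03-01');

-- Rejected: the pair (1, 'CS101') already exists
INSERT INTO Enrollments (StudentID, CourseCode, EnrolledOn) VALUES (1, 'CS101', '2024-03-02');
```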

Foreign Key:

If you've ever wondered how tables talk to each other, the foreign key is your answer. Acting like a bridge, the foreign key establishes a link between data in two tables, ensuring that records in one table correspond to records in another. It’s this key that upholds the ‘relational’ aspect of RDBMS, ensuring data consistency across tables.

CREATE TABLE Orders (
    OrderID INT NOT NULL PRIMARY KEY,
    OrderDate DATE,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)  -- Foreign key constraint
);
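
The Orders table above references a Customers table that has to exist first. Here is a minimal sketch of what it might look like, along with inserts that show the constraint at work. The Name column and all row values are assumptions made for illustration.

```sql
-- Assumed shape of the referenced table; create it before Orders
CREATE TABLE Customers (
    CustomerID INT NOT NULL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

INSERT INTO Customers (CustomerID, Name) VALUES (1, 'Alice');

-- Accepted: customer 1 exists
INSERT INTO Orders (OrderID, OrderDate, CustomerID) VALUES (10, '2024-01-15', 1);

-- Rejected: there is no customer 99, so the foreign key is violated
INSERT INTO Orders (OrderID, OrderDate, CustomerID) VALUES (11, '2024-01-15', 99);

-- The link can then be followed with a JOIN
SELECT o.OrderID, o.OrderDate, c.Name
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID;
```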

Indexing:

Beyond keys, databases employ a tactic to boost query performance: indexing. Imagine a library without a catalog, where you have to scan every book to find the one you need. Indexing provides the catalog for databases. With indexing, specific rows in a table can be found without scanning the entire table.

The process involves creating, viewing, and even deleting indexes, each serving a purpose to improve the database's efficiency. A notable variant, the unique index, ensures that data stored is distinct, just like a unique key.

1. Creating Index:

CREATE INDEX idx_studentname
ON Students (FirstName, LastName);

This command establishes an index named idx_studentname on the FirstName and LastName columns of the Students table. What this means is that the database creates a separate data structure (typically a B-tree) that's ordered by these two columns. When you query data based on these columns, the database can quickly consult this index to locate the relevant rows, instead of scanning the entire table.

The selection of columns for indexing usually depends on which columns are frequently used in the WHERE clause of the SQL queries.
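
One detail worth knowing about a multi-column index like idx_studentname is that column order matters: the index is sorted by FirstName first, then LastName. The sketch below uses EXPLAIN to ask the database how it plans to run each query. The search values are made up, and the output format differs between databases.

```sql
-- Can use idx_studentname: the filter includes the leading column (FirstName)
EXPLAIN SELECT * FROM Students WHERE FirstName = 'Ada' AND LastName = 'Lovelace';
EXPLAIN SELECT * FROM Students WHERE FirstName = 'Ada';

-- Typically cannot seek through idx_studentname: the leading column is skipped,
-- so the database usually falls back to scanning the table
EXPLAIN SELECT * FROM Students WHERE LastName = 'Lovelace';
```

If queries by LastName alone are common, a separate index on that column would be the usual remedy.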

2. Dropping Index:

DROP INDEX idx_studentname ON Students;

Over time, you might find that some indexes are not useful or that they cause more overhead than they're worth (every time you insert or update a row, the associated indexes also need to be updated). In such cases, you'd drop these indexes to improve insert/update performance. This command removes the idx_studentname index from the Students table, freeing up space and potentially improving write speeds to the table. The ON clause shown here is MySQL syntax; some other databases, such as PostgreSQL, omit it and use DROP INDEX idx_studentname.

3. Showing Indexes:

SHOW INDEXES FROM Students;

Understanding what indexes are in place can help in performance tuning and debugging. This command, which is MySQL syntax, lets you view all the indexes on the Students table, showing details such as the indexed columns, whether each index is unique, the index type, and other pertinent information. This information is especially useful when considering new indexes or diagnosing performance issues.

4. Using Unique Index:

CREATE UNIQUE INDEX idx_unique_email
ON Users (Email);

A unique index not only improves retrieval speed but also enforces data integrity by ensuring all values in the indexed column(s) are unique. The above command creates a unique index on the Email column of the Users table. Now, if someone tries to insert or update a record that would result in duplicate email values, the database will throw an error. This is an example of how indexing can serve both performance and data integrity purposes. (In the Users table defined earlier, Email is already declared UNIQUE, and most databases back such a constraint with an index of their own, so there the extra index would be redundant.)

Transactions in RDBMS

A transaction is a sequence of one or more SQL operations treated as a logical unit. This means that either all of the operations are executed, or none of them are. Transactions provide a mechanism to manage changes made by multiple operations as a single entity, ensuring that the database remains consistent even in the face of errors or system crashes.

For example, consider a bank system where you're transferring money from one account to another. This operation involves two steps: subtracting an amount from one account and adding that amount to another. Both steps need to complete successfully, or else you'd end up with an inconsistency in the data. Transactions ensure that such operations are treated as a single unit.
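
The transfer described above could be sketched as follows. The Accounts table, its columns, and the amounts are hypothetical, and START TRANSACTION is the MySQL spelling (other databases may use BEGIN). COMMIT and ROLLBACK are covered in the Transaction Control section below.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE Accounts (
    AccountID INT NOT NULL PRIMARY KEY,
    Balance DECIMAL(12, 2) NOT NULL
);

START TRANSACTION;

-- Step 1: take the amount out of the sending account
UPDATE Accounts SET Balance = Balance - 100.00 WHERE AccountID = 1;

-- Step 2: add the same amount to the receiving account
UPDATE Accounts SET Balance = Balance + 100.00 WHERE AccountID = 2;

-- Both steps succeeded, so make them permanent together
COMMIT;

-- If either UPDATE had failed, the application would issue ROLLBACK
-- instead of COMMIT, leaving both balances untouched
```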

ACID Properties

ACID stands for Atomicity, Consistency, Isolation, and Durability. Together, these properties ensure reliable processing of transactions.

  1. Atomicity: This ensures that all operations within a transaction are completed successfully; if not, the transaction is aborted, and all changes are rolled back.

  2. Consistency: This ensures that a transaction brings the database from one valid state to another. The underlying data always remains consistent before and after the transaction.

  3. Isolation: Multiple transactions might be executed concurrently. Isolation ensures that each transaction is processed in isolation from others, meaning the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, one after the other.

  4. Durability: Once a transaction has been committed, it remains so. The changes persist even in the face of system failures.

Transaction Control with SQL Codes

  1. COMMIT: This command permanently saves all changes made in the current transaction to the database. Once you issue COMMIT, those changes can no longer be rolled back.

     COMMIT;
    
  2. ROLLBACK: If you decide not to save the changes, this command discards them and returns the database to the state it was in before the transaction started.

     ROLLBACK;
    
  3. SAVEPOINT: This command creates points within a transaction to which you can later roll back, rather than rolling back the entire transaction.

     SAVEPOINT SavepointName;
    

    If you need to roll back to a certain savepoint, you can use:

     ROLLBACK TO SavepointName;
    
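  4. SET TRANSACTION: This command sets characteristics of a transaction, such as its isolation level or whether it is read-only. Some databases add their own options; Oracle, for example, lets you assign a name to the transaction, which can be useful for monitoring and controlling it. In Oracle, the name is given as a quoted string:

     SET TRANSACTION NAME 'TransactionName';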
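
Putting these commands together, here is a small sketch using the Students table from earlier. It assumes MySQL-style syntax, where SET TRANSACTION ISOLATION LEVEL applies to the next transaction started in the session; the row values and the savepoint name are made up for illustration.

```sql
-- Request the strictest isolation level for the next transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

START TRANSACTION;

INSERT INTO Students (StudentID, FirstName, LastName, Age)
VALUES (1, 'Ada', 'Lovelace', 20);

-- Mark a point we can return to
SAVEPOINT AfterFirstStudent;

INSERT INTO Students (StudentID, FirstName, LastName, Age)
VALUES (2, 'Alan', 'Turing', 21);

-- Undo only the work done after the savepoint (student 2)
ROLLBACK TO AfterFirstStudent;

-- Make the remaining work (student 1) permanent
COMMIT;
```

After the COMMIT, the table contains student 1 but not student 2.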
    

Application

Now that we have a stronger understanding of RDBMS, let's apply it to our SNS application.

Creating Entity

Last week, we created a domain object called Post:

package com.example.sns.order.domain;

import lombok.Getter;

import javax.persistence.*;
import java.time.LocalDate;

@Entity
@Table(name = "post")
@Getter
public class Post  {

    @Id
    @GeneratedValue(strategy= GenerationType.AUTO)
    @Column(name="id")
    private Long id;

    @Column(name="user_id")
    private Long userId;

    @Column(name="title")
    private String title;

    @Column(name="content")
    private String content;

    @Column(name="created_at")
    private LocalDate createdAt;

    // No-argument constructor, required by JPA
    public Post() {
        super();
    }

    // Creates a post with only its creation date set
    public Post(LocalDate localDate) {
        this.createdAt = localDate;
    }

    public Post(Long userId, String title, String content) {
        this.userId = userId;
        this.title = title;
        this.content = content;
        this.createdAt = LocalDate.now();
    }

    public Post(Long id, Long userId, String title, String content, LocalDate createdAt) {
        super();
        this.id = id;
        this.userId = userId;
        this.title = title;
        this.content = content;
        this.createdAt = createdAt;
    }
}

Let's focus on the annotations used for this code:

  1. @Entity: This annotation indicates that the Post class is an entity and is mapped to a database table. An entity represents a table in a relational database.

  2. @Table(name = "post"): This specifies the name of the table in the database that this entity is mapped to. In this case, the table name is "post".

  3. @Id: This is used to specify the primary key of the entity.

  4. @GeneratedValue(strategy= GenerationType.AUTO): This annotation is used with the @Id annotation to specify how the primary key should be generated. The GenerationType.AUTO means the provider should pick an appropriate strategy for the particular database.

  5. @Column(name="..."): This annotation specifies that the annotated field is a column in the related database table. The name attribute allows you to specify the column name if it's different from the field name.
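
To connect these annotations back to the SQL from earlier, here is roughly the table that the Post entity maps to. This is a sketch rather than Hibernate's exact output: the precise column types, and how the id value is generated (identity column, sequence, and so on), depend on the database dialect and the chosen generation strategy.

```sql
-- Approximate shape of the table behind the Post entity
CREATE TABLE post (
    id BIGINT NOT NULL,     -- @Id, Long
    user_id BIGINT,         -- @Column(name="user_id"), Long
    title VARCHAR(255),     -- String; 255 is the default column length
    content VARCHAR(255),   -- String
    created_at DATE,        -- LocalDate
    PRIMARY KEY (id)
);
```

Notice that user_id is a plain column here: since the entity declares it as a Long rather than a relationship, no foreign key constraint is created for it.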

Using Entity to initialize database

With this setup in place, let's look at the application configuration, specifically application-dev.yml:

spring:
  datasource:
    url: jdbc:h2:mem:testdb
    driver-class-name: org.h2.Driver
    username: sa
    password: password
  jpa:
    database-platform: org.hibernate.dialect.H2Dialect
    hibernate:
      ddl-auto: update
    show-sql: true
    properties:
      hibernate:
        id.new_generator_mappings: false
        format_sql: true

Here, the most important part is ddl-auto: update. With this setting, Hibernate compares your entity mappings against the database schema each time the application starts and applies the changes it detects, such as creating missing tables or adding new columns. It does not remove columns or tables that are no longer mapped. In other words, if you modify your JPA entities, Hibernate will attempt to reflect these changes in the database schema on the next startup.

This is extremely useful during development since you won't have to manually alter the database schema every time you adjust your domain model. However, for production environments, it's advisable to set this to validate or none to avoid unintentional schema changes that could lead to data loss or other unwanted consequences.

Additionally, with show-sql: true, Hibernate will log the SQL statements it generates, allowing developers to monitor and diagnose database operations conveniently. The format_sql: true setting under properties.hibernate ensures that the logged SQL statements are formatted for better readability.

H2-Console

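Now, when you start your application, you can go to http://localhost:8080/h2-console (assuming the default port, 8080). Log in with the JDBC URL, username, and password from the configuration above (jdbc:h2:mem:testdb, sa, and password). If the console page is not found, it may need to be enabled with spring.h2.console.enabled: true. After you log in, you will see the following: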

As shown in the image, the database is properly initialized.
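
To double-check from the console itself, you could run something like the following in its query box. SHOW COLUMNS is H2 syntax, and the table name post comes from the @Table annotation on our entity.

```sql
-- List the columns Hibernate created for the Post entity
SHOW COLUMNS FROM post;

-- Returns no rows until the application has saved a post
SELECT * FROM post;
```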

Conclusion

This blog has provided an in-depth exploration of the Relational Database Management System (RDBMS), delving into its integral components and concepts such as tables, keys, indexes, and transactions, and the roles they play in maintaining data integrity, consistency, and reliability.

We’ve also learned about SQL, the pivotal language that interacts with relational databases, and ACID properties that ensure robust transaction processing. The application of these principles in real-world scenarios was exemplified through the implementation of entities in a Spring Boot application using Hibernate and JPA.

With the configurations and annotations described here, developers can let Hibernate keep the database schema in step with their entities, which makes data management and retrieval from a Spring Boot application much simpler.

In wrapping up, the journey through relational databases, SQL, entity-relationship, and the application of all these concepts in real-world scenarios is undeniably intense but essential for any developer aiming to master backend development. We hope this blog post has illuminated the crucial aspects of relational databases and provided you with a foundation to build sophisticated, scalable, and reliable applications. Until next time, keep exploring, keep learning, and don’t hesitate to dive deep into the vast ocean of software development!

Happy Coding!