70  Database Design from Spreadsheets

Spreadsheet usage in extensive in the businesses due to their versatility, accessibility, and ease of use. Many businesses collect and store data in spreadsheets initially.

See recent example of formula 1 excel usage. The formula 1 team used excel for their 20000 part inventory, like below image shows their system become very complex.

Note

See following youtube clip.

show me all excel files you are using since plants (factories) fill the gaps with the excel since their system cannot produce these reports. these reports show the gaps.

From my experience working with implementing ERP systems, reproducing spreadsheets in the database is very important. I also always ask if they use any excel files.

But problems with the spreadsheets are following:

  1. Starting with spreadsheets are easy but they may become very complex.
  2. Collaboration is easy with online versions of spreadsheets but role based security is not possible

Many businesses later decide to to migrate and manage this spreadsheet data in a relational database. Thus, transforming spreadsheets to databases is valuable across various roles, including requirement engineer, data analysis, database administration, and software development.

Transforming spreadsheets to relational database offers following benefits:

  1. Spreadsheets are often unstructured. Transforming them to normalized tables impose logical structure, making data analysis more easier.
  2. Database tables enforce integrity with their data types and constraints. This reduces error in data entry and calculations
  3. Data retrieval performance and scalability for multi users are always better. Especially for large data, relational databases are always better.
  4. Data Security and access control is better with databases
  5. SQL language offer very complex queries for reporting purposes
  6. Applications for mobile, web and desktop are more easier
  7. Dynamic relationships are more easier to handle. It is harder to connect multiple spreadsheets but it is more easier to connect multiple tables even multiple databases.
  8. Having one single source of truth is possible with databases while it is very hard to do so in spreadsheets. It is very common to have multiple versions of spreadsheets.

70.1 Examples

70.1.1 Student Advisors

Student Advisors
STU_NUM STU_NAME STU_MAJOR DEPT_CODE DEPT_NAME DEPT_PHONE COLLEGE_NAME ADVISOR_NAME ADVISOR_OFFICE ADVISOR_BLDG ADVISOR_PHONE STU_GPA STU_CLASS
211343 Ahmet Computer Enginering CENG Computer Enginering 4356 Enginering Ender T201 T Building 2115 3.87 Junior
200128 Smith Computer Eng CENG Computer Enginering 4356 Enginering Ender T201 T Building 2115 2.78 Sophomore
199876 Ayşe Marketing MKT Marketing 4378 Business Admin Atilla T228 T Building 2123 2.31 Senior
198648 Asya Marketing MKT Marketing 4378 Business Admin Fatih T356 T Building 2159 3.45 Senior
223456 Seda Statistics MATH Mathematics 3420 Sciences Hakan J331 A Building 3209 3.58 Junior

70.1.2 Office Equipment

A company is tracking its office equipment like

  • furniture
  • computers
  • printers
  • and other office equipment

Some sample values are given below.

Office Equipment
ITEM_ID ITEM_LABEL ROOM_NUMBER BUILDING_CODE BUILDING_NAME BUILDING_MANAGER
1 Desk 301 BLDG1 Main Building John Smith
2 Chair 301 BLDG1 Main Building John Smith
3 HP DeskJet 895Cse 205 BLDG2 Annex Jane Doe
4 Whiteboard 205 BLDG2 Annex Jane Doe
5 HP Pavilion Laptop 102 BLDG1 Main Building John Smith

Company would like to answer following questions in their reports

  • How many laptops are assigned to employees?
  • Do we have more than one printer in the office rooms?
  • Do we have at least one printer in every floor?

Please design a new database for their use cases at least in 3rd normal form.

70.1.3 Student Classes

Student Classes
Student_ID Name Class information Course_No Course name Grades
1 Ayşe 12A, Room 123, class teacher: Mesut 2 Programming 80
2 Duru 11C, Room 115, class teacher: Atilla 3 Databases 70
3 Aydın 11B, Room 210, class teacher: Ender 1, 2, 3 Web Apps, Programming, Databases 90, 60, 74
4 Ahmet 12A, Room 123, class teacher: Mesut 1, 2 Programming, Web Apps 67, 50
5 Funda 11C, Room 115, class teacher: Atilla 1 Web Apps 72

70.1.4 Developers and Projects

Developers and Projects
Personel Number FirstName LastName Title ProjectNumber ProjectRole Hours in Project
12345 John Doe Manager P001 Project Lead 40
54321 Jane Smith Analyst P001,P002 Data Analyst,Data Analyst 35,10
67890 David Lee Developer P001,P003 Developer,Data Analyst 30, 60
98765 Sarah Jones Tester P003 Tester 32
13579 Michael Brown Architect P002 Solution Architect 45
24680 Emily Davis Designer P001 UX Designer 38
15975 Daniel Miller Analyst P003 Business Analyst 30
35791 Olivia Wilson Developer P002 Senior Developer 40
57913 James Moore Tester P001,P002,P003 QA Lead,QA,QA Lead 35,10,20
79135 Chloe Taylor Architect P003 Enterprise Architect 50

70.1.5 Employee information

Employee Information
EMP_NUM EMP_FIRST_NAME EMP_LAST_NAME EMP_EDUCATION JOB_CLASS EMP_DEPENDENTS DEPT_CODE DEPT_NAME DEPT_MANAGER EMP_TITLE EMP_DOB EMP_HIRE_DATE EMP_SKILLS EMP_SALARY
1001 John Doe Bachelor’s Engineer Jane (spouse),Mary (daughter), Jill (daughter) IT Information Technology Jane Smith Software Engineer 1990-01-15 2020-05-10 Java, Python, SQL 75000
1002 Jane Smith Master’s Manager HR Human Resources David Lee HR Manager 1985-08-20 2018-11-22 Leadership, Communication 90000
1003 David Lee Bachelor’s Analyst Finance Finance Michael Brown Financial Analyst 1992-03-12 2021-07-05 Excel, Tableau 60000
1004 Sarah Jones PhD Scientist John (spouse) Research Research & Development Emily Davis Research Scientist 1988-11-05 2015-02-18 Biology, Chemistry 85000
1005 Michael Brown Master’s Manager George (son) Sales Sales & Marketing Daniel Miller Sales Manager 1980-05-25 2012-09-10 Sales, Negotiation 100000

70.1.6 Consultant

Consultants
CLIENT_NUM CLIENT_NAME CLIENT_CITY CONTRACT_DATE CONTRACT_NUMBER CONTRACT_AMOUNT CONSULT_CLASS_1 CONSULT_CLASS_2 CONSULT_CLASS_3 CONSULT_CLASS_4 CONSULTANT_NUM_1 CONSULTANT_NAME_1 CONSULTANT_CITY_1 CONSULTANT_NUM_2 CONSULTANT_NAME_2 CONSULTANT_CITY_2 CONSULTANT_NUM_3 CONSULTANT_NAME_3 CONSULTANT_CITY_3 CONSULTANT_NUM_4 CONSULTANT_NAME_4 CONSULTANT_CITY_4
C001 Acme Corp New York 2024-07-05 C12345 50000 Strategy Marketing Finance IT 101 John Doe New York 102 Jane Smith London 103 David Lee Tokyo 104 Sarah Jones Paris
C002 Global Tech London 2023-11-10 C23456 75000 Technology Operations Legal HR 105 Michael Brown Berlin 106 Emily Davis Sydney 107 Daniel Miller Rome 108 Olivia Wilson Madrid
C003 Innovate Inc Paris 2024-03-20 C34567 30000 Marketing Sales Finance Legal 109 James Moore Toronto 110 Chloe Taylor Dubai 111 Alex Lee Seoul 112 Ben Walker Mumbai
C004 Data Solutions Tokyo 2022-09-08 C45678 45000 Data Analysis AI Cloud Cybersecurity 113 Lisa Wong Shanghai 114 Mark Davis Singapore 115 Nancy Chen Beijing 116 Peter Jones Hong Kong
C005 Green Energy Sydney 2024-05-15 C56789 60000 Sustainability Renewable Energy Energy Storage Grid Modernization 117 Robert Smith New Delhi 118 Susan Lee Rio de Janeiro 119 Thomas Brown Cape Town 120 Victoria Jones Mexico City