Software product development has become increasingly complex, with data playing a critical role in software products’ success.
To gain insight into the role of a data architect in software product development, we interviewed Goran Skorput, Field CTO at Kanda, who has over 20 years of experience in the field.
Goran shared details about his career path in Big Data and Analytics, told us about one of the latest big data projects at Kanda, and discussed the importance of having a skilled and knowledgeable data architecture team behind any data-driven project.
Goran started his career at Hay Group, a consultancy acquired by Korn Ferry in 2015. At Hay Group, Goran and his teams productized leadership and talent consulting services to expand the company’s reach to wider management audiences through database-driven SaaS applications.
Goran then co-founded VoteTru with Fred Reichheld, the creator of the Net Promoter Score and author of the bestseller The Ultimate Question. There they built the HuddleUp platform, which lets teams continuously and anonymously share feedback on team performance.
Goran joined Kanda Software in 2018 to further develop Kanda’s cloud, big data, and analytics business.
My role is multifaceted. In the outward-facing aspect of my role, I get to talk to prospects and clients to learn about the technical and business problems they are trying to solve. This usually happens during the sales process with prospects or during technical discovery calls with our clients.
This information allows me to translate customers’ needs into concrete requirements, which are then used by our talent team to select the best engineering talent and form teams to execute the job. In this role, I become a “voice of the customer” when the project is being executed to make sure the project goals are achieved.
And lastly, accumulated knowledge gathered over a myriad of daily client interactions leads to lessons on how to improve existing processes and approaches, where to focus our R&D, and which new technologies and novel processes Kanda engineers need to master to stay competitive in the fast-changing technology environment of today.
As you can see, I act as a technical advisor to both internal and external stakeholders. I provide guidance to solve technical problems and help the clients and Kanda management make informed decisions about their technology investments.
There were many, so let me choose one which required the involvement of multiple data architects and data engineers to execute.
The client is a large SaaS company that operates in the automotive IoT industry. They have hundreds of thousands of IoT devices installed in trucks throughout the US, sending diagnostics and geolocation data to their Azure infrastructure 24×7.
The data backbone was powered by Microsoft SQL Servers, which served as application database servers. There was no clear strategy on how to archive older data stored in the application servers or how to store the data so it could be easily searched or aggregated for reporting purposes, no matter how old it was.
The Kanda data architecture team was tasked with coming up with a strategy to offload some of the work from the existing application databases, create a central repository for all data, and logically connect data from different applications in that repository. The team also had to expose the data to internal stakeholders, create a security framework so that users can access only the data they are supposed to see, establish backup strategies, and assist the client with maintaining business continuity plans.
One of the first hurdles the Kanda data team had to overcome was breaking the silos created by development groups managing a variety of web applications. Every web application was managed by a separate development group, so the client did not have anyone who “knew all the data”. Kanda architects conducted extensive interviews with each development group, examined each database server’s schemas and the data itself in detail, and cataloged usage patterns, server loads, traffic types, security, backups, and reports.
Once the first phase was completed and we had an accurate picture of the existing siloed data platform, we had enough information to suggest a couple of technology solutions. The client ultimately opted to use a Microsoft Synapse lakehouse as the central data store. Data architects then created schemas that allowed all data from the transactional application database servers to be exported in bulk first and then exported incrementally every day as new data trickled in. The new schemas logically connected related data in one master lakehouse and made it available for fast querying and for Power BI, the reporting system the client chose for all their reporting needs.
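The "bulk first, then incremental daily" pattern described above can be sketched roughly as follows. This is a minimal illustration only, not the approach used on the project: it assumes each source row carries a `modified_at` timestamp and that the loader remembers a high-water mark between runs. The function name `incremental_export` and the in-memory lists standing in for the application database and the central store are hypothetical.

```python
from datetime import datetime


def incremental_export(source_rows, watermark):
    """Return rows modified after `watermark`, plus the updated watermark.

    Assumption: every row has a `modified_at` timestamp that only moves forward.
    """
    new_rows = [r for r in source_rows if r["modified_at"] > watermark]
    # If nothing new arrived, keep the old watermark unchanged.
    new_watermark = max((r["modified_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


# Hypothetical rows in a transactional application database.
source = [
    {"id": 1, "modified_at": datetime(2023, 1, 1)},
    {"id": 2, "modified_at": datetime(2023, 1, 2)},
    {"id": 3, "modified_at": datetime(2023, 1, 3)},
]

lake = []  # stands in for the central data store

# Initial bulk export: the lowest possible watermark selects every row.
batch, watermark = incremental_export(source, datetime.min)
lake.extend(batch)

# The next day a new row arrives; only that row is exported.
source.append({"id": 4, "modified_at": datetime(2023, 1, 4)})
batch, watermark = incremental_export(source, watermark)
lake.extend(batch)
```

A watermark keeps the daily job cheap because it never rereads rows it has already copied. Handling updates and deletes in the source would need more than this sketch shows.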
The last big job was to replicate all the reports implemented in the various legacy reporting systems used by different development groups, which ran against the data stored in the application database servers. Existing reports were replicated in Power BI first, but this time the data backend was the newly created central lakehouse. Since the data from all the applications was now stored in one place, logically connected across applications, and precalculated and aggregated, the client could easily create new reports that gave them a view of the whole business, including longitudinal reports spanning a decade of being in business, all in one reporting system.
Let’s first define two terms related to a data architect’s job: data architecture and data modeling.
Data architecture defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet those requirements.
On the other hand, data modeling is “the process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model.”
While both data architecture and data modeling seek to bridge the gap between business goals and technology, data architecture is about the macro view that seeks to understand and support the relationships between an organization’s functions, technology, and data types. Data modeling takes a more focused view of specific systems or business cases. For simple database-driven applications, data models are important to consider.
As the complexity increases, data architecture becomes equally important. Software development teams rely on data being modeled so that the application code can use it in an optimal way, and the data also needs to be stored in a database system that will serve it as fast as possible.
The rest of the business will be interested in the data being safe from loss in case of catastrophic failures, and in data lineage, security, compliance, and the budgets needed to maintain data integrity.
Data architects, along with the rest of the development team, make sure that all these concerns are addressed.
This is a very wide topic which would require a separate interview to dig deep and explore fully so I will only outline my view here. I will focus on the whole data team as opposed to one single role within the team while I try to explain this.
Measuring the effectiveness or ROI of data teams is not easy. Data teams sit between technology and business, so we need to take both aspects of the team’s effectiveness into account.
Here are some examples of business metrics data teams can measure, but every business will need to look at its own unique needs and customize the list.
Defining technical KPIs is somewhat easier, but again, they depend a lot on the use case and the type of organization the data team is part of. Data observability and data reliability can be used for this purpose. Data observability can be measured by five KPIs known as the five pillars of data observability, commonly listed as freshness, distribution, volume, schema, and lineage. They serve as key measurements for understanding the health of the data at each stage in its lifecycle.
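As a rough illustration of how observability measurements can become automated checks, the sketch below covers two commonly cited pillars, freshness and volume. The function names, the 24-hour freshness window, and the 20% volume tolerance are all assumptions made for the example, not values from any particular tool or from the project described earlier.

```python
from datetime import datetime, timedelta


def freshness_ok(last_loaded_at, now, max_age):
    """Freshness: the table was loaded within the allowed time window."""
    return now - last_loaded_at <= max_age


def volume_ok(row_count, expected, tolerance=0.2):
    """Volume: the row count is within +/- tolerance of the expected count."""
    return abs(row_count - expected) <= tolerance * expected


# Hypothetical daily check for a single table.
now = datetime(2023, 1, 2, 8, 0)
last_load = datetime(2023, 1, 1, 23, 0)

table_is_fresh = freshness_ok(last_load, now, max_age=timedelta(hours=24))
table_volume_ok = volume_ok(row_count=900, expected=1000)
```

Checks like these are usually run on a schedule, with failures routed to whoever owns the affected table.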
Data reliability KPIs are best evaluated via SLAs (service level agreements), SLIs (service level indicators), and SLOs (service level objectives), which come from Google’s SRE playbook. Per Google, SLAs require clearly defined SLIs, which are quantitative measures of service quality, and agreed-upon SLOs, which are the target values or ranges of values that each indicator should meet. For example, many engineering teams measure availability as an indicator of site reliability and set an objective to keep availability at or above 99%. For data teams, the process of creating reliability SLAs usually follows three key steps: defining, measuring, and tracking.
This topic is too extensive to cover in more detail here, but the information above gives a general perspective on how a data architect’s effectiveness can be measured.
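To make the relationship between an SLI and an SLO concrete, here is a minimal sketch using the 99% availability objective from the example above. Treating availability as the share of successful requests is one common definition, assumed here for illustration, and the function names and request counts are hypothetical.

```python
def availability_sli(successful, total):
    """SLI: fraction of requests (or pipeline runs) that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total


def meets_slo(sli, slo=0.99):
    """SLO check: the measured indicator is at or above the target."""
    return sli >= slo


# Hypothetical month: 995 of 1,000 requests succeeded.
sli = availability_sli(successful=995, total=1000)
within_objective = meets_slo(sli)  # compared against the 99% target
```

The same shape works for data-specific indicators, such as the share of daily loads that finish on time.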
Cloud computing, machine learning, artificial intelligence, blockchain, and the Internet of Things (IoT) can each potentially transform how businesses operate. Large language models like ChatGPT require enormous amounts of data for training and must run inference quickly while supporting millions of users. Data architects and engineers supporting the growth of AI will need to be able to store and manipulate larger and larger data sets, both “on disk” and in memory.
With the explosion of big data and the increasing importance of data-driven decision-making, businesses will need to rely more on data architects to build and manage data infrastructures that can scale to meet the demands of the modern business environment. It will require data architects to have a deep understanding of database technologies and business and to communicate effectively with both technical and non-technical stakeholders.
Hence, the data architect’s role will become even more crucial in the coming years.
It goes without saying that data architects nowadays should focus on building a solid foundation in both technology and business. It means developing a deep understanding of data architecture principles and best practices and staying up-to-date on emerging technologies and trends.
Developing strong communication and collaboration skills is also important, since data architects often need to work closely with multiple teams, including development, business analytics, and the C-suite, to deliver successful software products.
Goran Skorput and the team of data architects at Kanda Software play a critical role in delivering successful projects for clients by providing expertise in data management and integration. Their knowledge of different databases and understanding of compliance issues and client requirements help ensure that the right technology is selected for each project.
From large corporations to dynamic startups, Kanda Software is well-equipped to help companies of all sizes and industries achieve their software development goals.
Talk to the experts at Kanda to learn how they can help your company succeed today.