The real potential of GenAI stems from its versatility in augmenting diverse data processing tasks. Beyond making the conventional processes more efficient, it offers a myriad of possibilities in developing innovative applications. These range from automated generation of data stories and creating synthetic datasets for model training.
In this article, we delve into two diverse applications of GenAI that exemplify its flexibility, adaptability and resourcefulness. Dataset augmentation addresses the data availability problem for ML model training by generating synthetic data, similar or better than the actual. Next, we look at GenAI being applied to enable conversational data interactions, eliminating the need of data and analytics expertise by providing an intuitive and natural language interface for data exploration. While the first application assists data professionals, the second one benefits business users. Augmentation readies the data and the model, while an intuitive data and analytics (D&A) interface unveils its real value, providing business leaders with actionable insights they need for strategic decision making. Here is a detailed look at each:
Dataset Augmentation
The outcome of models used for predictions, scenario analysis or time-series forecasting are highly dependent on the completeness and quality of the datasets that were used to train them. Training data, however, is usually inhibited by the volume, cost, privacy and access control of collected real-life data. The outcomes are also impacted by any bias and imbalance that affects the data quality.
Models built with such ambiguous and constrained datasets may struggle to discern meaningful patterns and correlations, resulting in unreliable predictions. They are susceptible to overfitting or underfitting, with limited generalization capabilities. If data points or variables are missing or underrepresented, machine learning model predictions miss the real-world environment, leading to potential errors in decision-making. When training data has inherent biases, they are perpetuated and amplified in the insights generated by the models built with it.
Dataset augmentation using generative AI has emerged as a solution to mitigate these challenges and enhance the efficacy of ML models. Techniques like generative adversarial networks (GANs), variational autoencoders (VAEs) or deep neural networks (DANs) are used to synthesize data and augment the actual datasets. Generated artificially, this synthetic data supplements and complements real data, adding the missing components or expanding its scope and diversity. It can closely mimic accumulated data, anonymize it and introduce variation and complexities that remove biases to improve its wholeness. Any class imbalances and biases present can be mitigated by adding synthesized data that stabilizes the set with underrepresented or marginalized categories. ML models, when trained with this augmented data, promote fairness and inclusivity in its output.
Data augmentation finds use in several areas like fraud detection in financial services or in the healthcare sector for diagnosis and medication recommendations. Patient privacy concerns, data scarcity of rare conditions or a skew towards certain demographics lead to deficient model training in medical diagnosis applications, resulting in poor prediction accuracy. When past case data is augmented, for example with diverse and representative medical condition images, diagnostic algorithms become more accurate.
Improved datasets containing both collected data and synthetic data contribute to the development of more robust ML models trained to account real-world complexities. Gartner expects synthetic data to be widely used for model development by 2030, foreseeing “the most valuable data will be the data we create, not the data we collect.”
Conversational Data Interactions
Even though data storage, retrieval and compute infra has improved significantly, enterprises still face a challenge in deriving meaningful insights from the vast amounts of data that they accumulate. Querying and exploring the data remains a substantial hurdle, particularly for the business users, with even the drag-and-drop features not being intuitive and aligned enough with the human thought processes. For example, finding if customer tenure has a significant impact on retail sales contribution may not be easy with drag and drop interfaces as it necessitates multiple, complex and intricate data operations.
To meet this challenge, GenAI with natural language querying (NLQ) holds an immense promise for providing a user-friendly, conversational and intuitive interface for data interactions, especially for business users.
NLQ allows business users to interact with data using everyday language, eliminating the need for technical expertise in querying databases or understanding complex data structures. People in functional roles like marketing, finance or HR can pose questions and requests to their data sets without knowing SQL or the underlying data syntax and structure. This democratizes access to information and advanced analytics, reducing the dependency on specialized IT or data professionals. Though NLQ has been around for over two decades, combining it with GenAI changes its basis from using statistical models to LLMs. This enables precise query generation, even for intricate business needs.
A conversational querying method powered by GenAI further empowers users to gradually delve into data and adjust their queries as they gain insights along the way. Users can ask follow-up questions, drill down into specific subsets of data and uncover hidden patterns or correlations, supporting deeper insights and a more nuanced understanding of business drivers. This exploratory approach to data insights is extremely powerful in the hands of business leaders for acting upon business challenges and opportunities. Moreover, ad-hoc analysis capabilities help in early identification of emerging trends, leading to quicker and better-informed responses.
Though NLQ and GenAI are capable of creating intricate and complex queries, sometimes the business demands may require multiple queries which are often implemented with stored procedures. Here, too, GenAI code generation capabilities assist the technology team in automating tasks and reducing delays in providing reports.
Closing Thoughts
Gartner has noted that Generative AI shall be a general-purpose technology, similar to the internet or electricity. Its value would lie in how it is used. The two innovative applications in D&A cited here illustrate how it impacts and augments two very diverse areas – the first for data scientists to perfect the very models that are used to turn data into insights. The second allowing business leaders to access and explore these insights intuitively using natural conversational language. Both applications together empower organizations to tell data stories that impact decision making.
Views expressed by Mr. Dharmendra Chouhan, Director of Engineering – Kyvos Insights