How Large Language Models (LLMs) Help Generate New Ideas

As with every emerging general-purpose technology, Generative AI (GenAI) is searching for problems to solve. Finding the most fitting ones will take time. I consider it pointless to look for the problems that GenAI can't solve; instead, I prefer focusing on what it already can.

One of the few areas where GenAI has already demonstrated its usefulness is innovation. In a recent PPT presentation, "Powering Front-End Innovation with AI/LLM Tools," I explored how AI can enrich the front end of the innovation process. In this article, I'll review academic literature describing the application of LLM algorithms to one specific stage of this process: generating new ideas.

Meincke et al. (2023) appear to be the first to use an LLM algorithm to generate new product ideas. The authors took advantage of a pool of ideas created by MBA students enrolled in a course on product design in 2021 (that is, before the mass availability of LLMs). The students were given the following prompt:

"You are a creative entrepreneur looking to generate new product ideas. The product will target college students in the United States. It should be a physical good, not a service or software. I would like a product that could be sold at a retail price of less than about USD 50…The product need not yet exist, nor may it necessarily be clearly feasible."

200 ideas generated by the students were used as a benchmark to compare against two pools of ideas generated by OpenAI's ChatGPT-4 with the same prompt. One set comprised 100 ideas generated by ChatGPT with minimal guidance (zero-shot prompting); the other, 100 ideas generated by the model after providing it with a few examples of high-quality ideas (few-shot prompting).
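
The difference between the two setups comes down to how the request is assembled. Here is a minimal sketch: the task prompt is quoted from the study, but the helper names and the example idea are my own hypothetical illustrations, and the actual API call is shown only as a comment.

```python
# The task prompt from the study; helper names and example ideas are hypothetical.
PROMPT = (
    "You are a creative entrepreneur looking to generate new product ideas. "
    "The product will target college students in the United States. "
    "It should be a physical good, not a service or software. "
    "I would like a product that could be sold at a retail price of "
    "less than about USD 50."
)

def zero_shot_messages():
    """Zero-shot: the task prompt alone, with no examples."""
    return [{"role": "user", "content": PROMPT + " Generate one idea."}]

def few_shot_messages(examples):
    """Few-shot: the same prompt, preceded by a few high-quality example ideas."""
    shots = "\n".join(f"Example idea: {e}" for e in examples)
    return [{"role": "user", "content": f"{PROMPT}\n{shots}\nGenerate one idea."}]

# A hypothetical "good" idea seeded into the few-shot variant:
msgs = few_shot_messages(["A collapsible laundry hamper that clips to a dorm bed frame"])
# client.chat.completions.create(model="gpt-4", messages=msgs)  # actual API call
```

The only difference the model sees is the presence or absence of the worked examples; everything else is held constant, which is what makes the two pools comparable.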

The first important discovery made by Meincke et al. was that ChatGPT produced new product ideas with remarkable efficiency. It took one human interacting with the model only 15 minutes to come up with 200 ideas; a human working alone generated just 5 in the same time.

This dramatically reduces the cost of new ideas generated by ChatGPT. Under the specific circumstances described in the article, generating one ChatGPT idea costs $0.65, compared to $25 for an idea generated by a human working alone. That means a human using ChatGPT generates new product ideas about 40 times more efficiently than a human working alone.
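
The "about 40 times" figure is simply the ratio of the two per-idea costs reported in the article:

```python
cost_per_human_idea = 25.00  # USD per idea, human working alone (as reported)
cost_per_llm_idea = 0.65     # USD per idea, human working with ChatGPT (as reported)

ratio = cost_per_human_idea / cost_per_llm_idea
print(round(ratio, 1))  # 38.5, i.e. roughly 40x more cost-efficient
```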

Faster and cheaper. But what about the quality of the ideas?

To assess the quality of all 400 ideas, a purchase-intent measurement from a consumer survey was applied. Measured this way, the average quality of ideas generated by ChatGPT is statistically higher than that of ideas generated by humans: 47% for ChatGPT with zero-shot prompting and 49% with few-shot prompting vs. 40% for human-generated ideas.

Moreover, among the 40 top-quality ideas (the top decile of all 400), 35(!) were generated by ChatGPT.

The only consolation for us humans was that the mean novelty of human-generated ideas was higher than that of ideas generated by the model: 41% vs. 36%. Additionally, ChatGPT-generated ideas, especially with few-shot prompting, exhibited greater overlap, limiting their diversity compared to human ideas. Unfortunately, novelty itself did not affect purchase intent.

In a follow-up study, Meincke et al. set out to improve the diversity of ChatGPT-generated ideas by using 35 different prompting techniques. The authors used the same framework as in the previous study: searching for ideas for new consumer products targeted at college students that could be sold for $50 or less.

Meincke et al. show that of all 35 prompting approaches, Chain of Thought (CoT) prompting, which asks the LLM to work in multiple, distinct steps, resulted in the most diverse pool of ideas; its diversity approached the level of the ideas generated by the students.
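
In practice, CoT for ideation just means structuring the request as explicit sequential steps rather than asking for the idea directly. A minimal sketch, where the step wording is my paraphrase and not the study's actual prompt:

```python
def chain_of_thought_prompt(task: str) -> str:
    """Wrap an ideation task in explicit, distinct reasoning steps (CoT)."""
    steps = [
        "Step 1: List the key needs and pain points of the target users.",
        "Step 2: For each need, brainstorm several distinct product directions.",
        "Step 3: Pick the most promising direction and describe one concrete idea.",
    ]
    return task + "\nWork through the following steps, one at a time:\n" + "\n".join(steps)

prompt = chain_of_thought_prompt(
    "Generate a new physical product for college students priced under USD 50."
)
```

Forcing the model to enumerate needs before converging on an idea pushes each run down a different path, which is plausibly why this technique widened the idea pool.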

The authors also found a relatively low overlap between ideas generated using different prompting techniques. That means a "hybrid" approach (using different prompting techniques and then pooling the ideas together) might be a promising strategy for generating large sets of high-quality and diverse ideas.
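
Such a hybrid pipeline can be sketched in a few lines: run several prompting techniques, merge their outputs, and drop near-duplicates. The word-overlap similarity and the 0.6 threshold below are my own simplifications; the study does not prescribe a deduplication rule.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two idea descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def pool_ideas(idea_sets, overlap_threshold=0.6):
    """Merge ideas from several prompting techniques, dropping near-duplicates."""
    pooled = []
    for ideas in idea_sets:  # one list per prompting technique
        for idea in ideas:
            if all(jaccard(idea, kept) < overlap_threshold for kept in pooled):
                pooled.append(idea)
    return pooled

# Hypothetical outputs from two techniques; the duplicate lamp idea is dropped.
zero_shot = ["foldable desk lamp for dorm rooms"]
cot = ["foldable desk lamp for dorm rooms", "magnetic whiteboard backpack panel"]
print(pool_ideas([zero_shot, cot]))  # two distinct ideas remain
```

In a real pipeline one would likely replace the word-overlap check with embedding similarity, but the pooling logic is the same.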

One limitation of the above two studies was that the human-generated ideas were created by students. One might argue that students, being less experienced, couldn't come up with higher-quality ideas capable of beating the algorithm.

This limitation was addressed by the study of Joosten et al. (2024). In this study, professional designers and ChatGPT-3.5 were assigned identical tasks of generating novel ideas for a European supplier of highly specialized packaging solutions. A total of 95 ideas were generated, 43 by humans and 52 by ChatGPT. All the solutions were evaluated, in a blind fashion, by the company's managing director, a seasoned innovation expert.

The results show that, when assessed by the overall quality score, ChatGPT generated better ideas than the professionals. More specifically, ChatGPT-generated ideas scored significantly higher than the humans' in perceived customer benefit, while both sets scored almost identically in feasibility.

Interestingly enough, and in contrast to the results of Meincke et al., ChatGPT-generated ideas scored significantly higher in novelty. As a result, ChatGPT produced more top-performing ideas in terms of novelty and customer benefit.

Similar results were obtained by Castelo et al. (2024). These authors compared ideas for a new smartphone application generated by GPT-4 and by professional app designers. The authors showed that GPT-4-generated ideas were ranked as more original, innovative, and useful.

Furthermore, Castelo et al. used a text-analysis technique to determine what specifically made GPT-4-generated ideas superior. To do so, they compared two kinds of creativity: creativity in form (when the language used to describe an idea is more unusual or distinctive) and creativity in substance (when the idea itself is more novel). They found that GPT-4 outperformed humans in both kinds of creativity.

Complementing the above two studies is the work by Si et al. (2024), who analyzed the ability of Claude 3.5 Sonnet to generate research ideas (in the field of Natural Language Processing) rather than new product ideas. Comparing ideas generated by the LLM with those generated by experienced NLP researchers, the authors showed that the LLM-generated output was ranked as more novel, although slightly less feasible, than that generated by the human experts.

Of all known idea-generation techniques, crowdsourcing is considered one of the most effective: a consistent source of ideas whose novelty, quality, and diversity exceed those created by individuals and small groups (of experts and laypeople alike). One might therefore hope that at least a crowd of people would beat an LLM algorithm in an idea-generating competition.

Alas. 

Boussioux et al. (2024) designed a crowdsourcing contest to generate circular-economy business ideas. In total, 234 ideas were generated (and evaluated by 300 independent human judges): 54 by a human crowd of creative problem solvers and 180 by GPT-4.

Indeed, the solutions proposed by the human crowd exhibited a higher level of novelty, both on average and at the upper end of the rating distribution. Yet GPT-4 scored higher on the ideas' strategic viability for successful implementation, as well as on their environmental and financial value. Overall, the solutions generated by the algorithm were rated higher in quality than the crowd-generated solutions.

Confirming the findings of Meincke et al. (2024), Boussioux et al. found that CoT prompting enhanced the novelty of GPT-4-generated solutions without compromising their overall quality.

Once again, the authors demonstrated the high cost-efficiency of the LLM-assisted idea-generation process: under the specific circumstances applied by the authors, it took 2,520 hours and $2,555 to generate the 54 "human" solutions; the same numbers for the LLM-generated solutions were 5.5 hours and $27.

As recently as a few years ago, the conventional wisdom was that AI tools would only be used to automate routine knowledge work, while the creative part of this work would remain in the human domain. Recent developments forcefully disprove this notion.

One can split proverbial hairs while assessing the novelty or feasibility of ideas generated by LLMs. But one thing is clear: the overall quality of LLM-generated ideas is at least as high as that of ideas generated by us humans. And all this at only a fraction of the time and cost of human ideation.

That means in silico ideation is here to stay, allowing businesses to shift their attention from the ideation stage of the innovation process to later stages, such as idea incubation and prototyping.

At least until LLMs show us they're better at those stages too.