How much does prompting matter with LLMs?

This isn't a comprehensive guide. It's built on anecdotal observations gleaned from hundreds of experiments with LLMs, across projects ranging from real-time news generation and stock trading to marketing and health and fitness.

The short answer: it does and it doesn't.

Why's that?

It's because the companies behind the foundation models, particularly their consumer-facing chat products, are focussed on making prompting not matter. Through their own internal prompts and layers of filtering, they are trying to make the products as accessible as possible. Whilst this might not have been true at the beginning, the level of convergence between the platforms makes it much more true now. This is one of the many reasons for the general outcry for uncensored and open source models, but those come with other issues that are perhaps beyond the scope of this short note.

So what does matter?

1. Garbage in, garbage out

Always remember that, whilst these are sophisticated systems, they are primarily driven, at least today (Sep 2024), by predicting the next word (strictly, the next token). That's impressive, given the speed and scale at which they do it. But that's what they do.

That considered, prompting matters because it creates the context that produces cogent, reliable and credible output. If you start with 'garbage' or an overly simplistic prompt, you are unlikely to get a deeply meaningful answer. As in life, perhaps.

2. Expertise matters

If you know your subject matter, the rich texture of your context is likely to produce a better answer that serves your purpose. If you have no understanding of a news event and aim to summarise an article about it, the output is highly likely to be simplistic and only broadly relevant. It's not dissimilar to asking someone with little understanding of medicine to summarise a medical textbook. Again, context matters.

3. Examples matter

If you can give examples of what you want, your prompt is much more likely to produce something close to it. And let's not forget that everyone is looking for something slightly different, which leads to highly subjective assessments of capability. The world of evals, whilst important, is arguably no different. It's a striving for an objective measure, but again, perhaps irrelevant for this post.

Examples allow LLMs to do what they tend to do best, which is apply information in a lateral way. Examples constrain the context, and LLMs seem to produce much more credible output as a result. Note that this isn't always possible, for example when you don't know exactly what you're looking for.
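To make "giving examples" concrete, here is a minimal sketch of assembling a few-shot prompt: an instruction, a handful of worked input/output pairs, then the new input left open for the model to complete. The function name `build_few_shot_prompt`, the "Input:/Output:" layout and the news sentences are all hypothetical, chosen for illustration; it only builds the prompt text and makes no model call.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [task.strip(), ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # End on an open "Output:" so the model continues the pattern set by the examples.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical examples for a headline-writing task.
examples = [
    ("Shares in Acme rose 12% after record quarterly profits.",
     "Acme shares jump 12% on record profits"),
    ("The central bank held interest rates at 5% for a third month.",
     "Central bank holds rates at 5% again"),
]
prompt = build_few_shot_prompt(
    "Rewrite each news sentence as a punchy headline of under ten words.",
    examples,
    "Retail sales fell unexpectedly in August, official figures showed.",
)
print(prompt)
```

The examples do the constraining here: the model sees the length, tone and format you want rather than having to guess them from the instruction alone.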

4. Reasoning matters

Chain-of-thought prompting works. At least today. It will help most people get a more credible response. What does that mean? It means building the reasoning into your input and output, so that the LLM progressively builds its answer and therefore constrains its output over the course of its response. Again, this reduces the garbage in, leading to a much higher chance of better output.
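One way to build the reasoning into both input and output is a prompt template that asks for the facts and the steps before the answer, plus a small parser that pulls out the final line. This is a minimal sketch: the template wording, the "Answer:" convention and the sample reply are assumptions for illustration, not a fixed recipe, and no model is called.

```python
COT_TEMPLATE = """{question}

Work through this step by step before answering.
1. List the facts given in the question.
2. Reason from those facts one step at a time.
3. Only then give your answer on a final line starting with "Answer:".
"""


def build_cot_prompt(question):
    """Wrap a question in a chain-of-thought instruction."""
    return COT_TEMPLATE.format(question=question.strip())


def extract_answer(response):
    """Return the text after the last 'Answer:' line, or None if the model skipped it."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip()
    return None


prompt = build_cot_prompt(
    "A shop has 3 boxes with 4 apples in each. How many apples are there?"
)
# A hypothetical model reply, hard-coded here so the example runs offline.
sample_response = "Facts: 3 boxes, 4 apples per box.\nReasoning: 3 x 4 = 12.\nAnswer: 12"
answer = extract_answer(sample_response)
print(answer)
```

Asking for the answer last means the model has already written out its steps before committing, and a missing "Answer:" line (the parser returns None) is a cheap signal that the response went off track.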

5. Repeated or recursive prompting

Asking an LLM the same question again and again, with the same context, and simply telling it to 'do it again' is folly. It won't lead to better results, it will frustrate the user, and the system will endlessly go round in circles. If the output gets stuck, change the context, significantly if you can.

6. Dynamic context and dynamic prompting

If you're prompting dynamically, feeding the prompt in real time, it's again vital to build up the prompt in a way that lets the output build up progressively. Otherwise you're back to the hit-and-hope strategy that many people deploy before dismissing the usefulness of LLMs.
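As a rough illustration of building up a prompt progressively from a real-time feed, the sketch below keeps a fixed instruction, the model's previous output and only the most recent incoming items, so each new answer builds on the last one rather than starting from scratch. The class `DynamicPrompt`, its method names and the `max_items` cap are hypothetical; sending the prompt to a model is left out.

```python
from collections import deque


class DynamicPrompt:
    """Builds each prompt from a fixed instruction, the model's previous output
    and the most recent incoming items, so each answer builds on the last."""

    def __init__(self, instruction, max_items=5):
        self.instruction = instruction.strip()
        # A bounded deque: once full, the oldest items drop off automatically.
        self.recent = deque(maxlen=max_items)
        self.previous_output = ""

    def add_item(self, item):
        self.recent.append(item.strip())

    def record_output(self, output):
        # The new output has absorbed the pending items, so start collecting afresh.
        self.previous_output = output.strip()
        self.recent.clear()

    def build(self):
        sections = [self.instruction]
        if self.previous_output:
            sections.append("Your previous summary:\n" + self.previous_output)
        if self.recent:
            updates = "\n".join(f"- {item}" for item in self.recent)
            sections.append("New updates:\n" + updates)
        sections.append(
            "Update the summary, keeping what is still true and revising what has changed."
        )
        return "\n\n".join(sections)


dp = DynamicPrompt("You maintain a running summary of a live news story.", max_items=2)
dp.add_item("Storm makes landfall on the coast.")
first = dp.build()
dp.record_output("A storm has made landfall.")  # in practice, the model's reply to `first`
dp.add_item("Power cut to 10,000 homes.")
dp.add_item("Roads closed in the north.")
dp.add_item("Evacuation centres opened.")
second = dp.build()
print(second)
```

The cap keeps the prompt bounded as the feed grows, at the cost of silently dropping items that arrive faster than you prompt; in a real pipeline you would choose `max_items` (or prompt more often) with that trade-off in mind.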

There's a lot more to the topic of successful prompting, but in some ways it matters less than it did and less than it will. Yes, spelling mistakes, lazy writing or a prompter's lack of domain expertise can be problematic, but you can use the LLM to prompt itself, even dynamically. There are plenty of techniques to help make the output you get accurate and creative.
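One simple form of using the LLM to prompt itself is a two-pass loop: first ask the model to rewrite a rough prompt into a clearer one, then send the rewritten prompt for the real answer. This is a minimal sketch under stated assumptions: `complete` stands for whatever function sends a prompt to your model and returns its text (it is not a real library call), and `fake_complete` is a hypothetical offline stand-in so the example runs.

```python
from typing import Callable

REWRITE_INSTRUCTION = (
    "Rewrite the following prompt so it is clear, specific and correctly spelled. "
    "Add any context an expert would need. Return only the improved prompt.\n\n"
    "Prompt:\n{draft}"
)


def self_refined_answer(draft: str, complete: Callable[[str], str]) -> str:
    """Two-pass self-prompting: the model improves the prompt, then answers it.

    `complete` is a placeholder for your own model-calling function.
    """
    improved_prompt = complete(REWRITE_INSTRUCTION.format(draft=draft)).strip()
    if not improved_prompt:
        # Fall back to the original draft if the rewrite came back empty.
        improved_prompt = draft
    return complete(improved_prompt)


calls = []


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, used only to run the example."""
    calls.append(prompt)
    if prompt.startswith("Rewrite the following prompt"):
        return "Summarise the three biggest risks of the attached trading strategy."
    return "(model answer would appear here)"


result = self_refined_answer("sumarise risks of strat", fake_complete)
print(calls[1])
print(result)
```

The rewrite pass is where spelling mistakes and missing context get patched up, which is one reason sloppy prompts matter less than they used to. It doesn't replace your own domain knowledge in judging whether the rewritten prompt actually asks the right question.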

But it starts with following a few simple rules:

Context matters: get the context right to ensure the quality of the output
Domain specialism matters: the more you know about a topic, the better you can direct and judge the output, and therefore iterate on your prompt
Don't let it loose without guardrails: it's not dangerous, it's just stupid, and you'll end up with answers that dent your confidence in the LLM's ability to respond effectively.

Lots more for another post.

Happy prompting!