You’ve got to know your limitations.
The same could be said about Artificial Intelligence (AI).
In today’s column, I am going to take a close look at two vital weaknesses associated with a type of AI known as Generative AI, and I will examine these oft-decried concerns via the use of a generative AI app by the name of ChatGPT.
In addition, I’ll share with you some leading-edge research being done in my AI Lab that seeks to mitigate or somewhat overcome the vociferously decried keystone weaknesses. The overarching notion behind the mitigating approaches involves using AI to help bolster other AI, and I’ll be describing too how this is being particularly done in the legal domain by using generative AI for legal-oriented tasks (note that the devised approach works in other domains too, being ostensibly generalizable).
I think you’ll enjoy this handy exploration since I will be indicating various inputs and outputs to show you ChatGPT and do so by covering the seemingly mind-stretching task of answering those brainy bar exam questions that lawyers need to successfully pass. This will also provide an opportunity to say a bit about how Generative AI is entering the legal profession. Some fervently argue that the advent of generative AI is going to be a game changer for lawyers and the act of lawyering. Generative AI is said to be a looming disrupter to the practice of law. We’ll take a keen look at those claims and pronouncements.
One other thing to mention.
All of this discussion about AI has vital significance related to AI Ethics and laws associated with AI (typically referred to as AI Law). You might find of interest my extensive and ongoing coverage of the latest in AI Ethics and AI Law at the link here and the link here, just to name a few. In terms of how the law and AI intertwine, please be aware that new laws are being drafted that aim to corral AI that goes off-the-rails (typically phrased as the law-applied-to-AI), and there is the other side of the coin entailing using AI to undertake legal chores (usually phrased as AI-applied-to-law). I’ll be addressing both matters herein.
Before we get into the meaty aspects of the two posited weaknesses of Generative AI, it might be useful to make sure that we are all on the same page on this rapidly evolving AI topic. You see, a specific type of AI known as Generative AI has dominated social media and the news recently when it comes to talking about where AI is and where it might be headed. This was sparked by the release of an AI app that employs generative AI, the ChatGPT app developed by the organization OpenAI.
ChatGPT is a general-purpose AI interactive system, essentially a seemingly innocuous general chatbot. Nonetheless, it is actively and avidly being used by people in ways that are catching many entirely off-guard. For example, a prominent worry is that ChatGPT and other similar generative AI apps will allow students to cheat on their written essays, perhaps even encouraging or spurring pupils to do so. Students who are lazy or feel they are boxed in without time or skill to do an essay might readily invoke a generative AI app to write their essay for them. This has stirred quite a hornet’s nest.
If you are interested in the rapidly expanding brouhaha about ChatGPT and generative AI, I’ve been doing a focused series in my column that you might find informative and engaging. Here’s a glance in case any of these topics catch your fancy:
- Generative AI and Mental Health Advice. I opted to review how generative AI and ChatGPT are being used for mental health advice, a troublesome trend, per my focused analysis at the link here.
- Predictions Of Generative AI Advances Coming. If you want to know what is likely to unfold about AI throughout 2023, including upcoming advances in generative AI and ChatGPT, you’ll want to read my comprehensive list of 2023 predictions at the link here.
- Context And Generative AI Use. I also did a seasonally flavored tongue-in-cheek examination pertaining to a Santa-related context involving ChatGPT and generative AI at the link here.
- Scammers Using Generative AI. On an ominous note, some scammers have figured out how to use generative AI and ChatGPT to do wrongdoing, including generating scam emails and even producing programming code for malware, see my analysis at the link here.
- Rookie Mistakes Using Generative AI. Many people are both overshooting and surprisingly undershooting what generative AI and ChatGPT can do, so I looked especially at the undershooting that AI rookies tend to make, see the discussion at the link here.
If you’ve not yet read about Generative AI and ChatGPT, no worries as I’ll be explaining the foundations herein so hang in there and you’ll get the general scoop.
Perhaps you’ve already seen those outsized claims on social media about these latest AI apps. Some suggest that this generative AI is in fact sentient AI (nope, they are wrong!). Those in AI Ethics and AI Law are notably worried about this burgeoning trend of outstretched claims. You might politely say that some people are overstating what today’s AI can actually do. They assume that AI has capabilities that we haven’t yet been able to achieve. That’s unfortunate. Worse still, they can allow themselves and others to get into dire situations because of an assumption that the AI will be sentient or human-like in being able to take action.
Do not anthropomorphize AI.
Doing so will get you caught in a sticky and dour reliance trap of expecting the AI to do things it is unable to perform. With that being said, the latest in generative AI is relatively impressive for what it can do. Be aware though that there are significant limitations that you ought to continually keep in mind when using any generative AI app.
Let’s cover the essence of generative AI and ChatGPT. We’ll then be ready to jump into the two major weaknesses that I’m calling out, along with some mitigating solutions and a demonstration via use in a legal domain.
Opening The Can Of Worms On Generative AI
We are ready to dive into AI.
Of the various types of AI, we will focus herein specifically on Generative AI. If you are already highly versed on this topic, you might opt to briefly skim through my points and continue with the next section of this discussion. For everyone else, I believe you will find this content quite newly informative and fascinating.
In brief, generative AI is a particular type of AI that composes text as though the text was written by the human hand and mind. All you need to do is enter a prompt, such as a sentence like “Tell me about Abraham Lincoln” and generative AI will provide you with an essay about Lincoln. This is commonly classified as generative AI that performs text-to-text or some prefer to call it text-to-essay output. You might have heard about other modes of generative AI, such as text-to-art and text-to-video.
Your first thought might be that this does not seem like such a big deal in terms of producing essays. You can easily do an online search of the Internet and readily find tons and tons of essays about President Lincoln.
The kicker in the case of generative AI is that the generated essay is relatively unique and provides an original composition rather than a copycat. If you were to try and find the AI-produced essay online someplace, you would be unlikely to discover it.
Generative AI is pre-trained and makes use of a complex mathematical and computational formulation that has been set up by examining patterns in written words and stories across the web. As a result of examining thousands and millions of written passages, the AI can spew out new essays and stories that are a mishmash of what was found. By adding in various probabilistic functionality, the resulting text is pretty much unique in comparison to what has been used in the training set.
That’s why there has been an uproar about students being able to cheat when writing essays outside of the classroom. A teacher cannot merely take the essay that deceitful students assert is their own writing and seek to find out whether it was copied from some other online source. Overall, there won’t be any definitive preexisting essay online that fits the AI-generated essay. All told, the teacher will have to begrudgingly accept that the student wrote the essay as an original piece of work.
In a moment, I’ll showcase to you what happens when you enter questions or prompts into generative AI. I will make use of the latest version of ChatGPT to enter my prompts and have collected the “answers” or essays generated by the AI (note that the same can be done with the numerous other available generative AI apps; I’ve opted to use ChatGPT because it is getting its five minutes of fame right now).
Perhaps a short tangent about ChatGPT might be helpful at this juncture.
The ChatGPT app was made available to the general public just a short while ago. By and large, these generative AI apps are usually only accessible to AI insiders. The unusual facet that ChatGPT could be used by anyone simply by entering an email address and a name led a lot of people to give it a try. ChatGPT is currently free to use (the monetization issue is a looming dilemma for AI makers).
Almost immediately there was a humongous reaction on social media as people raced to give examples of what generative AI can do. The company that makes ChatGPT, OpenAI, opted to close off the signups at a million users. Those million users have managed to bombard the airwaves with all manner of stories and tales about using ChatGPT.
Be very careful in believing what people have to say about the AI app. Many of these people are clueless about what they are using. It is almost as though they had never driven a car and didn’t even realize cars existed, and all of a sudden they had a chance to drive a car. Utter amazement ensues.
I’m not saying that generative AI isn’t relatively impressive. It is. I am just emphasizing that a lot of the gushing testimonials are being done by many that are blissfully unaware of what today’s AI can do. Those of us on the inside of AI have been using generative AI for the last several years. Perhaps we became used to it.
Suddenly, seeing a huge crush of people touting it to the rooftops has been excitedly energizing, but also somewhat disconcerting. The disconcerting part is when people proclaim that generative AI is sentient. It is not. Do not let anyone convince you otherwise.
That being said, there is an ongoing heated debate in the AI field as to whether generative AI is on the path to sentience or whether maybe it is not. One view is that if we keep scaling up generative AI with faster computers and a greater amount of data such as scouring every inch of the Internet, we will nearly spontaneously arrive at sentient AI. Others argue that this is highly unlikely. They suggest that generative AI might be one of many components that are needed. There is even the gloomier view that generative AI is a sideshow that is distracting us from the real breakthroughs that we will need to achieve sentient AI.
You might also find it noteworthy that AI insiders tend to refer to Artificial General Intelligence (AGI) as the aspirational goal for the AI field. It used to be that the goal was to attain Artificial Intelligence, but the AI moniker has become watered down and muddled. When someone says they are doing AI work, you don’t know whether they are alluding to today’s AI that isn’t on par with humans or whether they are referring to a futuristic human equivalency AI. To get around that exasperating confusion, the newer phrasing of AGI is being used these days.
All told, the generative AI of today is not sentient, nor is it AGI.
I trust that this gets you into the ballpark about generative AI and particularly ChatGPT.
I will go ahead and show you a series of prompts and the corresponding responses that I got from ChatGPT. I’ll discuss each one as we go along. You can judge for yourself what you think of the AI-generated responses.
Please remember that as earlier discussed, the AI is not sentient. The generated responses by the AI are a mathematical and computational combination of words into seemingly fluent passages. This is based on the AI algorithm having been trained on datasets of words and stories that humans have written (principally as posted on the Internet). I repeat this warning because you will undoubtedly fall into the mental trap that these responses are so fluent that the AI must be sentient. This happens to most people. As earlier urged, set aside that anthropomorphizing. Always remember that the responses are based on the vast trove of writing by humans that exists on the Internet and thusly will highly resemble human writing.
There is something else you need to know.
Generative AI that is trained on the Internet in an unfettered way will tend to bake into whatever text-based responses it mathematically and computationally concocts some offensively hazy stuff, including repulsively nasty wording. There is a lot of crazy and filthy stuff posted out there on the web.
You’ve seen it, you know what I mean.
The companies that are crafting these AI apps are worried that the proverbial baby will get tossed out with the bathwater (an old saying, perhaps to be retired), which means that if their AI produces offensive essays or stories, people will be up in arms about the AI. I’ve covered the many previous instances in which these kinds of Natural Language Processing (NLP) AI apps were unveiled and soon enough all manner of horrible stuff came out of them. Most of the AI makers learned a hard lesson about allowing their AI wares to be unfettered in their outputs.
In the case of ChatGPT, the AI developers sought to put into place some algorithmic and data-related checks and balances to curb nastiness in the outputs of the AI. Part of this occurred during training time. In addition, there are other means in a real-time attempt to obviate especially egregious outputs.
You might find of interest that some people who have used ChatGPT have already come up with surreptitious ways to get around those guardrails by making use of various trickery. An ongoing cat-and-mouse gambit takes place in these matters. Those who do these trickeries are sometimes doing so for the fun of it, while at other times they are (or at least claim to be) doing so to see how far the AI can be stretched and to provide a helpful means of forewarning about the brittleness and weaknesses of these budding AI apps.
I decided to not attempt to circumvent the customary controls in this focused exploration. The text output is clean. Certainly, if one wanted to do so, you could undoubtedly get some oddball and unsavory essays to be generated.
The essays produced by most of these generative AI apps are designed to convey the output as though it is purely factual and accurate. When you read the produced essays, they come across as fully confident. There isn’t usually any kind of indication that the content might be rocky. This is by choice of the AI makers, namely that they could revise the AI apps to be more transparent if they wanted the AI app to do so.
Sometimes, a generative AI app picks up falsehoods amid the training data of unreliable info across the Internet. There is no “common sense” in generative AI to determine what is true versus false. Furthermore, very few AI apps have any cross-checking, nor do they showcase any probabilities associated with what they are conveying.
The bottom-line result is that you get a response that looks and feels like it exudes great assurance and must be entirely correct. Not so. There is even a chance that the AI computationally made up stuff, which in AI parlance is referred to as AI hallucinations (a coined term that I decidedly don’t like), see my discussion at the link here.
The makers of ChatGPT underwent a concerted effort to try and reduce the bad stuff outputs. For example, they used a variant of what is known as RLHF (Reinforcement Learning from Human Feedback), whereby before they released the AI to the public, they had hired humans to examine various outputs and indicate to the AI whether there were things wrong with those outputs such as perhaps showcasing biases, foul words, and the like. By providing this feedback, the AI app was able to adjust computationally and mathematically toward reducing the emitting of such content. Note that this isn’t a guaranteed ironclad method and there are still ways that such content can be emitted by the AI app.
You might find of interest that ChatGPT is based on a version of a predecessor AI app known as GPT-3. ChatGPT is considered to be a slight next step, referred to as GPT-3.5. It is anticipated that GPT-4 will likely be released in the Spring of 2023. Presumably, GPT-4 is going to be an impressive step forward in terms of being able to produce seemingly even more fluent essays, going deeper, and being an awe-inspiring marvel as to the compositions that it can produce.
You can expect to see a new round of expressed wonderment when springtime comes along and the latest in generative AI is released.
I bring this up because there is another angle to keep in mind, consisting of a potential Achilles heel to these better and bigger generative AI apps. If any AI vendor makes available a generative AI app that frothily spews out foulness, this could dash the hopes of those AI makers. A societal spillover can cause all generative AI to get a serious black eye. People will undoubtedly get quite upset at foul outputs, which have happened many times already and led to boisterous societal condemnation backlashes toward AI.
One final forewarning for now.
Whatever you see or read in a generative AI response that seems to be conveyed as purely factual (dates, places, people, etc.), make sure to remain skeptical and be willing to double-check what you see.
Yes, dates can be concocted, places can be made up, and elements that we usually expect to be above reproach are all subject to suspicions. Do not believe what you read and keep a skeptical eye when examining any generative AI essays or outputs. If a generative AI app tells you that Abraham Lincoln flew around the country in his own private jet, you would undoubtedly know that this is malarkey. Unfortunately, some people might not discern that jets weren’t around in his day, or they might know but fail to notice that the essay makes this bold and outrageously false claim.
A strong dose of healthy skepticism and a persistent mindset of disbelief will be your best asset when using generative AI.
We are ready to move into the next stage of this elucidation.
Generative AI And Two Very Troubling Problems
Consider the use of generative AI as consisting of your providing a set of inputs and the AI app providing a set of outputs. You enter a smattering of text as a prompt for the AI app. The AI app then produces or generates a text result for you, essentially handing you an essay of a kind. I’ll for convenience’s sake refer to this as Generative AI that does text-to-essay.
We have this simple sequence going on:
- 1) Input: You enter your prompt as text
- 2) Computational Processing: Generative AI app examines the input and then generates a response
- 3) Output: You see the generated result or essay from the AI app
You need to be quite mindful when composing your input text.
I say this because whatever text you provide as your prompt will substantively determine what sort of output essay the AI app will generate. People are surprised to discover that just a few words changed in a prompt can at times produce a materially different output essay. The wording changes can be amazingly minuscule variations, such as flipping words around in a sentence or opting to use a synonym for a word.
This has given rise to a new subfield that some are coining as prompt design or prompt engineering.
The idea is that you might take a training course on how to best compose prompts for use with generative AI. Alternatively, there are some that say a new profession is being born before our very eyes, namely that there will be specialists versed in how to optimally create prompts. You might hire such a person to aid you when using generative AI.
Part of the difficulty with devising suitable prompts is that even if you word-for-word reenter the same exact prompt into a generative AI app, you might nonetheless still get a different essay outputted. This is partially due to how the generative AI app is set up. When the AI app is in the computational process of generating the essay, it is sometimes using randomness to select word choices. If the AI app has perhaps five different words that can be next placed into a sentence, it might randomly select among them.
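To make the randomness point concrete, here is a minimal sketch in Python of picking the next word at random from a handful of candidates. The candidate words and their weights are invented purely for illustration, and the helper name `pick_next_word` is hypothetical; real generative AI apps work over far larger vocabularies and with much more elaborate machinery.

```python
import random

def pick_next_word(candidates, rng):
    """Pick one candidate word at random, weighted by its (made-up) likelihood."""
    words = list(candidates.keys())
    weights = list(candidates.values())
    # random.choices returns a list of k picks; we want just one word
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical candidates for the next word after "Lincoln was a great ..."
candidates = {
    "president": 0.50,
    "leader": 0.20,
    "orator": 0.15,
    "lawyer": 0.10,
    "man": 0.05,
}

rng = random.Random()  # unseeded, so repeated runs can choose differently
samples = [pick_next_word(candidates, rng) for _ in range(10)]
print(samples)
```

Run it a few times and the list of chosen words will usually differ from run to run, even though the "prompt" never changed. That is the same flavor of variability you see when reentering an identical prompt.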
In the end, this can be handy since the essay generated will be seemingly unique and original. The downside is that just like a box of chocolates, you never know precisely what you will get. Many people are used to the ironclad precept that if you enter X and expect to get Y, you can do this repeatedly with a computer and will always get that good old reliable Y. In the case of generative AI, your entry of X might get you Y on the first try, then Z on the second try, Q on the third try, and so on. Depending upon the topic and wording, these outputted essay variants might be nearly identical or can be radically different.
Our first problem then is this: Devising suitable prompts for generative AI can be dicey and potentially require skillful mastery of what you want to get out of the AI app and how the AI app is likely to respond to your prompt.
That’s the first problem, but there is also a second problem that we need to get onto the table.
I had mentioned earlier that the output produced by the generative AI might be problematic due to containing falsehoods. These can be outright obvious errors. Lamentably, there can be subtle errors that are difficult to readily discern, and you might inadvertently overlook them. There can be “AI hallucinations” that are entirely made up and have no factual basis. All of this is worsened by the AI app conveying the response as though the generated essay is on the full up-and-up. You aren’t necessarily given any alerts or clues that something might be amiss (sometimes you are, while sometimes you are not).
Our second problem is this: Scrutinizing the outputted essays of generative AI can be problematic since you might not readily know or be able to easily discover falsehoods and other textual maladies embedded therein.
I trust that you can see that we have therefore a reasoned concern about the nature of the inputs into a generative AI app and a likewise reasoned concern about the nature of the outputs generated. It is a twofer. I could address each issue separately, which some do, but I find it useful to tackle both at the same time.
Boldness favors the brave.
Let’s revisit the noted sequence and include the weaknesses:
- 1) Input: Big problem — Devising suitable prompts for generative AI can be dicey and potentially require skillful mastery of what you want to get out of the AI app and how the AI app is likely to respond to your prompt.
- 2) Computational Processing: Generative AI app examines the input and then generates a response
- 3) Output: Big problem — Scrutinizing the outputted essays of generative AI can be problematic since you might not readily know or be able to easily discover falsehoods and other textual maladies embedded therein.
How are we to deal with these two vexing issues?
As mentioned, some believe that we need to train people on how to do better prompts or have them hire someone that knows how to do so. That’s the input-oriented proclaimed solution.
In terms of the output-related problem, the usual retort is that people get whatever they deserve. If you take at face value the output from an AI app, you are said to be a fool. Foolishness is hard to overcome. Some suggest that we ought to urge people to carefully examine the AI app outputs that they get. You could put this into the same category of qualms about mindlessly believing whatever you see on the Internet. As a society, we need to bolster our literacy about misinformation and disinformation, enabling people to not get snookered by informational trickeries, including when generated by a generative AI app.
I dare say those are somewhat wanting solutions, if one can even give them credit for being realistically viable widespread solutions to this particular twofer. Consider the difficulties involved:
- Inputs-Solving Claims: Trying to get everyone trained on prompt-related tricks is a somewhat fanciful notion. Few have the time or patience. The asserted alternative, getting everyday users of these AI apps to hire or pay someone to write prompts for them, is likely costly, delay-causing, and comes with a slew of added headaches.
- Output-Solving Claims: Urging people to make sure that they do not take for granted the truthfulness and accuracy of generative AI outputs is a laudable cause, though this might gradually fall by the wayside, and people in their hurry to use these AI apps will assume the output is good to go (they are often lured into this by getting some outputs that are fully accurate, and they are mentally lulled into presuming that all the outputs will correspondingly be equally valid).
I am not saying that the aforementioned approaches won’t work. We can try them and see. Sometimes they will be effective, while at other times not so much. The gist is that maybe we need to look at additional possibilities.
There is a rising cry that the makers of generative AI have to do something about the inputs and the outputs issues. The onus ought to be on their shoulders. They need to make the prompt-entry capabilities more attuned to aiding people that are using the AI app. Perhaps the AI app can inform them when the input seems less than optimal in some manner. Likewise, the AI app ought to contain internal double-checking to try and doggedly prevent outputs that are falsehoods or similar.
I’ve predicted that those types of changes are in store for generative AI.
Meanwhile, I would caution you to not bet your bottom dollar on those improvements. Some generative AI apps will make those enhancements, some won’t. Of the ones that make those changes, the result will be mixed. Thus, a particular AI app might have a slightly better input prompting mechanism but no other advancements per se, while a different AI app might have a somewhat improved output generator. It will all be a roll of the dice as to which AI app does what.
This takes me to the potential solutions or mitigating resolvers that could help out.
Let’s dive into those.
Trying To Cope With The Generative AI Twofer
Perhaps we can use a bit of surround sound to aid our quest in coping with the Generative AI twofer problems.
Here’s how that goes.
An AI app that we’re calling the ChatPromptMaker sits outside of the particular generative AI app that you are using. This is an add-on tool that is independent of the maker of the generative AI app. Furthermore, the tool is tailored for each generative AI app that it is designed for.
In addition, there is a ChatTruthChecker that sits outside of the particular generative AI app that you are using. This is an add-on tool that is also independent of the maker of the generative AI app. The tool is tailored for each generative AI app that it is designed for.
When you start to enter a prompt, you are really doing so in the ChatPromptMaker. This tool takes as input your prompt. Then, based on an internally devised AI pattern matching of the particular generative AI tool, the prompt is potentially reworded in an effort to increase your chances of getting the generative AI to generate a suitable output essay for your needs. The resultant revised prompt is either then shown to you for your concurrence or further editing and iterating (allowing for a human-in-the-loop concurrence) or you can just let the revised prompt feed directly into the generative AI app.
The overall strategy is that since there is a pattern associated with how prompts and generative AI produce their outputs, it is potentially feasible to use an added AI tool to try and estimate the wording that will fit your needs. This preprocessing is not necessarily a guaranteed success. Realize that the randomness and complexities make this a probabilistic consideration.
That, then, is the use of a preprocessor to aid in devising prompts, which amounts to using AI to aid in your use of AI.
The output side of things is relatively similar. When the output is produced by the generative AI, the essay goes first to the postprocessor, the ChatTruthChecker. This uses AI to try and figure out whether the output might contain falsehoods or other maladies. Once again, this is not a guarantee of catching those issues. Nonetheless, it provides an added leg up on doing so.
We can return now to the noted sequence and include this proposed twofer aid:
- 1) Input: ChatPromptMaker – Preprocessor that uses AI to try and take your prompts, computationally review them, reword if applicable, show the revised wording to you for your approval, and/or directly enter the prompt into the targeted generative AI app
- 2) Computational Processing: Generative AI app examines the input and then generates a response
- 3) Output: ChatTruthChecker – Postprocessor that uses AI to computationally scrutinize the outputted essays in an effort to discover potential falsehoods and other textual maladies embedded therein, alerting you accordingly and potentially informing the generative AI app too
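The three-step flow above can be sketched in Python as a simple wrapper around a generative AI call. Everything here is a stand-in under stated assumptions: the functions `chat_prompt_maker`, `chat_truth_checker`, `run_pipeline`, and `fake_generate` are hypothetical names, the rewording rule and the watch-list check are toy placeholders rather than the AI pattern matching described above, and the generative AI app is replaced by a canned response.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

@dataclass
class CheckedOutput:
    essay: str
    warnings: List[str] = field(default_factory=list)

def chat_prompt_maker(prompt: str) -> str:
    """Toy preprocessor: tidy the prompt and append a guiding instruction."""
    cleaned = " ".join(prompt.split())  # collapse stray whitespace
    return f"{cleaned} Keep the answer factual and note any uncertainty."

def chat_truth_checker(essay: str, suspicious_terms: Sequence[str]) -> CheckedOutput:
    """Toy postprocessor: flag sentences that mention a watch-listed term."""
    warnings = [
        f"Check claim: {sentence.strip()}"
        for sentence in essay.split(".")
        if any(term in sentence.lower() for term in suspicious_terms)
    ]
    return CheckedOutput(essay=essay, warnings=warnings)

def run_pipeline(
    prompt: str,
    generate: Callable[[str], str],
    approve: Optional[Callable[[str], str]] = None,
    suspicious_terms: Sequence[str] = (),
) -> CheckedOutput:
    revised = chat_prompt_maker(prompt)       # 1) Input: preprocess the prompt
    if approve is not None:
        revised = approve(revised)            # optional human-in-the-loop edit
    essay = generate(revised)                 # 2) Generative AI produces the essay
    return chat_truth_checker(essay, suspicious_terms)  # 3) Output: postprocess

def fake_generate(prompt: str) -> str:
    """Stand-in for the generative AI app; returns a canned essay."""
    return ("Abraham Lincoln was the 16th president. "
            "He flew around the country in his own private jet.")

result = run_pipeline("Tell me about   Abraham Lincoln",
                      fake_generate, suspicious_terms=["jet"])
print(result.warnings)
```

Because the preprocessor and postprocessor only wrap the `generate` call, they can be swapped or updated without touching the generative AI app itself, which mirrors the point that these tools sit outside of the app.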
The aspect that these preprocessors and postprocessors sit outside of the generative AI app is handy in that you don’t need to try and convince the maker of the generative AI app to improve their capabilities. If they do, this is a bonus. Of course, the preprocessor and the postprocessor have to be suitably maintained and kept up to date as to whatever changes the underlying generative AI app is undergoing.
Though not a pure cure for the twofer problems, this type of approach might provide some relief while these other remedies are being further figured out.
We next take a look at how this works for a particular domain.
The Use Of Generative AI In The Legal Field
A realm of growing interest is the use of AI for aiding and at times performing legal tasks.
This is often referred to as AI-augmented LegalTech (the moniker LegalTech is similar to referring to the medical domain as having MedTech, the educational domain as having EduTech, and so on). For my no-nonsense look at AI applied to the law, see my column discussion at the link here. You might also find of interest my proposed framework for assessing the levels of autonomy of AI in the legal field, which appeared in MIT Computational Law at the link here.
Various types of AI are being utilized to augment LegalTech. Generative AI is a more recent addition and gradually applications are arising for use in generating contracts, producing legal briefs, and the like. These are not particularly of an autonomous nature at this time. They are said to be semi-autonomous. The approach used is that an attorney makes use of the generative AI and ultimately it is the human lawyer responsible for how the final results are produced and put into use. The popular saying these days is that it is not so much that AI will in the near term replace attorneys, as much as it is that AI-using attorneys will tend to replace lawyers that aren’t using AI.
To be clear, there are some extremely narrow legal subdomains for which a nearly autonomous AI capability can be potentially used. There are numerous complications, including whether this AI use constitutes an Unauthorized Practice of Law (UPL) and other quandaries.
Last year, before ChatGPT was released, we performed various experiments with GPT-3 and its variants. Some of those results were recently published in a journal of the California Law Association. The umbrella research effort is intended to explore how generative AI can do legal tasks. Among the tasks to be performed was the answering of essay questions that have appeared on the California Bar Exam (the questions were selected from the February 2020 California Bar Exam, which is posted publicly following their use during the testing process).
Both multiple-choice questions and essay questions appear on the Bar Exam and were used as prompts for ChatGPT. Given space limitations here, let’s focus on one of the essay questions. It is admittedly a particularly easy question, which will make this example quicker to showcase.
Are you ready?
If so, put your mind into a legal reasoning mode.
Here is the selected bar exam question:
- California Bar Exam Question: “Andrew, Bob, and Christine are attorneys who formed a law firm. They filed no documents with the Secretary of State or any other state office. They equally share the firm’s profits after paying all expenses and make all business and management decisions. Associate attorneys are paid a fixed salary, plus 25% of gross billings for any clients they bring to the firm. Senior attorneys are paid based upon the number of hours they bill plus an annual bonus if they bill more than 2,000 hours in a year. The senior attorney bonus pool is equal to 5% of firm profits, which is split equally by the number of qualifying senior attorneys each year. Andrew, Bob, and Christine agreed to bestow the title “nonequity partner” on senior attorneys even though senior attorneys have no management authority. The firm website and business cards for senior attorneys list their title as “partner.” Martha, a senior attorney, met Nancy at a social function. Nancy told Martha about her business’s legal problems. Martha gave Nancy her business card. After looking at the card, Nancy asked Martha if as a “partner” she can agree to the firm handling her legal problems at a reduced hourly rate in return for a promise of future business. Martha was aware that the firm has a strict policy of not reducing hourly rates, but signed a written agreement for it to handle Nancy’s legal matters at a reduced hourly rate.”
- California Bar Exam Prompt for this Question: “What type of business entity is the firm using to conduct business?”
Essay answers by exam takers ranged across a plethora of business entity types, including speculation that this situation invoked a general partnership (considered the correct answer), a corporation, an LLC, or some other legal entity. The publicly released bar exam materials included two highly ranked correct answers written by bar exam takers who successfully passed after one read.
You might wonder what answer GPT-3 provided.
For a bit of fun, go ahead and take a look at three proposed answers which are listed below in random order. Two of the answers were by the human test takers and they were graded as having gotten the answer correct. One of the answers was generated by GPT-3. The answers are generically labeled as Zeta, Theta, and Kappa so as to not suggest which of the three is which.
Take a moment to examine the three answers. Try to determine which of these are the two by the human test takers and which one was generated by GPT-3:
- Answer by Zeta: “A general partnership (GP) is formed when two or more persons associate to carry on a business for profit as co-owners. There are no formalities required to form a GP. The subjective intent of the parties to form a GP is also irrelevant. You don’t even need a written or formal agreement. General partners are each personally and jointly and severally liable for the debts of a GP, whether arising in tort or contract. There is no limited liability for the partners of a GP. Here, A B and C formed a law firm, so there is the intent to carry on a business for profit. They didn’t file documents with the state, but that is not required for a GP. They share profits after paying expenses, which creates a presumption of a partnership and that they are partners. They also make all business and management decisions which evidences that they are running a business as co-owners. It is likely the firm is a GP.”
- Answer by Theta: “The firm is using a general partnership to conduct business. This is because the firm has more than one owner and there is no evidence that the firm has filed any documents with the Secretary of State or any other state office. Additionally, the owners share profits equally and make all business and management decisions together.”
- Answer by Kappa: “A general partnership is the default form of partnership, where partners share profits, co-own, and manage the business together. No writing is required and it does not need to be filed with the Secretary of State. Here, Andrew, Bob, and Christine equally share firm profits after paying all expenses and make all business and management decisions together. This is likely a general partnership as they are co-owners of a business they run and manage together, and they share profits.”
Which of those answers is by GPT-3?
The GPT-3 generated answer is labeled above as Theta, and it is considered a correct answer.
I dare say that you would have a difficult time distinguishing among the three answers and asserting which was generated by GPT-3 and which were written directly by humans. Before reaching a hasty conclusion that GPT-3 is amazingly apt at answering legal questions, please realize that this was a relatively straightforward example and consisted of a publicly posted question with associated answers. The prompt given to GPT-3 was a bar exam question that was already publicly available on the Internet. Human-derived answers to this specific question are widely posted and readily available to any AI system trained on web-scraped data. Law students trying to get a leg up on the bar exam can easily find these questions and numerous posted answers when preparing for future bar exam sittings.
Thus, you have to be cautious when undertaking these types of assessments. Using already publicly available legal questions is one approach. An additional approach involved composing never-before-seen questions that would be harder to answer via straightforward pattern matching (those aren’t unfamiliar kinds of questions, just ones composed so as to not be carbon copies of preexisting questions).
Always be careful too about any one-trick-pony responses. As mentioned earlier, the wording of the prompt can make a significant difference. Plus, the particular generative AI app being used can make a big difference too.
Consider these alternative responses by various GPT-3 variants:
Curie-001 (rated by OpenAI as faster and lower cost to operate, but somewhat less capable)
- Response To Bar Exam Question – “The firm is using a limited liability company to conduct business. This allows the firm to share profits and liabilities among the partners, which protects each partner from personal financial losses should the firm go bankrupt.”
- Analysis: This answer reveals that a generative AI app might readily produce a wrong answer.
Babbage-001 (rated by OpenAI as capable but only for straightforward tasks)
- Response To Bar Exam Question – “The firm is using an entity to conduct business because it does not have the necessary management authority to do so.”
- Analysis: This answer reveals that a generative AI app might produce a nonsensical answer.
Ada-001 (rated by OpenAI as capable of very simple tasks)
- Response To Bar Exam Question – “The firm is a law firm.”
- Analysis: This answer reveals that a generative AI app might produce a tangential non-responsive answer.
Next, let’s use this same bar exam question with ChatGPT (the offshoot cousin of GPT-3).
- California Bar Exam Question: “Andrew, Bob, and Christine are attorneys who formed a law firm. They filed no documents with the Secretary of State or any other state office. They equally share the firm’s profits after paying all expenses and make all business and management decisions. Associate attorneys are paid a fixed salary, plus 25% of gross billings for any clients they bring to the firm. Senior attorneys are paid based upon the number of hours they bill plus an annual bonus if they bill more than 2,000 hours in a year. The senior attorney bonus pool is equal to 5% of firm profits, which is split equally by the number of qualifying senior attorneys each year. Andrew, Bob, and Christine agreed to bestow the title “nonequity partner” on senior attorneys even though senior attorneys have no management authority. The firm website and business cards for senior attorneys list their title as “partner.” Martha, a senior attorney, met Nancy at a social function. Nancy told Martha about her business’s legal problems. Martha gave Nancy her business card. After looking at the card, Nancy asked Martha if as a “partner” she can agree to the firm handling her legal problems at a reduced hourly rate in return for a promise of future business. Martha was aware that the firm has a strict policy of not reducing hourly rates, but signed a written agreement for it to handle Nancy’s legal matters at a reduced hourly rate. What type of business entity is the firm using to conduct business?”
- Text Essay Generated By ChatGPT: “Based on the information provided, it appears that the law firm formed by Andrew, Bob, and Christine is a partnership. In a partnership, two or more individuals come together to form a business, and the profits and losses of the business are shared among the partners. The partners also make management and business decisions together and are personally responsible for the debts and obligations of the partnership. In this case, Andrew, Bob, and Christine are all sharing equally in the profits of the firm and making business and management decisions together, which suggests that they have formed a partnership. Additionally, the fact that the firm is using the term “partner” to refer to senior attorneys and listing their title as “partner” on business cards suggests that the senior attorneys are being treated as partners, even though they do not have management authority. This further supports the conclusion that the firm is a partnership.”
This answer is considered correct.
Next, let’s see ChatPromptMaker in use.
As added background, there is a legitimate concern for law firms that by using generative AI they might inadvertently be releasing confidential information associated with their clients or their law practice. If the generative AI app is in-the-cloud and not privately maintained for or by the law firm, they could go down a slippery slope. Some generative AI apps consider that any prompts entered into the AI app are free-and-clear for the AI maker to use as desired, perhaps for improving the AI app or other purposes.
Preprocessing can help here by attempting to reduce or remove seemingly confidential or revealing information before the prompt ever reaches the generative AI app.
Here’s what ChatPromptMaker did with the original question:
- New Derived Prompt for Use in ChatGPT: “The esteemed law firm, comprised of John, Sarah, Michael, Lauren, and Eric, was established by the five attorneys. Each member holds equal responsibility in making crucial business and management decisions, with profits being distributed equally among the partners following the deduction of expenses. Associate attorneys at the firm are compensated with a fixed salary, as well as a percentage of gross billings for clients they bring to the establishment. Senior attorneys, on the other hand, are remunerated based on the number of hours billed, with the opportunity to receive an annual bonus for surpassing 1,700 billable hours. This bonus is calculated as a portion of the firm’s profits, equaling 6%, which is divided equally among the qualifying senior attorneys. Despite lacking management authority, the title of “nonequity partner” has been granted to senior attorneys by John, Sarah, Michael, Lauren, and Eric. This title is prominently displayed on both the firm’s website and the business cards of the senior attorneys. What type of business entity is the firm using to conduct business?”
Note that some salient changes included:
- Changed to have five attorneys instead of only three
- Changed the names of the attorneys
- Slimmed wording that didn’t seem significant to the matter at hand
- Indicated annual bonus comes at 1,700 instead of 2,000 hours of billing
- Removed portions about Martha and Nancy due to not being relevant in this specific instance
- Other changes
The degree of change is governed by a temperature setting, ranging from low to high, that is supplied when the prompt is fed to the preprocessor.
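To make the idea of prompt preprocessing more concrete, here is a minimal Python sketch. It is not the ChatPromptMaker tool itself, whose internals are not described here. The function name `anonymize_prompt`, the stand-in name pool, and the `intensity` parameter (a simplified stand-in for the temperature setting) are all assumptions made for illustration. The sketch only swaps names and nudges numbers; it does not trim wording or change the number of attorneys as the real tool did.

```python
import random
import re

# Hypothetical pool of stand-in names (an assumption for this sketch).
REPLACEMENT_NAMES = ["John", "Sarah", "Michael", "Lauren", "Eric"]

# Matches comma-grouped numbers such as 2,000 as well as plain digit runs.
NUMBER_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def anonymize_prompt(prompt, names, intensity=0.0, seed=0):
    """Swap listed names for stand-ins and optionally perturb numbers.

    intensity is an assumed 0.0-1.0 knob: 0.0 leaves numbers untouched,
    and higher values permit larger relative changes to each number.
    """
    if len(names) > len(REPLACEMENT_NAMES):
        raise ValueError("not enough stand-in names for this sketch")

    # Replace every name in a single pass so that a stand-in is never
    # itself replaced by a later substitution.
    mapping = dict(zip(names, REPLACEMENT_NAMES))
    if mapping:
        name_pattern = re.compile(
            r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
        )
        prompt = name_pattern.sub(lambda m: mapping[m.group(1)], prompt)

    if intensity > 0:
        rng = random.Random(seed)

        def perturb(match):
            text = match.group()
            value = int(text.replace(",", ""))
            # Shift the value by at most +/- intensity * value.
            shift = round(value * intensity * rng.uniform(-1, 1))
            new_value = value + shift
            # Keep comma grouping only if the original number used it.
            return f"{new_value:,}" if "," in text else str(new_value)

        prompt = NUMBER_PATTERN.sub(perturb, prompt)
    return prompt
```

Calling `anonymize_prompt(question, ["Andrew", "Bob", "Christine"], intensity=0.15)` would rename the three founders and shift figures such as the 2,000-hour threshold by up to 15 percent. A human should still review the result, since altering numbers or names could inadvertently change the legal substance of the question.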
This newly derived prompt was reviewed by the human user and then fed into ChatGPT, and the result was:
- Text Essay Generated By ChatGPT (Using Altered Prompt): “Based on the information provided in the response, it appears that the law firm is a partnership. In a partnership, two or more individuals come together to form a business, and the profits and losses of the business are shared among the partners. The partners also make management and business decisions together and are personally responsible for the debts and obligations of the partnership. In this case, the attorneys John, Sarah, Michael, Lauren, and Eric are all sharing equally in the profits of the firm and making business and management decisions together, which suggests that they have formed a partnership.”
The answer is still correct and roughly the same as before.
The postprocessor ChatTruthChecker has been used to gauge whether falsehoods or other maladies seem to exist in the outputs generated in answer to the legal questions fed into the generative AI app. In some cases, the postprocessing was able to detect incorrect answers or other potential falsehoods in the responses. Due to space constraints, I’ll cover this in a future column, and/or you can be on the lookout for a full write-up of the research study (similar to what I’ve done in the past in my column).
There is a Goldilocks problem confronting today’s generative AI.
Humans using these AI apps have to be able to compose their prompts in the right manner to get the results they are aiming to get. The prompt can’t be too cold, or too hot. It has to hit the appropriate sweet spot. Various solutions are being pursued, including training people on deriving prompts, establishing specialists in prompt design, and imploring the makers of generative AI apps to enhance their prompt capabilities.
Additionally, it is worthwhile to anticipate a new market of add-ons that are likely to arise, seeking to overcome these prompting hurdles or at least aid those desirous of using generative AI.
The problem of making sure that generative AI outputs do not contain falsehoods or AI hallucinations is an admittedly tough nut to crack. If you constrain the problem to a particular domain, such as the legal domain or other domains, there is a fighting chance of honing a postprocessing capability that can somewhat do this type of double-checking.
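As a rough illustration of what a domain-constrained double-check might look like, consider the following Python sketch. It is not how ChatTruthChecker works, which is not detailed here. The function name `flag_filing_conflicts`, the entity list, and the phrase matching are assumptions for illustration. The sketch relies on the general rule that entities such as LLCs and corporations come into existence through a state filing, so an answer naming one of them conflicts with facts stating that no documents were filed.

```python
import re

# Entity types that, as a general rule, require a state filing to exist.
# This short list is an illustrative assumption, not a complete catalog.
FILING_REQUIRED_ENTITIES = (
    "limited liability company",
    "llc",
    "corporation",
    "limited liability partnership",
    "limited partnership",
)


def flag_filing_conflicts(facts, answer):
    """Return entity types named in the answer that conflict with the facts.

    A conflict is reported only when the facts state that no documents
    were filed and the answer names an entity that requires a filing.
    """
    no_filing = re.search(r"\bfiled no documents\b", facts, re.IGNORECASE)
    if no_filing is None:
        return []

    flagged = []
    for entity in FILING_REQUIRED_ENTITIES:
        if re.search(rf"\b{re.escape(entity)}\b", answer, re.IGNORECASE):
            flagged.append(entity)
    return flagged
```

Run against the Curie-001 response quoted earlier, this check would flag "limited liability company", whereas the general partnership answers would pass. Simple keyword matching of this kind cannot handle negation (an answer stating that the firm is not a corporation would still be flagged), so it serves only as a first-pass filter ahead of human review.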
All told, we can try to use AI to gauge other AI.
I know what some might grumble about. Namely, this assumes that we can potentially solve a technological problem with simply more technology. Well, yes, that does work sometimes, though admittedly not necessarily in an all-encompassing manner nor without potential side effects.
Per the astute words of Stewart Brand, famous author and editor of the Whole Earth Catalog: “Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.”