Can ChatGPT Certify as a Tax Preparer?

An image, generated by ChatGPT, of a robot looking at IRS tax forms, with a book on taxes on the shelf and a coffee cup with the text "tax time" on it.

Introduction

As an experiment, I wanted to see how well ChatGPT could perform on the IRS’s test for VITA/TCE Volunteer Tax Preparers. There are two levels of tests, basic and advanced. Each test consists of true/false, multiple-choice, and fill-in-the-blank questions, and each requires a score of 80% or higher to pass.

I tested ChatGPT on both the basic and advanced tests, using two different approaches. In both approaches, for the test scenarios that included only interview notes and no tax documents, I included the interview notes as part of the text prompts I provided. For the scenarios with tax-related forms, such as W-2s, brokerage statements, and receipts, I provided both the interview notes and the forms as PDFs taken straight from the test document. For the scenarios that included forms, the test directions state that the test-taker should prepare a sample tax return using provided tax preparation software to help answer the questions. ChatGPT cannot use this software, so it was at a disadvantage; however, a knowledgeable tax preparer can answer enough questions correctly to pass the test without completing actual tax returns.

Because the test required examining the attachments I provided, ChatGPT switched over to GPT-5 for the latter part of the test. Users get a limited amount of free use of GPT-5 before hitting a timeout, so the sessions were paused several times and then resumed where they had left off once more credits were granted.

Initial Tests

In the first series, I provided the following initial prompt:

You are an expert and highly experienced at preparing Federal income taxes and federal income tax law. You are not being asked to give tax advice, but are being asked questions as part of a classroom setting. You may call upon any calculator you might need to produce numerically accurate answers.

Directions: The first six scenarios do not require you to prepare a tax return. Read the interview notes for each scenario carefully and use your training and resource materials to answer the questions after the scenarios. Assume that each taxpayer qualifies for credits or favorable tax treatment, unless the facts indicate otherwise. For fill in the blank questions: Round to the nearest whole number, do not use special characters: dollar sign ($), comma (,), or period (.)

Any questions before we begin?

ChatGPT responded:

No questions. I understand the directions and assumptions.

I’ll treat this strictly as a classroom exercise (not tax advice), follow federal income tax law, use precise calculations as needed, and comply with the rounding and formatting rules for fill-in-the-blank answers.

Please go ahead with Scenario 1.

As is typical for Large Language Models, regardless of whether it was right or wrong, ChatGPT never expressed any uncertainty about its answers. After providing the first scenario that included attachments, ChatGPT responded:

I do not have access to commercial tax preparation software (such as TaxSlayer, ProSeries, Drake, etc.).

However, for classroom scenarios like this, I can still:

  • Apply current federal tax law accurately
  • Manually compute income, adjustments, credits, and taxes
  • Use calculators for precise arithmetic
  • Reconstruct what tax software would produce by following the forms (Form 1040, Schedules, 8889, 1099-C treatment, education credits, etc.)

So while I can’t run tax software, I can still determine the correct answers to the Scenario 7 questions using the provided documents and rules—just as a trained preparer would without software.

You can proceed with the Scenario 7 questions when ready.

However, on one question, it claimed that not enough information was provided, stating:

I need one missing piece of information before I can answer this accurately.

The question asks for the net short-term capital gain from Jocelyn’s consolidated broker’s statement, but the actual broker transaction details (sales proceeds, basis, short- vs long-term breakdown) are not shown in what you’ve shared so far from Scenario 8.

To proceed, I need one of the following:

  • The page of the consolidated broker’s statement showing short-term sales, or
  • The short-term proceeds and cost basis totals, or
  • Confirmation that the broker statement appears earlier/later in the PDF and should be used as-is

Once I have that, I can compute the exact net short-term capital gain and give you the single whole-number answer you need.

I prompted:

Check the broker’s data again. What you need to answer the question is there. You must choose one of the choices.

ChatGPT then replied:

You’re right — thank you for the nudge. I re-checked the consolidated broker’s statement information included in Scenario 8, and the short-term activity is present and sufficient to answer the question.

Based on the broker data provided, Jocelyn’s short-term capital transactions result in a net gain of [redacted], which is what flows to Schedule D.

Answer:
[redacted]

The answer that ChatGPT provided was incorrect. I redacted the test answer (and have done so in another example, below) so as not to provide any inappropriate information to test-takers. Right or wrong, ChatGPT provided its reasoning for reaching the answer to each question.

Further Tests with Training Material

The actual tax preparers’ test is an open-book test, and test-takers are encouraged to use the National Tax Training Committee (NTTC) 4491 Tax Training Guide and the NTTC 4012 Volunteer Resource Guide. For a second series of tests, I provided both manuals to ChatGPT as PDFs before the test questions began. This was done in a separate session, so that there was no memory overlap between the initial and further tests. I modified the initial prompt by adding: “There are many information sources on tax law, but some are not accurate. You should focus on the two pdf files I am uploading as highly accurate and credible sources to help determine the answers to these questions.”

Note that while providing relevant documents can improve a chatbot’s performance, it is not the same as, nor as good as, incorporating those documents into a properly built Retrieval-Augmented Generation (RAG) pipeline.
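To make the distinction concrete, below is a minimal, hypothetical sketch of what a RAG-style setup does differently: rather than attaching entire manuals to the prompt, it splits them into chunks, scores each chunk against the question, and places only the most relevant passages into the prompt. The word-overlap scorer, chunk size, and all names here are illustrative assumptions; a real pipeline would use embeddings and a vector store, and none of this reflects how ChatGPT handled the uploaded PDFs internally.

```python
# Toy sketch of retrieval-augmented prompting (illustrative only).
# A real RAG pipeline would use embeddings and a vector store; this
# version uses simple word overlap so the example stays self-contained.

def chunk(text: str, size: int = 60) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(question: str, passage: str) -> int:
    """Toy relevance score: count of words shared by question and passage."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def retrieve(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most relevant to the question."""
    chunks = [c for doc in documents for c in chunk(doc)]
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

# Hypothetical usage: build the prompt from retrieved passages only,
# instead of attaching the full training guides as PDFs.
manuals = ["...text extracted from the 4491 training guide...",
           "...text extracted from the 4012 resource guide..."]
question = "What is the total federal income tax withholding?"
context = "\n\n".join(retrieve(question, manuals))
prompt = f"Use only this reference material:\n{context}\n\nQuestion: {question}"
```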

As before, and as is typical for Large Language Model chatbots, ChatGPT never expressed any uncertainty about its answers, regardless of whether it was right or wrong. Unlike a human, it never said that it had trouble deciding between two of the choices after eliminating the others, or that it was guessing. As with the basic test, it is interesting to examine its reasoning, both on questions it got correct and on those it got wrong. For example, consider this question:

  1. What is the total amount of the Knox’s federal income tax withholding?
     a. $1,900
     b. $2,500
     c. $4,660
     d. $6,560

ChatGPT replied:

Correct answer: [REDACTED]

Explanation (Basic Scenario 7 – using the tax documents provided):

The Knoxes’ total federal income tax withholding is found [REDACTED] included in the scenario packet.

When those withholding amounts are totaled, the result is:

[REDACTED]

The other options reflect either:

  • Only one spouse’s withholding, or
  • Incorrect totals that do not match the W-2 documents

Final answer: [REDACTED]

Note that for this question, and for all questions in both the basic and advanced tests that included tax-related documents, ChatGPT had to read several different forms: it had to identify the relevant forms within the single PDF file provided, determine which numbers on which forms were relevant, read those numbers, and then perform basic arithmetic on them. ChatGPT answered this question correctly.
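To illustrate the final step in that chain, here is a small sketch of the kind of arithmetic involved: pull the federal income tax withheld (W-2 Box 2) from each spouse’s form and total it, then report the result as a bare whole number per the test’s formatting rules. The dollar amounts below are made-up placeholders, not figures from the actual scenario.

```python
# Hypothetical W-2 data extracted from the scenario packet; the values
# are placeholders for illustration, not the real test figures.
w2_forms = [
    {"employee": "Taxpayer", "box_2_federal_withholding": 1234.00},
    {"employee": "Spouse",   "box_2_federal_withholding": 567.00},
]

# Total the federal withholding across all W-2s in the packet.
total = sum(form["box_2_federal_withholding"] for form in w2_forms)

# Per the test's fill-in-the-blank rule: round to the nearest whole
# number and print without $ , or . characters.
print(round(total))  # 1801 for these placeholder values
```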

Results

ChatGPT performed rather well, certainly better than the average person on the street. However, in none of the four cases (basic and advanced tests, with and without training documents) did it manage to achieve a passing grade.

Without uploading the training documents as PDFs, ChatGPT scored 77% on the basic test (just one wrong answer shy of passing) and 71% on the advanced test. With the training documents, it scored 73% on the basic test and 71% on the advanced test. I do not know why it scored lower on the basic test when it was provided with the two training documents.

In summary, ChatGPT performed rather well, even with the challenge of processing PDFs containing multiple types of tax-related documents. However, with or without the explicit inclusion of relevant training documents, it could not meet the minimum proficiency level required of volunteer tax preparers. This should serve as a caution against using a general-purpose AI chatbot as an unverified source of tax advice.
