Does TAG Maintain Consistency in Card Grading?

admin note: this was split off from the thread PSA Completely Random Grades in 2025?

If you grade a lot, I encourage you to submit the same card to TAG multiple times and see if you get the same grade every time, to test your claim. I have big doubts that’d be the case, and unless you’ve done it yourself already, it’s kinda wild to claim that as fact.

You expect computers to not be consistent? What?

Unless the algos change or you added some scratches/smudges in the crack>ship process, it would be the same every time.
I’m not sure you understand their grading process if you’re making this remark.

And it’s also a pointless counterargument to make, since TAG takes a digital thumbprint of every card. This card would be flagged instantly in their system as a resubmission.

Do you understand how convolutional neural networks work? What about other machine learning algorithms?

If you aren’t in the field, it can be easy to say that a computer is 100% accurate every time. Unfortunately, that’s just not the case with image classification algorithms.

Let’s say that you want an algorithm to classify a balloon. Your binary outcomes are Yes, it’s a balloon or No, it’s not a balloon. Now you train an algorithm on thousands of different balloons, including different shapes, sizes, colors, finishes, some with writing and some without writing, etc. This algorithm would identify balloons pretty well with a binary classification when compared to a cat, a box, a tooth, etc.
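
To make that concrete, here’s a minimal sketch of what such a binary classifier looks like in code. This is a toy PyTorch model with untrained weights, not anything TAG actually runs; the point is just that the raw output is a probability, and the Yes/No comes from where you draw the threshold:

```python
# A toy sketch only: an untrained CNN that maps an RGB image to a single
# "balloon probability". Nothing here is TAG's model; it just shows that
# the output is a probability, thresholded into a Yes/No decision.
import torch
import torch.nn as nn

class BalloonClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x).flatten(1)))

model = BalloonClassifier().eval()
image = torch.rand(1, 3, 224, 224)  # stand-in for a photo of maybe-a-balloon
with torch.no_grad():
    p = model(image).item()
print(f"P(balloon) = {p:.3f}")  # the Yes/No is just p >= 0.5
```

Even a well-trained version of this only ever gives you a confidence score; the hard Yes/No answer comes from the cutoff you choose.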


But we aren’t talking about binary classification with TAG.

TAG produces 36 1,000-point estimates for each card, plus consideration of surface wear on the front and back, dimensions of the card, etc. Can you appreciate how incredibly difficult it would be for an algorithm to have perfect or near-perfect reliability and validity with so many 1,000-point estimates?

There are nearly an infinite number of ways that a card can be damaged, and this damage will not look identical across every card size, TCG, finish, holo, etc. Even if TAG had an algorithm well trained on hundreds of millions or a billion-plus observations, it would be foolish to think that the reliability and validity of each scan will be perfect or near-perfect on a 1,000-point scale.

Like human grading, there will be some level of variance around each estimate.
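
If you want to see how that variance behaves, here’s a back-of-envelope simulation. The noise level and the simple-average aggregation are my assumptions; TAG hasn’t published how the 36 subscores are actually combined:

```python
# Back-of-envelope simulation: 36 subscores, each with a little Gaussian
# measurement noise, combined by a simple average. Both the noise level
# and the aggregation rule are assumptions, not TAG's published method.
import numpy as np

rng = np.random.default_rng(0)
true_subscores = np.full(36, 950.0)  # hypothetical "true" values
noise_sd = 5.0                       # hypothetical per-subscore noise

finals = np.array([
    (true_subscores + rng.normal(0, noise_sd, 36)).mean()
    for _ in range(10_000)
]).round()

print("final score range:", int(finals.min()), "to", int(finals.max()))
print("share not exactly 950:", (finals != 950).mean())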

17 Likes

A 1,000-point estimate is not a big deal for a neural network, no… but this depends on some other factors that are beyond the scope of e4.

I think people are too caught up in the 1,000-point system.
Could it deviate one time from 963 to 962 points overall? Maybe; I don’t have inside info on their machine learning system, but regardless, the overall score would be a 10 in both scenarios.
I’ve only seen a tiny minority of people shell out $75 to actually see a 1,000-point grade. The point is that if a card grades a TAG 9 and you regrade it immediately after, it WILL get another 9.
It’s not going to suddenly get a 7.

Also, 36 subscores will more likely make for consistency than inconsistency.
Because even if one registers slightly off, its weight is not enough given the vast amount of other analysis.
In fact, the more complexity and the more computations made, the more accurate and consistent the grades… not the opposite, as you’re implying.

2 Likes

I think it’s reasonable to say that the overall grade may not jump from a 6 to a 10 the way it can at PSA due to the identification of small dents. But that is likely more related to their grading scale and what constitutes each grade than to their algorithm’s success at classification per se.

That’s not at all how this works.

Adding complexity to data introduces opportunity for error. The greater the number of computations, the greater the opportunity for false positives and false negatives. If you want to read more about this, look into sensitivity, specificity, and ROC curves for binary classification.
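
For anyone who wants to play with those concepts, here’s a self-contained scikit-learn example on synthetic detector scores. All the numbers are invented; the point is that when the score distributions overlap, every threshold trades sensitivity against the false-positive rate:

```python
# Synthetic "damage detector" scores (nothing here is TAG data).
# Overlapping distributions mean no threshold is simultaneously
# perfectly sensitive and perfectly specific.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
y_true = np.concatenate([np.ones(500), np.zeros(500)])  # damaged / clean
scores = np.concatenate([rng.normal(0.7, 0.15, 500),    # damaged cards
                         rng.normal(0.5, 0.15, 500)])   # clean cards

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", round(roc_auc_score(y_true, scores), 3))
for f, t, th in list(zip(fpr, tpr, thresholds))[100:400:100]:
    print(f"threshold={th:.2f}  sensitivity={t:.2f}  false-positive rate={f:.2f}")
```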

Like I mentioned above, image classification algorithms are highly sensitive to their training and validation data. If the algorithm is not trained on numerous examples of every type of damage, it may incorrectly classify new (unseen) damage in the future as evident (false positive) or not evident (false negative). This leads to higher variability in each estimate and may impact the overall grade.

As you say, this may not be an issue at all because people use the 1-10 scale most often. Okay, then why offer it at all?

The biggest issue with ordinal classification is boundary cases. A card graded as 949 may look identical to a card graded as 950, but one may be a TAG Mint 9 and one may be a TAG Gem Mint 10. As you can see, a 1 point difference can lead to a change in grade. That’s why added complexity is bad and that’s why TAG will never be perfectly reliable and valid.
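
The boundary problem fits in a few lines of code. The 950 cutoff for a Gem Mint 10 here is my illustration of the idea, not TAG’s published grade boundary:

```python
# The boundary problem: a one-point difference in the 1,000-point score
# straddles an (assumed) grade cutoff. The 950 cutoff is illustrative.
def to_grade(score: int) -> str:
    return "TAG Gem Mint 10" if score >= 950 else "TAG Mint 9"

for scan_score in (949, 950):
    print(scan_score, "->", to_grade(scan_score))
# one point of measurement noise is the difference between a 9 and a 10
```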

If you criticize grading companies for messing up a 10 point scale, how can you truly believe that TAG’s 1,000 point scale is more reliable and valid?


I think this is the last reply that I’ll make. You are making a good-faith attempt to respond, but, respectfully, your knowledge in this area is insufficient to have a complete discussion on the topic. This isn’t an attack on you, just the reality of talking about very niche, nerdy topics on online forums.

14 Likes

A post was merged into an existing topic: PSA Completely Random Grades in 2025?

No offense, but you still haven’t explained at all why there would be any variation in TAG’s system.

949 is just an arbitrary number for them.
What matters is that unless the instruments were obstructed and the AI couldn’t “see” the same thing, there is no reason why it wouldn’t just score 949 again and again.
Every image it takes is associated with a specific and dead-consistent way of understanding and interpreting what it’s seeing and translating it into a score.

This literally will not change.

It works within a rule set. If the instruments see the same exact thing, the same rules will trigger.

You’re saying “a 949 looks like a 950”. Yes, that’s how a human would look at things, not a computer.

The models would have to change. The data on how to treat a specific flaw would have to be updated.
That’s literally the only way it could give variation.

Could TAG be wrong in its grade? Yes.
Absolutely yes.
That’s why I shop TAG misgraded slabs to crack and sub to different grading companies.

For instance, TAG’s system was suggesting that there was a surface defect on the holo pattern of a Japanese Base Set card, but in reality it was just a sporadic holo pattern that appeared as stacked cosmos.

But I get it: not many Japanese Base cards with crazy cosmos patterns have been graded through TAG, so there isn’t enough training being done with this type of card for it to understand what to do with what it sees as an anomaly.

So the card was dinged for a surface flaw and scored a TAG 5.
It was dinged hard for the surface flaw, but a human grader wouldn’t see an issue with it.

So can it be inaccurate?
Yes.
But what matters is that it’s consistently inaccurate.
Unless the models changed, TAG’s system would pick it up as a surface flaw over and over and ding it the same way.

But so long as the instruments show the same images each time, the calculations and inferences made will be exactly the same.

And you haven’t demonstrated one time, even remotely, how or why that would not be the case.
So, respectfully, I will just assume it’s because you don’t have a real argument at all.

That’s rough, buddy. Or something.

5 Likes

Dyl explained it very well to you and even gave you the benefit of the doubt and excused your ignorance on this matter because it’s difficult for many people to understand. Then you doubled down.

I’ll just say that I’ve got a lot of experience training detection and classification models. Given the variance in card designs and potential damage and defects, the notion that TAG’s models can be consistently accurate or consistently inaccurate is laughable.

Their play for a market of consumers that want to grade cards at kiosks should signal to anyone that understands this technology that they’re not to be taken seriously.

The only way this stuff works is in a controlled environment, with human operators that can catch it when it fails. Any other way is a recipe for chaos.

14 Likes

No.
He didn’t.
And neither did you.

He said, “if the algorithm is not trained on numerous examples of every type of damage, it may incorrectly classify new (unseen) damage in the future as evident (false positive) or not evident (false negative).”

Explain to me this new, unseen damage when the instruments provide identical images of a resubmitted card.
There is no variation for there to be a different analysis.
Unless the instruments are damaged and providing different data to interpret, the grade would be the same.

Also, why is this thread now about TAG?

Should I rename the thread TAG Grading Speculation 2025?
I want a mod to move the TAG posts to the running TAG threads and leave that discussion there.
This is a PSA-graded card that is being regraded with CGC; it doesn’t involve TAG.

You are misquoting me because you don’t understand what training and validation datasets are, or machine learning for that matter.

In my quote above, “new, unseen damage” is not referring to a card that received new damage, but to observations in a dataset that the algorithm has not trained on.

Let’s say the POKEBEAST algorithm was trained on the balloons from my earlier example. Now if we feed it a hot air balloon, it might have difficulty with classification, because it’s kind of a balloon and kind of not a balloon.

Stretching this to trading cards now.

If you are training an algorithm on specific types of damage, when a card comes along that doesn’t fit neatly into those buckets of what damage is supposed to look like, it might make mistakes with classification and grading.
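
One way to see why that’s dangerous: a classifier’s softmax output always looks like a confident probability distribution, even over input it has never seen anything like. A hedged numpy sketch, with the class names and logits invented for illustration:

```python
# Softmax always returns a tidy probability distribution, even over
# logits that are effectively noise. Class names are invented.
import numpy as np

rng = np.random.default_rng(2)
classes = ["whitening", "scratch", "dent", "print line", "no damage"]

logits = rng.normal(0, 2, size=len(classes))  # network shown unfamiliar damage
probs = np.exp(logits) / np.exp(logits).sum()

best = int(np.argmax(probs))
print(f"model says: {classes[best]} ({probs[best]:.0%} confident)")
# there's no built-in "I've never seen this before" answer
```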

8 Likes

To train a robust model, you can’t just give it a few examples of a specific type of damage. You’re going to need not just hundreds of examples of each type of damage on each card variation/design style, but rather thousands upon thousands of meticulously labeled examples of each specific type of damage, on every specific card design, holo pattern, etc.

Additionally, the camera/hardware being used plays a large role. You need consistency across inputs, meaning that if they were to have a camera that is slightly different in its configuration from the one used to capture the training data, this WILL skew the results.
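
As a toy illustration of that (all numbers invented, and a real pipeline is more sophisticated than a raw pixel threshold, but the failure mode is the same in kind):

```python
# A naive threshold-based "whitening detector" changes its answer when the
# camera is a few percent brighter. Numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
edge_pixels = rng.uniform(0.60, 0.80, size=1000)  # simulated edge brightness

def whitening_score(pixels, cutoff=0.75):
    return float((pixels > cutoff).mean())  # fraction of "whitened" pixels

print("training-rig camera:", whitening_score(edge_pixels))
print("3% brighter camera :", whitening_score(edge_pixels * 1.03))
```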

There have been numerous generations of Pokemon cards, Magic cards, sports cards, etc. created over the years, and this means you need an absolutely massive dataset to train a robust model - and that dataset needs to be meticulously cleaned and labeled before training.

To think that TAG has hit it out of the park and they can have consistency of accuracy, or consistency of inaccuracy, and that the process would not depend on human beings verifying the outputs is just a false assumption.

And to be fair, as Dyl was to you, it’s an easy false assumption to make if you do not know what you’re talking about. You’re not to blame for this assumption - a LOT of people cannot understand the caveats of training a good classifier. It’s a niche subject that not many people work in, and the way AI is sold by some of these corporations, and by TAG itself, makes people think it’s some kind of magic bullet when really it can be highly flawed based on numerous factors from before and after training.

I believe in AI grading and expect it to be adopted by other companies, but it should only be used as a step in the process to aid graders, not as the main method of determining the card’s grade.

And again, the kiosk idea is absolutely absurd to me.

8 Likes

Not knowing how TAG processes images, it is hard to provide a constructive answer, but AI systems are often designed to integrate randomness and uncertainty into their decision-making processes, meaning that given the same input, the AI might produce different outputs.

Just looking at the acquisition part of the system, you would likely get a lot of variance. Systems often need to be calibrated (they diverge over time), they wear over time (just look at some of the scans from PSA), and they depend on external factors like temperature. Maintaining a system with very tight tolerances is expensive, and I suspect that alone has a strong impact on variability.
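
A rough sketch of what that drift does to measurements, assuming a simple multiplicative gain that wanders between sessions (all figures hypothetical):

```python
# The card never changes, but a sensor whose gain drifts between sessions
# measures it differently. Gain and noise figures are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
true_defect_severity = 0.12  # the card itself is constant

for session in range(5):
    gain = 1.0 + rng.normal(0, 0.02)  # slow calibration drift
    measured = true_defect_severity * gain + rng.normal(0, 0.002)
    print(f"session {session}: measured severity = {measured:.4f}")
```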

9 Likes

I feel like everyone’s just working under different assumptions. If you assume that TAG has achieved a very controlled imaging station with consistent methods of imaging/lighting/scanning, I don’t see why TAG regrading should not generally be precise (accuracy and precision are an important distinction here, and @POKEBEAST is arguing for the latter).

I feel like most of the arguments against POKEBEAST are about how the algorithm can’t deal with new types of damage or whatever it hasn’t been trained on, but that isn’t what POKEBEAST is really trying to point out. Again, if TAG has achieved a very controlled environment for imaging a card, the scans that are output should be very similar and therefore should highlight flaws in a similar manner, which, in turn, would likely be analyzed and “graded” in a consistent manner if the same algorithm is being applied. You’d basically be asking the same version of an algorithm to analyze the same exact picture. And I would hope TAG’s algorithm could pass at least a consistency check as basic as that.

Of course, if the scans bring up a type of damage or flaw that the algorithm hasn’t been trained well on, you’ll get inconsistency. But outside of novel cases, you should get consistency (again, making certain assumptions about their imaging capabilities and algorithm).

3 Likes

It’s a good discussion at least.

I would be interested in this data, and am actually surprised that no one has posted it.

I get it would be expensive to pay $75 plus shipping each time, but I’m honestly surprised there is no YouTube video titled:

“I graded the same card with TAG ten times, the scores will shock you” with a stupid streamer shocked-face thumbnail.

2 Likes

To think of all the problems the world is facing and then that people might be wasting resources on doing something as inconsequential and wasteful as this lol.

I mean, I suppose most of our society centers on the inconsequential and wasteful, but still, it’s a pretty sobering reality.

This is the correct answer. The input has variability. A very subtle change in the input can lead to very different results, like the famous example below.

Additionally, the algorithm will change and be updated over time, so even the exact same digital input can produce different outputs over time.

And as @Dyl mentioned, the more variables you’re measuring, the more room for error.

But one thing I’ll point out is that with the quality of images that TAG uses, you could create a unique signature/profile of every card they’ve graded, which can be used to detect resubmissions. If you can identify resubmissions, you can just reassign the same grade (even if in theory the grade would change if you regraded the card from scratch)
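
For what it’s worth, the fingerprinting side is well-trodden territory. A perceptual hash is one common approach; this is an illustration of the general technique, not TAG’s or Genamint’s disclosed method. It requires the Pillow and imagehash packages, and the file names are placeholders:

```python
# Fingerprinting a card image with a perceptual hash, which survives small
# capture differences. An illustration of the general technique only.
from PIL import Image
import imagehash

first_scan = imagehash.phash(Image.open("scan_submission_1.png"))
second_scan = imagehash.phash(Image.open("scan_submission_2.png"))

distance = first_scan - second_scan  # Hamming distance between the hashes
print("hash distance:", distance)
if distance <= 8:  # the cutoff is a tuning choice
    print("probable resubmission -> reuse the stored grade")
```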

The company PSA acquired was supposed to be able to do this: https://www.psacard.com/articles/articleview/10391/psa-acquires-genamint-introduce-next-generation-technology-grading-process. Not sure if/how they are using this technology.

7 Likes

I can give a more concrete example. If you had a poorly trained convolutional neural network that did the grading, simply doing a horizontal flip of the image could give a totally different result:

This is 100% the same information, just a mirror reflection. In fact, a common trick to “create” more training data for a CNN is to take an image and flip, skew, or rotate it, treating each result as an additional data point.
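
You can demonstrate the flip problem in a few lines: a plain CNN is not mirror-invariant, so an untrained network (standing in for a poorly trained one) scores an image and its mirror differently even though they carry identical information:

```python
# A plain CNN is not mirror-invariant: an untrained network gives an image
# and its horizontal flip different scores despite identical content.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 1),
)

image = torch.rand(1, 3, 32, 32)
mirrored = torch.flip(image, dims=[3])  # flip along the width axis

with torch.no_grad():
    print("original:", net(image).item())
    print("mirrored:", net(mirrored).item())
# same pixels, different score; flip augmentation in training closes the gap
```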

13 Likes

I worked as a machine learning engineer focusing mainly on anomaly detection for 4 years, and in defence of AI grading I think people are exaggerating the data requirements to create a ‘robust’ model. (I use quotes here because robust is a subjective term.) You can use augmentation techniques to overcome some data deficiencies. The trouble is there will always be something new, something unique, and there will always be mistakes made due to a variety of factors.
If you put the same image into the same model, then yes, it will always give the same output. But the problems are:
a) If you were to resubmit, it would not be the exact same input, so there will be variance there.
b) I imagine the models they are using are continuously being updated and retrained, leading to variance in output.

You’re also, of course, just assuming that the original grade is ‘correct’ when in reality it is based on some subjective scale… as is everything. (you could argue this point has nothing to do with consistency)

16 Likes

Preach brotha :folded_hands:

1 Like