At least 8.5 is there, and its contribution is articulated. So one can add that to the way the body of the paper is received, as a proposition.
I have not looked at the paper, and I share the same concerns about ChatGPT being more about formatting language than producing content. I would blame the lack of characterization of its database, which would be as necessary as the source code of the data massager and of the dialogue or prose generator, especially on topics that are new areas of research with scientific or reproducible intent. It can't generate precise sources out of thin air. If it looks smart in certain areas, it might be because a community of humans ended up discussing them rigorously and sufficiently, arriving at some common understanding, and the clear writing of it would have spread and become frequent in its learning database. Ask it to create new reproducible methods with new possible meanings and interpretations, and you would only get hypotheses that best fit its programming limits, in terms of probability space over whatever ambient space of linguistic elements they were developed in (OK, long sentence; I have my own ChatGPT in my mind, and it likes to guess too).
My concerns are about what I read here in the OP. They are not against augmenting the dimensionality of a measure meant to address the foggy concept of strength beyond a savant one-number averaging of win rate under some population distribution assumption, or even without that last part (Glicko).
I made a long post, which I am putting elsewhere (kind of like what blogs on lichess might do with other sites). I am developing it there, and might get back here some other time, when some dust of mine settles (I mean pruning my verbiage for punchline extraction).
I go through the 3 dimensions, or measures, one by one.
lichess.org/forum/team-dboings-musings/3-component-or-dimension-measure-thread-overflow
Believe it or not, I intended what follows to be the short version. But then I have this quota of verbiage to pour in order to feel like I made something self-contained, to some extent. Maybe a bad plan for writing in general.
1) is the same problem as Elo (population distribution assumption, with real-world departures from it propagating to all individuals in unknown ways; some of these might relate to what seems unsatisfying about Elo).
2) is already done by Glicko, but without 1).
3) is research, and possibly some machine source of hypotheses that might take into account all the existing "noise" about what goes on in certain cohorts of the pools of concern (both player and game pools).
In conclusion, from the OP: the transparent sharing** has to be mentioned and supported. Only the third component is new, and it is a development of the K factor's intent. There are many other avenues for modelling population sub-species (watchamagonnacallthat), or cohorts, with some known characteristics or mechanisms of "rating" change that are not due to "uncertainty" (as in the 2nd component), which itself is not due to those factors.
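For anyone who has not seen where the K factor sits, here is a minimal sketch of the textbook Elo update, not the OP's proposal. The function names and the default K of 20 are my own illustrative assumptions; real pools choose K differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> float:
    """Return A's new rating after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (the K factor) is an assumed illustrative value: it scales how far
    a single result moves the rating.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two equally rated players: the expected score is 0.5, so a win moves
    # the winner by k/2 rating points.
    print(elo_update(1500, 1500, 1.0, k=20))
    print(elo_update(1500, 1500, 1.0, k=40))
```

The whole update hangs on one number per player plus one global K, which is the simplicity the third component tries to develop further.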
But scientific data about individual players (non-rating sources of information, about chess behavior other than whole-game win rate) and an understanding of such non-"random" evolution are needed, I would say. At least so as not to have just one thing floating up in the air to put all our belief into for generations to come, again, along with the fear of not having such a simple truth. Does anyone really think that chess strength can be reduced to 1 or 3 numbers?
Also, how many numbers would represent the 3 measures? 1? I gave 1 up and 1 down to the OP: 1 up for the original transparency about the reasoning skeleton, 1 down for presenting it as a new thing, or a mature thing. I put a thinking mark for the 3rd component, as a research direction. It is not yet a rating measure to be adopted for generations that really need a trustworthy rating system to measure individual players' true strength, as opposed to pairing-band similarity or fairness of game difficulty during a tournament (which has its own way of finding the winning gladiator and does not need a rating).
** I think we need some hygiene about blending machine learning tools with our own communications, while waiting for legitimate organizations (representative of the many, not the few) to impose some rigor on published human content (including websites) and on machine programming assumptions, including maximal disclosure of the databases (in how they can help interpret the machine output).