The Pilgrim Office's detailed published statistics go back to 2003, 10 years after the introduction of the 100 km minimum rule. In 2003 the numbers starting in SJPDP and Roncesvalles were still significantly higher than those starting from Sarria. A huge change took place over the past two decades: last year those who started at Sarria outnumbered walkers from SJPDP and Roncesvalles by 4:1.
Older Pilgrim Office statistics should be viewed with caution.
There are at least two obvious issues with their historic data. Back in the days when the Pilgrim Office offered automated access to relatively raw data, with individual identities scrubbed out, it was easier to spot these issues. In the current environment, where only summary data is available, these issues are obscured.
1) Prior to the recent computerised system each individual pilgrim wrote their starting point on a piece of paper and was free to write whatever they liked. People chose all sorts of stuff and then often spelled it in various ways. This, combined with the second issue, meant that sometimes legitimate starting point numbers were incorrectly assigned to an "other" catch-all category.
2) There are at least four "eras" to the data where data was collected using either different strategies, different people, different priorities, different purposes or all four.
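On the first issue, the free-text spellings could in principle be folded back onto one canonical name per starting point before counting. The following is only a rough sketch of that idea: the alias table and the `normalise` function are my own hypothetical names, and the real list of spellings would have to come from the raw data, which I no longer have access to.

```python
import unicodedata

# Hypothetical alias table: cleaned-up spelling -> canonical starting point.
# The entries are illustrative, not taken from Pilgrim Office data.
ALIASES = {
    "sjpdp": "Saint-Jean-Pied-de-Port",
    "st jean pied de port": "Saint-Jean-Pied-de-Port",
    "saint jean pied de port": "Saint-Jean-Pied-de-Port",
    "leon": "León",
    "sarria": "Sarria",
}

def normalise(raw):
    """Map a free-text starting point to a canonical name, or "other"."""
    # Decompose accented characters and drop the accent marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, collapse repeated spaces.
    text = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    key = " ".join(text.split())
    # Anything not in the alias table still ends up in "other".
    return ALIASES.get(key, "other")
```

With a table like this, "St. Jean-Pied-de-Port", "SJPDP" and "saint jean pied de port" would all be counted together, and only genuinely unrecognised entries would land in "other".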
The oldest data represents one era or series of data. Then there is a change that is obvious to see, possibly as a result of increasing numbers of pilgrims and perhaps a realisation that the statistics, which might initially have been a single person's good idea, needed to be assigned to a particular person/responsibility.
At this change point the second era starts: new starting points were added and some of the old ones stopped being used. This system of starting points is complicated because there seems to have been a policy of only reporting on significant starting points and grouping all other starting points into the "other" category.
For the major starting points this was less of an issue, but for many of the intermediary starting points, if the numbers fell below a particular threshold then that point dropped out. So, for example, in some months you see Leon reported as a starting point with numbers and then in other months Leon disappears. This doesn't necessarily mean that no one started from Leon that month, but rather that Leon's numbers for that month are included in the "other" category rather than reported separately.
During this second era there is a break in the data and for some months no data is available at all. At around this time there are also numbers that look suspicious to me.
Immediately after this discontinuity in the data the third era starts, so perhaps the previous system was overwhelmed and a new system and/or new people were involved. The data in this third era seems more reliable, but it still suffers from the "significant starting point" issue, where in some months data is reported for a starting point and in other months that starting point is not mentioned at all.
I would expect that a data professional would settle on a list of starting points and always report numbers for those points including zero for months when no one starts from that point. Instead we have this strategy of starting points appearing and disappearing without explanation.
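To illustrate what I mean by a fixed list, here is a minimal sketch of how the monthly table could be built from raw per-pilgrim records. It assumes records of the form (month, starting point), which is my own simplification; `STARTING_POINTS` and `monthly_table` are hypothetical names and the list of points is only an example.

```python
from collections import Counter

# Hypothetical fixed list of starting points, agreed once and never changed.
STARTING_POINTS = ["Sarria", "León", "Saint-Jean-Pied-de-Port"]

def monthly_table(records, months, points=STARTING_POINTS):
    """Count pilgrims per month and starting point, reporting explicit zeros.

    records: iterable of (month, starting_point) pairs, one per pilgrim.
    months:  the months to report on, in order.
    """
    counts = Counter()
    for month, point in records:
        # Points outside the fixed list are grouped under "other".
        key = point if point in points else "other"
        counts[(month, key)] += 1
    columns = list(points) + ["other"]
    # A Counter returns 0 for missing keys, so every listed point
    # appears in every month even when no one started there.
    return {m: {p: counts[(m, p)] for p in columns} for m in months}
```

The point of the design is that a starting point never vanishes from a month's report: a zero means no one started there, and "other" only ever holds points that are not on the list.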
The fourth era represents the current system where I suspect that the numbers are calculated automatically from the computerised registration system. Undoubtedly this era will have its own quirks.