The problem this is meant to solve is that interactive queries could take an extremely long time while the czar was receiving a large result set. That now works: interactive queries take only slightly longer (a few seconds) while a large result is being received. It did not go as smoothly as I had hoped, and some of the results were not what I expected.
The original xrootd change is meant to use as few network resources as possible while throttling large results being moved from workers to the czar, leaving spare network capacity for sending out queries. That was not enough by itself. Two further changes were needed: pausing all incoming large result blocks while the jobs for a new user query are being sent to workers, and significantly reducing the size of the first result each worker sends back for a job in a task. Together, these allow the jobs for a new user query to go out quickly.
Since the code that sends jobs out to the workers was single threaded and mixed with a significant amount of unrelated code, I pulled that code out and ran it concurrently in a thread pool. I expected a significant increase in speed. Instead, with a pool of 100 threads, sending the jobs out took about 25% longer; with 10 threads, it took about the same time as a single thread. This was while running a single user query.
Also, using SetCBThreads(3000, 300) made the system unstable, while SetCBThreads(1000, 100) appears to be fine. The default was something like 300, 0.
Aside from that, it appears to work well. Loads across the cluster remain reasonable, and SELECT COUNT( * ) FROM Object; takes 20 to 45 seconds.