Async Studies

I have been tinkering with async patches lately. There are a few useful regions in vl gamma in that regard, and also a bunch of nodes in the Observable category.

Now I know that observables have been included in vl mainly to deal with async IO and as a way to transcend event handling, and not really with massive multithreading in mind, so I hope this thread is beneficial to more people.

To make sure we are on the same page, please see the attached file with all the experiments: async_studies.vl (115.4 KB)

I have not yet been able to accomplish the main goal of my studies: self-balancing worker threads.

This pattern is quite common. A good example might be path-finding in e.g. Warcraft. Each time one of your night elf units gets a new order, or detects that its path is blocked, it will ask the pathfinding service to give it a new path. The path will be calculated as fast as possible; meanwhile the unit will stand still, but the game and all other units will continue their business. Eventually the service will be done and hand the result back to the night elf unit.

Since there are many units that might want a new path, all these requests will be queued in something like a job queue for the service. Also, because we have many, many cores, we can distribute these jobs across multiple workers of said service.
Once one of the workers is finished with its job, it can tell the service that it is done and continue with the very next job.

So technically (see the sketch right after this list),

  1. a bunch of worker threads must be started at the beginning of the app, that have the ability to
  2. pop the next possible job from the queue (or other way of load balancing),
  3. run it, and
  4. send the result back to the mainloop, where it will be inserted into the game model.
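
For orientation, here is a minimal sketch of those four steps in plain C# (VL's host language). Job, Result, WorkerPool and ComputePath are all made-up names for this illustration, not nodes from the patch:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Job and Result are hypothetical placeholders for whatever a unit requests.
record Job(int UnitId);
record Result(int UnitId, string Path);

class WorkerPool
{
    readonly BlockingCollection<Job> _jobs = new();
    readonly ConcurrentQueue<Result> _results = new();

    public WorkerPool(int workerCount)
    {
        // 1. start a bunch of worker threads at the beginning of the app
        for (var i = 0; i < workerCount; i++)
            new Thread(() =>
            {
                // 2. pop the next job; GetConsumingEnumerable blocks while the
                //    queue is empty, so idle workers self-balance on the queue
                foreach (var job in _jobs.GetConsumingEnumerable())
                    // 3. run it and store the result
                    _results.Enqueue(new Result(job.UnitId, ComputePath(job)));
            }) { IsBackground = true }.Start();
    }

    public void Post(Job job) => _jobs.Add(job);

    // 4. called once per frame on the mainloop to drain finished results
    public bool TryCollect(out Result result) => _results.TryDequeue(out result);

    static string ComputePath(Job job) => $"path for unit {job.UnitId}"; // stand-in workload
}
```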

Of course this pattern is much more versatile and useful than just this one use case, so the studies in the patch are much less involved, but they already show the necessary boilerplate for simpler async patterns, working toward the goal step by step.

I am far from sure that I did everything correctly, especially when using Pads inside regions, so I am more than happy for hints and discussions. And as stated before, I still haven’t managed to find a combination of nodes and regions that gets as far as worker threads require.

Can someone help me get further?

7 Likes

Haven’t had a look at your patch yet and don’t know if it helps but seems related:

Good initiative.

Besides looking at the tooltips of the nodes/regions, what is the best way to do the timings?

One suggestion would be that we put the tests into individual processes. I think the CPU might spike in some nasty way if they’re all running in the main patch without a little control.

There are a couple other approaches that could be looked at.

Pooling
image: async_studies → ThemSchedules → Scheduler Thread Pooling

And the parallel foreach loop (experimental).

image: async_studies(2) → TheParallelForEach → Parallel ForEach

3 Likes

This sounds like it’s from old c++ ages and you almost never have to do that yourself in .NET. the TaskPool does this by default for you. so if you work with Task instead of Thread, you get load balancing for free.
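
in rough C# terms, the difference is this (ComputePath is just a stand-in workload, not a real API):

```csharp
using System.Threading.Tasks;

class PathService
{
    static string ComputePath(int unitId) => $"path {unitId}"; // stand-in workload

    // Task.Run queues the job on the shared .NET thread pool, which spreads
    // the work over the available cores -- no hand-rolled worker threads,
    // no manual load balancing.
    public static Task<string> RequestPathAsync(int unitId) =>
        Task.Run(() => ComputePath(unitId));
}
```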

yes, it is a common misconception that an observable automatically runs in background / another thread. by default it does not, observables are synchronous events. the fact that they often come from another (device) thread and that the operators are thread-safe blurs the lines.

so if you want to prevent that the execution of an observable blocks the main-loop you have to use the ToBackground node:

image

if you look inside the ToBackground node, you can see how to configure the threading context of the observable.
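
for reference, in Rx.NET (which is what you see on the C# side) this boils down to ObserveOn with a scheduler -- a rough sketch of the idea, not the literal implementation of ToBackground:

```csharp
using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

class Background
{
    static int HeavyWork(int i) => i * i; // stand-in workload

    static IObservable<int> OnPool(IObservable<int> source) =>
        source
            .ObserveOn(TaskPoolScheduler.Default) // hop to a thread-pool thread...
            .Select(HeavyWork);                   // ...so this no longer blocks the mainloop
}
```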

However in this case, the easiest seems to be to use tasks:
image

this creates 1.6M tasks and balances the workload over all available CPU cores. and as you can see, there is nothing stopping you from putting a task into another task (same with observables) …
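
in rough code terms (count and workload made up), the fan-out looks like this, including a task nested inside another task:

```csharp
using System.Linq;
using System.Threading.Tasks;

class FanOut
{
    static async Task<long> SumSquaresAsync(int count)
    {
        // one task per element; the pool decides which core runs what
        var tasks = Enumerable.Range(0, count).Select(i =>
            Task.Run(async () =>
            {
                // a task may itself start and await another task
                return await Task.Run(() => (long)i * i);
            }));

        var parts = await Task.WhenAll(tasks);
        return parts.Sum();
    }
}
```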

async_studies2.vl (121.1 KB)

6 Likes

This seems to be a good read about this topic: Parallel Programming in .NET

Back on the workbench, and I have to say, ThreadPool and TaskPool are quite the revelation. Also, I started updating the experiments with ToBackground to actually be true to the comments.
Could this be a useful addition, to stay symmetric?

image

What’s really getting to me right now is the lack of generic Tasks in VL; I will report when I get a better grip on it. They seem the way to go if we want cancelability, proper progress reports and error management.
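
For reference, this is roughly what those three concerns look like on the C# Task level (the names and the workload are placeholders); the open question is how to expose the same things as nodes:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CancellableJob
{
    public static async Task RunAsync(IProgress<double> progress, CancellationToken token)
    {
        try
        {
            for (var step = 0; step < 100; step++)
            {
                token.ThrowIfCancellationRequested(); // cancelability
                await Task.Delay(10, token);          // stand-in for real work
                progress.Report((step + 1) / 100.0);  // progress reports
            }
        }
        catch (OperationCanceledException)
        {
            // expected when the token fires; treat as a clean cancel
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);              // error management
        }
    }
}
```

A caller would hand in the Token of a CancellationTokenSource plus a Progress callback, and cancel the whole job from the outside at any time.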

As for the link you provided, how much of that is “there” in vl? Am I correct in assuming that PLINQ is what’s under the hood of the Reactive category? I dove deeper into it and discovered Sampler and Select, which seem most useful to my goal. Even though they are not really doing work async yet, it looks like they could.

image
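
For comparison, a PLINQ pipeline in plain C# looks like this (note that PLINQ and Rx are distinct .NET libraries; the workload here is a stand-in):

```csharp
using System.Linq;

class PlinqExample
{
    static long[] SquaresInParallel() =>
        Enumerable.Range(0, 1_000_000)
            .AsParallel()             // partition the input over worker threads
            .AsOrdered()              // keep the input order in the output
            .Select(i => (long)i * i) // stand-in workload, run per element in parallel
            .ToArray();
}
```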

PS: while patching, it quite often stopped reacting to F8. To resume proper operation, trusty old killVVVV.bat had to be used.

1 Like

Here is my current laboratory, in case you want to play too.
async_studies3.vl (161.2 KB)

Right now it has this issue, even when nothing is being calculated:

image

The crazy performance hit can be remedied by deleting the last test case (the one with Tasks). RAM usage is just super high, don’t know why really. Even deleting everything does not eradicate that. Maybe someone knows more?

what about this?

i have the feeling there’s still a small memory leak, but cpu idles after calculation is done.

async_studies4.vl (69.9 KB)

looks better on the resource side of things, but it defeats the goal - i want jobs to “trickle in” as they come in, and results to “trickle out” as they finish computing (see first post). I definitely like your intuition of substituting the ImmutableDictionary though

It made me think: maybe we should look more at what you find if you google for “c# pipeline”, which does in fact need a different breed of dictionary (or other collection) that is made for concurrency.
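
A minimal sketch of such a concurrency-ready collection in C#, with ConcurrentDictionary standing in for the ImmutableDictionary from the patch (all names made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class TrickleResults
{
    static readonly ConcurrentDictionary<int, string> Results = new();

    // many workers write finished jobs without any explicit locking...
    public static void Produce() =>
        Parallel.For(0, 100, jobId => Results[jobId] = $"result {jobId}");

    // ...while the mainloop drains whatever has trickled in so far
    public static void Consume()
    {
        foreach (var id in Results.Keys)
            if (Results.TryRemove(id, out var result))
                Console.WriteLine(result);
    }
}
```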

but most important: thanks for tinkering along. Parallelism in .net seems to be one of those things where we are confronted with 2 decades of competing libraries, opposing best practices, and enough options to go mad - but nothing that works out of the box.
Hopefully together we find something that is minimal and fits the dataflow paradigm.

well, maybe you have to make a proper challenge-like thing out of this. i.e. create a task (that’s heavy to compute) and a job-generator that emits these tasks. and then let’s see what kind of patterns emerge and how they compare.

i found the given patches too confusing (too many different approaches in there) so i tried to optimize the latest iteration.

i really like those kind of discussions and comparisons since there’s a lot to learn. ideally there’ll be some helppatches at the end that make it to the new F1 feature.

3 Likes

Fair enough, I made a better test bench.
Everybody feel invited!

Parallelism.vl (161.3 KB)

image

Rules of Engagement:

  1. Results should not deviate from the simple one-thread case. Failing to process jobs will result in disqualification.
  2. Only patch in the Computation stage. You may replace async-specific boilerplate in Job Creation and Consolidation too.
  3. Each job should be processed as fast as possible → lowest Average Ticks per job wins
  4. Mainloop locking should not occur → Mainloop Frame Count should be maximized (this decides in case of a tie)
  5. Bonus points and 🏆 for using StatefulCalculation instead of the default StatelessCalculation

Since measurements will deviate depending on the PC they run on, I will repeat them on a Ryzen 3700X. The actual testing strategy and particularities of job workloads, however, will be kept secret till then.

Edit: One more thing: If you choose to investigate Task Cancellation or Error Handling instead of competing, you can nevertheless qualify for the title of Async Master. Just make sure to publish your results.

1 Like

if you want to get really fancy and distribute work over multiple machines, try this: https://getakka.net/index.html

Now Akka might have been a little too far out at this point. This study was originally meant to be an exploration of what vl brings to the parallel kitchen, and how it needs to be boiled before it can be served. Who cares about multiple machines if you have a couple of x86 cores to utilize right now!

Is anyone interested in the competition at all? I usually prefer my learning without adrenaline, but this actually could be fun, once we get grounded by covid19 one by one, maybe.

stonks, this thread goes viral in a few days

4 Likes

Just putting this here as a possibly interesting resource on the topic: https://www.wintellectnow.com/Home/SeriesDetail?seriesId=using-threads-effectively-to-build-scalable-responsive-and-fast-dotnet-applications-and-components

2 Likes

@joreg - unfortunately these videos are paywalled; only the first few minutes are free.
however, this video here seems to be a summary of all the topics presented above:

2 Likes

seems not only related, but vital for this thread:

image

@velcrome you need to put this into the link so that the messages pass through it, not as an individual observer with no output, as shown in the patch screenshot.

The Debug node only taps into existing observable routings; it doesn’t create its own.
