Recording Data to disk efficiently

Hadasi · April 14, 2021, 6:35pm

Hi all,

Let’s say I have a Kinect and I want to record 4 skeleton performers for a couple of hours, non-stop, with no dropped frames. Let’s say my drive is a half-full 500GB SSD, and let’s say I want to be able to scrub through the data at a later stage. It’s just an example, and while I want timestamps, they could be written to a separate file or whatever fits best.

What would you suggest is the best way to record these vectors, floats and ints to disk without choking the RAM, the SSD, or the CPU, if possible?

I imagine there may be some chunking approach, or writing lots of tiny files to disk.

All suggestions welcome
H

bjoern · April 14, 2021, 7:41pm

Just off the top of my head:
Encode them into uncompressed dds textures.
24 columns (one for each joint type) and 44 rows (4 skeletons * 11 properties).
Each property encoded in a pixel / color (RGBA).
1056 px * 4 bytes = 4224 bytes ~ 4.125 KiB/frame ~ 7.25 MiB/min (@30 FPS).
Should be no problem to write that amount even to a slow HD.
About 588 h of recording time for your half full 500 GB SSD (taking the free half as 250 GiB).
You could use the timestamps as filenames.
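
For anyone who wants to re-check the estimate, here is a minimal Python sketch of the arithmetic. It assumes the free half of the drive is 250 GiB and ignores file headers and filesystem overhead, so treat the result as a rough upper bound rather than a guarantee.

```python
# Back-of-the-envelope storage budget for the texture layout described above.
JOINTS = 24              # columns, one per joint type
SKELETONS = 4
PROPERTIES = 11          # rows per skeleton
BYTES_PER_PIXEL = 4      # RGBA, 8 bits per channel
FPS = 30
FREE_BYTES = 250 * 1024**3   # assumption: "half full 500 GB" taken as 250 GiB free

pixels = JOINTS * SKELETONS * PROPERTIES
bytes_per_frame = pixels * BYTES_PER_PIXEL
bytes_per_minute = bytes_per_frame * FPS * 60
hours = FREE_BYTES / bytes_per_minute / 60

print(f"{pixels} px, {bytes_per_frame / 1024:.3f} KiB/frame")
print(f"{bytes_per_minute / 1024**2:.2f} MiB/min")
print(f"{hours:.0f} h of recording")
```

Changing `FPS` or `BYTES_PER_PIXEL` (for example 8 for a 16-bit-per-channel format) shows quickly how much headroom is left.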

For playback you could use the textureplayer.

I am sure there are more sophisticated approaches.

antokhio · April 14, 2021, 8:50pm

If you write just the joint positions, you can do it in binary. It will be fast and small, but there is no image data.
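
A minimal sketch of what that could look like, in Python for illustration rather than VL. The record layout (one float64 timestamp followed by x, y, z as float32 for 24 joints of 4 skeletons), the file name and the helper names are all assumptions, not something from this thread. Because every record has the same size, scrubbing becomes a single seek.

```python
import os
import struct
import tempfile

JOINTS = 24
SKELETONS = 4
# One record: float64 timestamp + x, y, z as float32 for every joint of every skeleton.
FLOATS_PER_FRAME = SKELETONS * JOINTS * 3
RECORD = struct.Struct(f"<d{FLOATS_PER_FRAME}f")  # little-endian, no padding


def append_frame(f, timestamp, positions):
    """Append one fixed-size record; positions is a flat sequence of floats."""
    f.write(RECORD.pack(timestamp, *positions))


def read_frame(f, index):
    """Seek straight to record `index` and decode it."""
    f.seek(index * RECORD.size)
    values = RECORD.unpack(f.read(RECORD.size))
    return values[0], values[1:]


path = os.path.join(tempfile.mkdtemp(), "skeletons.bin")  # hypothetical file name

# Recording: a large write buffer keeps the number of disk writes low.
with open(path, "wb", buffering=1024 * 1024) as f:
    for i in range(90):  # three seconds of dummy data at 30 FPS
        append_frame(f, i / 30.0, [float(i)] * FLOATS_PER_FRAME)

# Scrubbing: jump to any frame without reading what comes before it.
with open(path, "rb") as f:
    frame_count = os.path.getsize(path) // RECORD.size
    timestamp, positions = read_frame(f, 45)
```

With this layout a frame is 1160 bytes, so two hours at 30 FPS stays around a quarter of a gigabyte.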

schlonzo · April 15, 2021, 9:24am

I believe depth precision is 16 bit. If you want to keep it, you will have to split the Z values into two pixels if you write to an R8G8B8A8 texture, or waste space if you write to R16G16B16A16. Maybe the texture way is a bit overkill, since it is not really a lot of data per frame.
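
To illustrate the splitting, here is a small Python sketch that breaks a 16-bit value into a high and a low byte and joins them again. The function names and the sample value are made up for the example, and which channels or pixels the two bytes end up in is left open.

```python
def split_u16(value):
    """Split a 16-bit unsigned value into (high, low) bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value >> 8, value & 0xFF


def join_u16(high, low):
    """Rebuild the 16-bit value from its two bytes."""
    return (high << 8) | low


depth = 4321                      # hypothetical 16-bit sample
high, low = split_u16(depth)      # two 8-bit values, one per channel or pixel
restored = join_u16(high, low)
```

The round trip is lossless as long as both bytes are stored in an uncompressed, non-filtered format.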

Hadasi · April 15, 2021, 9:30am

Thanks for these answers, folks,
but to clarify, this is just for the skeleton data, not the point cloud/image. And while it is a relatively low amount of data per frame, it could just as well be some other arbitrary data structure, like aeroplane instruments or car sensors for a black box recorder, or several hundred IoT sensors in a hurricane. Imagine you were working with a sports car team and needed to log all the car’s behaviour to the nearest thousandth of a second to analyse or recreate the performance of the driver and car.

ravazquez · April 15, 2021, 10:27am

Not sure if this is a good fit, but for a big project where we needed to sync timeline data (text and floats mostly, so think “data structure”) at 30fps for hundreds of variables over 20+ stations in real time, we ended up using a timecoded BSON (binary JSON), so one JSON object per frame containing all the vars. This loaded amazingly fast from disk, reduced the file size of our JSON file dramatically, and allowed us to have it all in memory, making scrubbing and lerping very easy.
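
A rough Python sketch of the in-memory scrubbing and lerping part, assuming the timecoded frames have already been decoded into `(timecode, values)` pairs. The `Timeline` class and its field names are hypothetical and the BSON loading itself is left out.

```python
from bisect import bisect_right


def lerp(a, b, t):
    return a + (b - a) * t


class Timeline:
    """Timecoded frames held in memory: a list of (seconds, {name: float})."""

    def __init__(self, frames):
        self.frames = sorted(frames, key=lambda frame: frame[0])
        self.times = [t for t, _ in self.frames]

    def sample(self, time):
        """Return the values at `time`, interpolated between neighbouring frames."""
        i = bisect_right(self.times, time)
        if i == 0:                      # before the first frame: clamp
            return dict(self.frames[0][1])
        if i == len(self.frames):       # after the last frame: clamp
            return dict(self.frames[-1][1])
        (t0, v0), (t1, v1) = self.frames[i - 1], self.frames[i]
        alpha = (time - t0) / (t1 - t0)
        return {name: lerp(v0[name], v1[name], alpha) for name in v0}


timeline = Timeline([
    (0.0, {"x": 0.0, "y": 4.0}),
    (1.0, {"x": 10.0, "y": 8.0}),
])
values = timeline.sample(0.25)
```

The binary search keeps a seek cheap even for long recordings, which is what makes scrubbing from a slider practical.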

bjoern · April 15, 2021, 10:41am

You could also have a look at Protocol Buffers or MessagePack. The readme of the MessagePack-CSharp repo has a performance comparison.

Hadasi · April 15, 2021, 1:13pm

@bjoern MessagePack looks like a great candidate, but it needs Attributes for the data structures created, so I’m not sure how well or how easily it can tie in with VL at the moment without knowing the class ahead of development.

I’m not sure how fast their performance will be, but in my brief experience with them there are a couple of steps (including compiling the protobuf.h) that need a bit of know-how and other tools.

Protobufs are super useful for things like gRPC too, so for sending data over a network as a class.
Inbuilt protobuf composition would be amazing to have in Gamma (just saying).

@ravazquez sounds like a really good approach. Did each machine capture its own data and share it? And - going offtopic slightly - if not, did all the machines read it from the same network drive, or did they each have a local copy and receive a syncing frame via TCP/OSC to know which file to look up?

ravazquez · April 20, 2021, 10:34am

@Hadasi I had to go back and have a closer look to remind myself what was going on. Here is a more in-depth explanation:

We had a 10+ minute timeline generated in C4D which controlled all of our application parameters (hundreds of them). C4D serialized all parameter values as JSON, 30 timecoded entries per second, so the files got large fast. Our v4 system (server) then ingested this exported JSON on load (this is where BSON made a massive difference in size and load times), and ran through it in real time based on a generated clock. It then propagated each parameter via OSC or NetMQ to the interested clients or applications. This guaranteed a “shared timeline” across all devices, so you could scrub, seek etc. from the server and all clients would react accordingly.

Hope that clarifies things a bit more.
