Tokenizer Behaviour?

motzi · January 15, 2013, 4:03pm

hi there,

i’m having problems understanding the Tokenizer (String) completely. even though i understand how to use it there are some things that i find weird.

here’s my situation:
i’m receiving raw data from the RS232 (Devices) node. the data is always of fixed length, but does not have a delimiter/separator character.

now i’m using Tokenizer (String) to avoid incomplete data words from my device (the way it is supposed to be used, i guess). since there is no delimiter character i leave the Separator pin empty, use Mode: PrefixFixedLength, and QueueMode: Enqueue, similar to this demo-graphic:

now after quite some time of studying i found out this:

to get the same number of characters out of the Tokenizer you have to add 1 to the Token Length. e.g.: if the input word has 8 characters, the Token Length has to be 9 to receive 8 characters. i assume that this has something to do with the separator character (so usually, when a separator is used, this extra +1 is for the separator). in my case (where there is no separator and the pin is empty) this does not make any sense to me.
it appears to me that before the Tokenizer outputs something it first has to fill up its internal buffer. this also means that the output is delayed by one frame even though all the necessary data is already there. can you confirm this? wouldn’t it be more logical to immediately output the part of the string that fulfills the input criteria? (i’m not 100% sure whether the node behaves as i described.)
another thing is the queue itself: it would come in very handy to have a reset pin on the Tokenizer, similar to what an ALT+rightclick does. the issue i’m having is that sometimes my buffer gets “out of sync”: since i don’t have a separator, whenever my device sends some extra data, all following characters are appended until the queue is full. then the queue is output and the following characters are appended to the next incoming word. i would like to be able to clear the queue when that happens. there is no possibility to clear the buffer except for the little workaround shown above, where the Token Length is set to the Queue Length+1 for one frame. this outputs the overlapping characters and empties the buffer for the next incoming data word. what are your thoughts on this?
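
to make clearer what i would expect, here is a rough python sketch. it is not how the vvvv node is implemented (i don’t know its internals), just my assumption of what a fixed-length tokenizer with a reset could look like. the class name `FixedLengthTokenizer` and the methods `feed` and `reset` are made up for this example:

```python
class FixedLengthTokenizer:
    """hypothetical sketch, NOT the vvvv Tokenizer (String) implementation."""

    def __init__(self, token_length):
        if token_length < 1:
            raise ValueError("token_length must be at least 1")
        self.token_length = token_length
        self.buffer = ""  # characters received so far that do not form a full token

    def feed(self, data):
        """append incoming characters and return every complete token right away."""
        self.buffer += data
        tokens = []
        # cut off as many full-length tokens as the buffer currently holds
        while len(self.buffer) >= self.token_length:
            tokens.append(self.buffer[:self.token_length])
            self.buffer = self.buffer[self.token_length:]
        return tokens

    def reset(self):
        """drop any partial data and return what was discarded."""
        dropped = self.buffer
        self.buffer = ""
        return dropped


tok = FixedLengthTokenizer(8)
first = tok.feed("ABCDEFGH")   # a complete 8-character word arrives in one frame
partial = tok.feed("xx")       # stray extra data from the device, no token yet
dropped = tok.reset()          # clear the queue to get back in sync
second = tok.feed("12345678")  # the next word comes out intact
```

in this sketch a token length of 8 returns 8 characters in the same call that delivers them, and `reset` throws away the stray characters so they cannot shift the next word.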

i can imagine that the logic for this node is not that simple. but still, these behaviours surprised me in a way and took me some time to figure out.

thank you.

tokenizer_behaviour.v4p (15.2 kB)