[Room Scripts] Store.setString/getString() encoding issues?

Has anyone used Store.setString()/getString(), or the underlying UTF-8 codec routines? I’m getting odd results with a list of strings that I combine into a single string for storage, delimited with a line feed (String.fromCodePoint(0xA), held in a constant I named FORM_FEED). Certain entries come back with what looks like some form of character-encoding corruption at the end:

A bunch of justanuts fall to the ground, breaking apart on impa읁

I also repeatedly added Á Í Ï Ð Ý to my list of strings and got these repeatable results:

7: Á Í Ï ì
8: Á Í Ï Ðl
9: Á Í Ï Ð l
10: Á Í Ï Ð ì
11: Á Í Ï Ð Ý

At one point 🙃 (U+1F643) turned into 🙼 (U+1F67C), and another time it disappeared entirely.

The result seems stable: the corruption occurs at the end of the string as it’s stored, and it happened the same way when I deleted and re-added entries 7-11. I don’t see how it can be the delimiter, since I add it as a prefix (noisesString += FORM_FEED + noise) rather than at the end.
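Outside the game, it is easy to check that prefix-appending builds the same string as a join and that split() gives the entries back intact. This is a plain TypeScript sketch, not AssemblyScript and not the Store API; appendEntry is a hypothetical helper that mirrors the script's "prefix unless empty" logic.

```typescript
// The delimiter from the post: String.fromCodePoint(0xA) is "\n", a line feed.
const LF = "\n";

// Hypothetical helper mirroring the script: prefix the delimiter
// unless the packed list is still empty.
function appendEntry(packed: string, entry: string): string {
  return packed ? packed + LF + entry : entry;
}

const entries = [
  "Á Í Ï Ð Ý 🙃",
  "A small shamrock-leafed plant unfolds a set of pink star-shaped blooms.",
];

let packed = "";
for (const entry of entries) packed = appendEntry(packed, entry);

// Prefix-appending produces exactly the same string as join().
console.log(packed === entries.join(LF)); // true

// split() recovers the original entries...
const unpacked = packed.split(LF);
console.log(unpacked.length === entries.length && unpacked.every((s, i) => s === entries[i])); // true

// ...as long as no entry itself contains the delimiter.
const withNewline = appendEntry("first", "second\nthird");
console.log(withNewline.split(LF).length); // 3, not 2
```

If this round-trip holds, the delimiter handling alone should not truncate the tail of an entry, which suggests looking at the storage path instead.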

I know of issues with wasm VMs, and there are more general issues with database drivers round-tripping UTF-8 through the Windows-1252 character encoding. Could one of these be happening here? Perhaps a database encoding mismatch, e.g. latin1 vs. utf8mb4?
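For comparison, a UTF-8 to Latin-1 mix-up has a recognizable signature: every non-ASCII character turns into two (or more) characters wherever it appears, so the string grows. A small TypeScript sketch; latin1Decode is a hypothetical helper that maps each byte to the code point with the same value, which is what ISO-8859-1 decoding does (Windows-1252 differs only in the 0x80-0x9F range).

```typescript
// Hypothetical helper: decode bytes as ISO-8859-1 (Latin-1),
// where each byte maps to the code point with the same value.
function latin1Decode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

const original = "Á Í Ï Ð Ý"; // 9 characters
const utf8 = new TextEncoder().encode(original); // each accented letter is 2 bytes
const misread = latin1Decode(utf8);

console.log(utf8.length); // 14
console.log(misread.length); // 14: the string grew
console.log(misread.slice(0, 2) === "\u00c3\u0081"); // true: "Á" became "Ã" + U+0081
```

That pattern (every accented letter split in two throughout the string) looks different from the symptom reported here, where only the end of the stored string was damaged.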

Or do the buffers need to be null-terminated before being passed to the external store? String.UTF8.encode/decode have a null-termination parameter that setString doesn’t use; reimplementing setString with it doesn’t seem to help, though, and neither does using UTF16 (no re-encoding) instead of UTF8.

Also, while trying to debug this, I found that create roomscript shows npm update notices to the user, but only when an error is reported:

...
FAILURE 2 compile error(s)
npm notice
npm notice New minor version of npm available! 10.8.2 -> 10.9.1
npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.9.1
npm notice To update run: npm install -g npm@10.9.1
npm notice

Apparently this can be disabled with npm config set update-notifier false (or by adding update-notifier=false to ~/.npmrc).

Hmm. Strange.
Store.setString and Store.getString are just convenience methods that convert the string value to and from an ArrayBuffer, which is the data type actually being stored.

As you can see in the implementation, they just call String.UTF8.encode and String.UTF8.decode. Oh well. I’ll have a look and try to replicate the issue, since I do want to be able to store words like “Rödräv” as well!
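As a rough model of that description (not the actual host or AssemblyScript source), setString/getString are thin wrappers that encode on the way in and decode on the way out. In this TypeScript sketch, TextEncoder/TextDecoder stand in for String.UTF8.encode/decode, a Map of byte arrays stands in for the external store, and setBuffer/getBuffer are hypothetical names.

```typescript
// Hypothetical stand-in for the external store: a map from key to raw bytes.
const store = new Map<string, Uint8Array>();

function setBuffer(key: string, value: Uint8Array): void {
  store.set(key, value);
}

function getBuffer(key: string): Uint8Array | null {
  const value = store.get(key);
  return value === undefined ? null : value;
}

// Convenience wrappers: the only work they do is string <-> UTF-8 conversion.
function setString(key: string, value: string): void {
  setBuffer(key, new TextEncoder().encode(value));
}

function getString(key: string): string | null {
  const bytes = getBuffer(key);
  return bytes === null ? null : new TextDecoder().decode(bytes);
}

setString("test", "Rödräv 🙃");
console.log(getString("test")); // Rödräv 🙃
console.log(getString("missing")); // null
```

With wrappers this thin, a corrupted tail is more likely to come from whatever holds the bytes than from the encode/decode step itself.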

Ooo! Haha. Nice catch.
Thanks, Green! I’ll patch it up. 🙂

Yeah, I ended up reimplementing them from the host source and then trying UTF16, which didn’t seem to help. That’s why I think it’s either the storage layer itself doing something odd or, more likely, the code I’m using to build a linefeed-delimited string array that I can store in a single slot. Perhaps something about that interacts oddly with the .split and .join methods?

I had hoped I wouldn’t be able to break it, though, because it’s a managed string and I’m not trying to modify individual bytes.

To be clear, though I did test with certain high-bit letters known to cause corruption in some circumstances, the case in which I found it involved regular English text.


To demonstrate, here is a test case and the responses:

create roomscript test=export function onActivate(): void {
	Room.listen()
	Store.setString('test', 'Á Í Ï Ð Ý 🙃')	
}

// Required because AS doesn't support closures yet
function reduceList(acc: string, cur: string, index: i32, self: Array<string>) : string {
	return acc + '\n' + index.toString() + ': ' + cur
}

export function onRoomEvent(addr: string, ev: string): void {
	// 0xA is a line feed (LF); a true form feed would be 0xC
	const FORM_FEED = String.fromCodePoint(0xA)
	let eventType = Event.getType(ev)
	if ('ooc' != eventType) return
	let ooc = JSON.parse<Event.OOC>(ev)
	let msg = ooc.msg.split(' ', 3)
	
	let test = Store.getString('test')
	if (test) Room.describe('Retrieved from store: ' + test)
	if (3 == msg.length && 'add' == msg[1]) {
		// Because split() stops after the last selected group
		let noise = ooc.msg.slice(ooc.msg.indexOf('add') + 4)
		Room.describe('To be concat: ' + noise)
		if (test) {
			test += FORM_FEED + noise
		} else { test = noise }
		Room.describe('To store: ' + test)
		Store.setString('test', test)
	}
	
	if (test) {
		let split = test.split(FORM_FEED)
		let list = ''
		let join = split.reduce<string>(reduceList, list)
		Room.describe('Split and reduced string: ' + join)				
	}
}

Created script “test” for room “test”.

set roomscript test : active = yes

Successfully updated room script “test”.

ooc test

⌈Retrieved from store: Á Í Ï Ð Ý 🙃

⌈Split and reduced string:
0: Á Í Ï Ð Ý 🙃

ooc noises add A small shamrock-leafed plant unfolds a set of pink star-shaped blooms.

⌈Retrieved from store: Á Í Ï Ð Ý 🙃

⌈To be concat: A small shamrock-leafed plant unfolds a set of pink star-shaped blooms.⌋

⌈To store: Á Í Ï Ð Ý 🙃
A small shamrock-leafed plant unfolds a set of pink star-shaped blooms.⌋

⌈Split and reduced string:
0: Á Í Ï Ð Ý 🙃
1: A small shamrock-leafed plant unfolds a set of pink star-shaped blooms.⌋

ooc test

⌈Retrieved from store: Á Í Ï Ð Ý 🙃
A small shamrock-leafed plant unfolds a set of pink star-shaped blooL⌋

⌈Split and reduced string:
0: Á Í Ï Ð Ý 🙃
1: A small shamrock-leafed plant unfolds a set of pink star-shaped blooL⌋


If I put const FORM_FEED = String.fromCodePoint(0xA) at global scope instead, the output ended with:
1: A small shamrock-leafed plant unfolds a set of pink star-shaped blool


Perhaps the copy to/from the store doesn’t account in some way for the eight-byte header, i.e. it allocates or copies the size of the ArrayBuffer (rtSize) when it should use rtSize + 8 to include the header?

After spending much of the day trying to figure out what was really going on here, I finally found it…

Nope. Nothing to do with encoding/decoding.

It was THIS, from the documentation for BadgerDB’s Txn.Set method:

Set adds a key-value pair to the database. It will return ErrReadOnlyTxn if update flag was set to false when creating the transaction.

The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.

That last sentence.
I got the byte slice from the WebAssembly engine and passed it on to be stored in the BadgerDB database with the Set method. All done!
Well… until the WebAssembly engine reused that memory and corrupted parts of it before the DB transaction was done.

The solution was to clone the byte slice into memory NOT managed by the WebAssembly engine. Phew. Those memory bugs are tricky.
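The same failure mode can be reproduced without BadgerDB or a wasm engine. In this TypeScript sketch, one shared byte array (memory) plays the role of the engine's linear memory, and a hypothetical PendingTxn class keeps whatever byte array it is handed until commit, the way Txn.Set is documented to. Handing it a subarray() view (which shares memory) lets a later write corrupt the pending value; handing it a slice() copy (which owns its memory) does not. writeToMemory is also a hypothetical helper.

```typescript
// One shared buffer standing in for the wasm engine's linear memory.
const memory = new Uint8Array(64);
const enc = new TextEncoder();
const dec = new TextDecoder();

// Hypothetical transaction that, like Txn.Set, keeps a reference to the
// value until commit instead of copying it.
class PendingTxn {
  private pending: Array<[string, Uint8Array]> = [];
  set(key: string, val: Uint8Array): void {
    this.pending.push([key, val]);
  }
  commit(): Map<string, string> {
    const db = new Map<string, string>();
    for (const [k, v] of this.pending) db.set(k, dec.decode(v));
    return db;
  }
}

// Write a string into "linear memory" at offset 0 and return its byte length.
function writeToMemory(s: string): number {
  const bytes = enc.encode(s);
  memory.set(bytes, 0);
  return bytes.length;
}

// Buggy: pass a view into engine memory.
const txnA = new PendingTxn();
let n = writeToMemory("pink star-shaped blooms.");
txnA.set("test", memory.subarray(0, n)); // shares memory
writeToMemory("XXXX"); // the engine reuses the same region
console.log(txnA.commit().get("test")); // XXXX star-shaped blooms.

// Fixed: clone the bytes before handing them over.
const txnB = new PendingTxn();
n = writeToMemory("pink star-shaped blooms.");
txnB.set("test", memory.slice(0, n)); // independent copy
writeToMemory("XXXX");
console.log(txnB.commit().get("test")); // pink star-shaped blooms.
```

In Go, the equivalent fix is copying the slice before calling Txn.Set, for example with bytes.Clone (Go 1.20+) or append([]byte(nil), val...); the post does not say which approach was actually used.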

Thank you so much for providing me with this great example to use!!

Yeah, that’d do it. Glad to hear I wasn’t going totally crazy! I didn’t even want to suggest the data layer, because I figured that would have broken something else, but the scripting interface to it makes sense since that’s new. Looking forward to the update with the fix.