Storing Binary Data as Strings

zep • 2020-07-05 21:26

(recommended: use with PICO-8 0.2.1b or later)

This function can be used to convert binary strings into a format that can be pasted into source code. Binary strings can contain any character from chr(0) to chr(255), and as such include unprintable / unstorable characters. escape_binary_str() adds the needed escape codes and stores the remaining characters as-is. For example, character 10 becomes \n and character 0 becomes \0, or \000 when followed by a digit (to avoid ambiguity).

This is useful for storing dense binary data efficiently (e.g. compressed with PX9). If you are storing structured data in code (like a raw image), it will likely be easier and almost as efficient to store it as a bunch of hexadecimal characters.
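
For comparison, the hexadecimal alternative mentioned above could be decoded with something like the following. This is only a sketch: hex_to_bytes is a hypothetical name, and it assumes the string holds exactly two hex digits per byte with no separators.

```lua
-- hypothetical helper: turn a string of hex digit pairs into a table of bytes
function hex_to_bytes(h)
 local t={}
 for i=1,#h,2 do
  -- each pair of characters encodes one byte, e.g. "ff" -> 255
  add(t,tonum("0x"..sub(h,i,i+1)))
 end
 return t
end
```

For example, hex_to_bytes("00ff10") should give a table holding 0, 255 and 16. Each byte costs two characters here, which is the trade-off against the escaped binary string.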

function escape_binary_str(s)
 local out=""
 for i=1,#s do
  local c  = sub(s,i,i)
  local nc = ord(s,i+1)
  local pr = (nc and nc>=48 and nc<=57) and "00" or ""
  local v=c
  if(c=="\"") v="\\\""
  if(c=="\\") v="\\\\"
  if(ord(c)==0) v="\\"..pr.."0"
  if(ord(c)==10) v="\\n"
  if(ord(c)==13) v="\\r"
  out..= v
 end
 return out
end

Workflow

Step 1. Generate a Binary String

binstr=""
for i=1,256 do
 binstr..=chr(i%256) -- any data you like
end

?#binstr         -- 256
?ord(binstr,256) --   0
?ord(binstr, 13) --  13

Step 2. Escape the String and Copy to Clipboard

printh(escape_binary_str(binstr), "@clip")

Step 3. Paste into Source Code

* Turn on Puny Mode (CTRL-P) to make sure uppercase characters are encoded as punyfont.

* Press CTRL-V to paste into source code as a string value: bindat="[paste here]". You should get something like this:

bindat="¹²³⁴⁵⁶⁷⁸	\nᵇᶜ\rᵉᶠ▮■□⁙⁘‖◀▶「」¥•、。゛゜ !\"#$%&'()*+,-./0123456789:;<=>?@𝘢𝘣𝘤𝘥𝘦𝘧𝘨𝘩𝘪𝘫𝘬𝘭𝘮𝘯𝘰𝘱𝘲𝘳𝘴𝘵𝘶𝘷𝘸𝘹𝘺𝘻[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~○█▒🐱⬇️░✽●♥☉웃⌂⬅️😐♪🅾️◆…➡️★⧗⬆️ˇ∧❎▤▥あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをんっゃゅょアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンッャュョ◜◝\0"

Step 4. Enjoy your Binary Data

The contents of bindat can now be accessed with ord(bindat, index) (note that index is 1-based).
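
Reading the data back one byte at a time might look like the sketch below. It copies the bytes of a string into memory with poke. The name str_to_mem is made up for illustration and is not part of the snippet above.

```lua
-- hypothetical helper: copy every byte of s to memory starting at addr
function str_to_mem(s,addr)
 for i=1,#s do
  -- ord index is 1-based, memory offset is 0-based
  poke(addr+i-1,ord(s,i))
 end
end
```

For example, after str_to_mem(bindat,0x4300) (0x4300 being the start of the user data region), peek(0x4300) should return the same value as ord(bindat,1).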

merwok • 2021-08-17 21:58

Hi zep! (First comment here but not the first time this is discussed on discord :)

Could you confirm that in addition to 'ord', we can also use 'sub' to get more than one byte at once, and use 'poke' to get the data into memory?

freds72 • 2021-08-18 08:23

sub returns a string - @merwok: are you expecting this to work?

poke4(0x4300,sub(byte_str,1,4))

merwok • 2021-08-18 13:41

No, I would expect sub to return a substring of the initial byte string!

Edit: tested 'sub', works as intended (doesn’t cut any character), so it can be used to split a big encoded string into sub-strings. This makes it possible, for example, to encode many levels in one string and then extract one to load it, rather than being forced to define separate strings. We can’t use 'poke' though, so to load binary data we have to call 'ord' in a loop.

merwok • 2021-09-06 18:16

0.2.3 changelog:

> Added: ord(str, pos, num) returns num results starting from character at pos (similar to peek)

One call, no loop \o/
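
With that changelog entry, the poke question from earlier in the thread can be handled in one call, since poke accepts several values after the address. A minimal sketch, assuming PICO-8 0.2.3 or later; load_bytes is a made-up name, and very large num values may run into limits on how many results a single call can return, in which case the copy would need to be done in chunks.

```lua
-- hypothetical helper: copy num bytes of s, starting at character pos,
-- to memory at addr (assumes ord(str,pos,num) from 0.2.3+)
function load_bytes(s,pos,num,addr)
 poke(addr,ord(s,pos,num))
end
```

For example, load_bytes(byte_str,1,4,0x4300) would write the first four bytes of byte_str to 0x4300..0x4303.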

Felice • 2022-12-30 22:05

@zep

I noticed someone referring to this post and also that it hasn't been updated in a while, so I figured I ought to take a stab at refining the function. This comes in at about half the tokens (now 57) and probably performs better, though this sort of function probably doesn't get used much at runtime, so maybe that's not so important.

-- ordinal (0..255) -> escape sequence table
ord_esc=split("¹²³⁴⁵⁶⁷⁸\t?ᵇᶜ?ᵉᶠ▮■□⁙⁘‖◀▶「」¥•、。゛゜ !?#$%&'()*+,-./0123456789:;<=>?@abcdefghijklmnopqrstuvwxyz[?]^_`abcdefghijklmnopqrstuvwxyz{|}~○█▒🐱⬇️░✽●♥☉웃⌂⬅️😐♪🅾️◆…➡️★⧗⬆️ˇ∧❎▤▥あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをんっゃゅょアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンッャュョ◜◝",1,false)
ord_esc[0]="\\0"    -- nul
ord_esc[10]="\\n"   -- newline
ord_esc[13]="\\r"   -- cr
ord_esc[34]="\\\""  -- quote
ord_esc[92]="\\\\"  -- backslash

function str_esc(s)
	local r=""
	for i=1,#s do
		r..=ord_esc[ord(s,i)]
	end
	return r
end

BTW I had to convert a literal tab in my split("...") string into "\t" because the BBS code parser converts tabs to spaces. This seems like a possible problem. I really wish you'd preserve tabs in code previews and just set the CSS "tab-size" value to something appropriate for PICO-8. I suggest 2, as always, but do 4 or 1 or whatever, just as long as you keep the tab character as-is. Code blocks should never be molested in any way other than styling them, really.

Edit: Here's the code inside a cart just so it can run the unit test, which just compares my method's results with yours, and eventually break when something changes. 😜

Cart #foziyotehe-0: Binary string escaper, by Felice | 2022-12-30 | License: CC4-BY-NC-SA

JadeLombax • 2022-12-31 18:45

@Felice

That's nice and streamlined, but it doesn't add 2 extra zeroes to \0 glyphs if they're followed by numeric symbols, which could cause errors.

dw817 • 2022-12-31 19:23

Yep, @JadeLombax and @Felice. For instance in my compressor I must use \48 to \57 for digits. If I don't the data messes up.

freds72 • 2023-01-01 09:59

-- removed incorrect comment --

Felice • 2023-01-01 13:41

@JadeLombax

Oh, drat, I missed that element of zep's converter. I'll see if I can come up with something that's streamlined but still works well, hmm. Lemme think.

teddblue • 2024-01-06 01:35

I've been working with a system similar to this, but \0 isn't read, and it also corrupts the next byte if it's a 0. Are there any solutions for this?

pancelor • 2024-01-06 06:28

yes @teddblue that's fixable: "\000" is another way to encode the 0 byte, "\005" for 5, etc. you should use this way to avoid issues if the next byte is an ascii number ("0"-"9", bytes 48-57)

escape_binary_str (above) handles this with the line that starts with "local pr =" -- take a look at that part of the code
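
The check that line performs can be isolated as a small function. This is only a sketch of the idea; esc_nul is a hypothetical name, not something from zep's snippet.

```lua
-- hypothetical helper: escape sequence for a 0 byte at position i of s
function esc_nul(s,i)
 local nc=ord(s,i+1) -- next byte, or nil at the end of the string
 if nc and nc>=48 and nc<=57 then
  return "\\000" -- next char is a digit: pad to three digits
 end
 return "\\0"
end
```

So a 0 byte followed by "5" is written as \000 and the pasted text reads \0005 (byte 0, then "5"), while a 0 byte followed by "a" can stay as \0.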

teddblue • 2024-01-11 20:35

yea i just have my encoder use "\000" instead of "\0" now and it works great.

merwok • 2024-05-31 15:53

pancelor shared that some editors might convert a pasted \t to spaces, so the snippet can be edited to add a line to handle this!

Steven()#6 • 2024-07-05 12:24

How would one handle this tab-to-spaces issue? "\\t"?

merwok • 2024-07-05 14:44

add this:

  if(ord(c)==9) v="\\t"

Steven()#6 • 2024-09-27 09:58

How would one obtain a non-escaped binary string in the first place? Like, the binary string of a GFX element, for instance. Copying GFX gives you a hex value.

shiftalow • 2024-09-27 13:44

@Steven()#6

Does this suit your purpose?

str="convert a string to hex"
foreach({ord(str,1,#str)},function(v)
 ?sub(tostr(v,1),5,6)
end)

Steven()#6 • 2024-10-01 19:06

This seems to convert a string to hex? I'm looking for a way to convert data to a binary string; for instance, GFX (spritesheet) data. I'm poor at bitwise math, so I can't tell if I'm getting it right or wrong. But I'm trying to learn how to convert data to a binary string, and interpret data from that string.

shiftalow • 2024-10-03 07:23

@Steven()#6
Ah, I misinterpreted it!

dlen=0x2000 --full
--dlen=0x1000 --half
str=""
foreach({peek(0,dlen)},function(v)
 v=chr(v)
 str..=v
 ?v
 --stop()
end)

This code allows you to check the unescaped binary characters in the sprite sheet one by one.
The str variable also stores the result of concatenating those characters.

(Unescaped binary strings pick up control codes, so they cannot be displayed as a single piece of text or copied and pasted properly.)
