I am running into a curious situation, @zep, regarding Pico-8 when it comes to lowercase letters.
For instance, in new code type this:
To type the lowercase letters you will need to press CTRL+P first, then type the letters, then press CTRL+P afterwards to return to normal uppercase letter mode.
Now highlight the code. The easiest way to do this is to triple-click it. Press CTRL+C to copy it.
Now paste it in any text editor like Notepad or even a message in here with CTRL+V. You will get the following:
Why is the text italicized ?
If you run the code and then use printh(a,"@clip") and then CTRL+V in the same text editor, you get this:
Which I think is more correct. What do you guys think is going on here, and are you getting the same results I do?
This was actually kind of a cool rabbit hole to have gone down, because I don't believe this is italicized plain text but something else entirely.
I was able to repeat your findings in Kate on my Linux laptop, and I tried a few different variations of your string (hitting Ctrl-P before the space in one iteration and after the space in another, and also before and after the end quote in separate iterations). In all cases, I got the same result as you.
So then I pasted the text into a Google Doc, and the interesting thing is that the pasted text doesn't register as italicized in Google Docs. In fact, if you highlight the pasted text and hit Ctrl-I, it italicizes the text further rather than removing the italics.
I figured the puny font must be a completely different character set from the standard ASCII characters, so I looked up the "italic" lowercase a character in an online character code finder (http://www.mauvecloud.net/charsets/CharCodeFinder.html if you're interested) and it gave me back the hex code 0xd835 for that character. That gives you 55349 in decimal, which led me to https://www.utf8icons.com/character/55349/utf-8-character. Turns out this is possibly a reserved/future-use UTF-8 character (or maybe something valid/existing in a different character set), but it seems likely that this is intentional to keep the puny font characters as separate things from standard upper/lowercase characters. Obviously this is just wild speculation on my part, I have no way of knowing what the inner workings of PICO-8 are in this regard :). But it seems to me like intentional behavior and not a bug.
Also, since you're on Windows and I think I remember you being a number keypad user, you can supposedly type these characters into Notepad. Hold the Alt key and while holding, press the + sign on the number pad then type 55349, then release the Alt key. I saw mention somewhere that this might require some registry setting to be enabled, but I am not a Windows user so I can't say for sure.
So, here's what I found.
I am getting the same behavior, but as @packbat points out, the behavior differs depending on whether or not puny mode is on. If I take the example from @dw817's screenshot, copy it from pico-8, and paste it here, I get:
If I turn on puny mode in pico-8 (ctrl+p), copy again, and paste, I get:
Further, if I copy 𝘢𝘣𝘤𝘥𝘦𝘧𝘨 from a text editor and paste into pico-8, I get lower case regardless of whether or not puny mode is on.
I believe @2bitchuck is correct that the italic characters are a separate character set, but is only getting half of the number.
http://www.mauvecloud.net/charsets/CharCodeFinder.html is giving 2 numbers for each italic character, probably because it is reporting the two UTF-16 code units (a surrogate pair) of each character rather than a single value. If I have it output in hex rather than decimal and compare to https://unicode-table.com/en/1D44E/ , I see these numbers match what is given for UTF-16BE encoding.
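The surrogate-pair reading can be checked with a few lines of Python. This is only a sketch for use outside PICO-8, and the helper name `utf16_surrogates` is made up for illustration. It applies the standard UTF-16 rule for code points above U+FFFF to U+1D44E (the page linked above) and to U+1D622 (Mathematical Sans-Serif Italic Small A). Both share the high surrogate 0xD835 (55349 decimal), which is why looking up that half alone does not identify a character.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

for cp in (0x1D44E, 0x1D622):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")
# U+1D44E -> D835 DC4E
# U+1D622 -> D835 DE22
```

The second half of the pair is what distinguishes one letter from another, so both numbers from the code finder are needed.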
The short answer is: they are two alternative ways to encode punyfont outside of PICO-8, and are both needed in different contexts.
The long answer:
Early versions of PICO-8 editor were strictly single case, and pasting ASCII uppercase characters into PICO-8 would resolve them to the same ordinal character values as pasting lowercase. This was (and I think still is) nice because you can type code listings in uppercase to make them visually separated, but still copy-paste-able. Also, uppercase ASCII is visually the closest match to what you see in the code editor.
But of course, some users want to edit .p8 files directly, and I didn't ever go so far as to remove 'A'..'Z' as legal identifier characters in the Lua parser, or re-map uppercase characters in the .p8 loader. So now we have a bunch of code in the wild that uses MixedCase, and loads and runs just fine. But how should this be presented inside PICO-8?
ok, fine -- it's not a separate case. It's a separate /font/ ("punyfont"). When copying punyfont characters, they are encoded as "Mathematical Sans-Serif Italic Small A..Z" to be visually distinct from both upper and lower case ascii, to be unambiguously punyfont characters, and to be roughly the closest visual match to punyfont (the same way that shift-C cat is encoded as 🐱, etc).
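To make the clipboard encoding concrete, here is a rough Python sketch of the mapping. It is not PICO-8's actual code. It assumes the punyfont letters arrive as uppercase ASCII, and the function name `encode_punyfont` is invented for this example. The only hard fact it relies on is that the 26 "Mathematical Sans-Serif Italic Small" letters are contiguous in Unicode, starting at U+1D622.

```python
import unicodedata

# Look the first letter up by name instead of hard-coding the number.
PUNY_FIRST = ord(unicodedata.lookup("MATHEMATICAL SANS-SERIF ITALIC SMALL A"))

def encode_punyfont(text):
    """Map uppercase ASCII (assumed to mean punyfont) to sans-serif italic small letters."""
    return "".join(
        chr(PUNY_FIRST + ord(ch) - ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text
    )

print(hex(PUNY_FIRST))           # 0x1d622
print(encode_punyfont("hELLO"))  # lowercase 'h' is left alone, 'ELLO' becomes italic
```

Everything that is not an uppercase ASCII letter passes through unchanged, so ordinary code and punctuation survive the round trip.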
.p8 files could be encoded this way too (and in fact, that punyfont encoding is accepted by the loader), but it would be extremely awkward to use with external editors. So as a compromise, uppercase characters are used to encode punyfont by the .p8 saver. Same for printh -- more often than not ascii characters are desired and so that is the default.
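The loader's internals are not shown in this thread, so the following Python is only an illustration of the idea. It folds text that uses the italic clipboard encoding back into the uppercase-ASCII form that the .p8 saver is described as writing. The name `to_p8_file_text` is hypothetical.

```python
PUNY_FIRST = 0x1D622          # MATHEMATICAL SANS-SERIF ITALIC SMALL A
PUNY_LAST = PUNY_FIRST + 25   # ... SMALL Z

def to_p8_file_text(text):
    """Replace sans-serif italic small letters with uppercase ASCII; keep the rest."""
    out = []
    for ch in text:
        cp = ord(ch)
        if PUNY_FIRST <= cp <= PUNY_LAST:
            out.append(chr(ord("A") + cp - PUNY_FIRST))
        else:
            out.append(ch)
    return "".join(out)

# The two escapes below are the italic letters h and i.
print(to_p8_file_text('a="\U0001D629\U0001D62A"'))  # a="HI"
```

A tool like this would let an external editor work on plain ASCII while still accepting text copied straight out of the PICO-8 code editor.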
@zep, I'm just thinking about this. Would going to true 256 ASCII be a good way to solve this then ?
That is, I could copy this symbol via printh(chr(254),"@clip") and, instead of getting this outside of Pico-8: ◜ you would get this: ■, which is the true ASCII for 254.
To test this, in Notepad, hold down the ALT key, don't let it go, from the number keypad type 254, then release the ALT key. You will get that square.
This would go both ways too, so if I copied ■ to the clipboard from, say, Notepad, then it would return back to Pico-8 with CTRL+V as ◜ to maintain compatibility.
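One detail worth checking here: strictly speaking, ASCII stops at 127. The square that Alt+254 produces comes from the old DOS code page 437, which is the OEM code page on US Windows systems. Python's built-in `cp437` codec can confirm this outside PICO-8. This is just a quick check and does not touch PICO-8 itself.

```python
# Byte 254 interpreted as code page 437 is the black square.
square = bytes([254]).decode("cp437")
print(square, f"U+{ord(square):04X}")  # ■ U+25A0

# Plain 7-bit ASCII has no character for byte 254 at all.
try:
    bytes([254]).decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print("byte 254 is ASCII:", is_ascii)  # byte 254 is ASCII: False
```

So the proposal amounts to mapping the upper half of P8SCII onto code page 437 on the way out, and back again on the way in.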
Oh, hi @aced.
No, it would only convert FROM. That is you could take P8SCII code and copy it to the clipboard where it looks crazy-like when pasted back in Notepad.
But if you copied it from Notepad back again, it would be the same P8SCII code to the character.
If this cannot be done, well I withdraw the suggestion. I was just trying to find a simple solution.
Would it be difficult to add an additional printh(s,"@clip")-style target, let's call it "@clip8" (get it, "p8"?), that does the italicized version PICO-8 uses internally for puny case instead of what you do to convert to unicode? Sometimes I want to paste directly back into PICO-8 and the unicode output does NOT translate back correctly.
For instance, if you run this:
printh("Hello Capitalized World", "@clip")
And then immediately paste into the editor, all casing is lost.
An external editor gets "hELLO cAPITALIZED wORLD", but the caps are lost pasting into PICO-8.
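The case flip in this example can be reproduced outside PICO-8. This is a minimal Python sketch. It assumes that what reaches the external editor is simply the displayed text with the case of its ASCII letters inverted, as in the example above. The function name `invert_ascii_case` is hypothetical.

```python
import string

# Translation table that swaps a-z with A-Z and leaves everything else alone.
_SWAP = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_uppercase + string.ascii_lowercase,
)

def invert_ascii_case(text):
    """Invert the case of ASCII letters only."""
    return text.translate(_SWAP)

external = invert_ascii_case("Hello Capitalized World")
print(external)                     # hELLO cAPITALIZED wORLD
print(invert_ascii_case(external))  # Hello Capitalized World
```

Applying the function twice returns the original text, so the same operation would also describe a clipboard target that undoes the inversion.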
Ideally we could printh to "@clip8" and get "h𝘦𝘭𝘭𝘰 c𝘢𝘱𝘪𝘵𝘢𝘭𝘪𝘻𝘦𝘥 w𝘰𝘳𝘭𝘥", which would paste correctly back into PICO-8 as the original "Hello Capitalized World".
You could also maybe have a target like "@clipi" which works like the normal "@clip" but inverts case so we get externally what we see internally in PICO-8.