# We can use the method directly on the bytes

Last Updated: Wednesday 29th December 2021

In our other article, Encoding and Decoding Strings (in Python 2.x), we looked at how Python 2.x works with string encoding. Here we will look at encoding and decoding strings in Python 3.x, and how it is different.

# Encoding/Decoding Strings in Python 3.x vs Python 2.x

Many things in Python 2.x did not change very drastically when the language branched off into the current Python 3.x versions. The Python string is not one of those things, and in fact it is probably what changed most drastically. The changes are most evident in how Python 3.x handles encoding and decoding strings compared with Python 2.x. Encoding and decoding strings in Python 2.x was somewhat of a chore, as you might have read in the other article. Thankfully, converting between 8-bit strings and Unicode strings, and all the methods in between, is a thing of the past in Python 3.x.

Let's examine what this means by going straight to some examples. We'll start with an example string, s, containing a non-ASCII character (i.e., "ü" or "umlaut-u"). If we then reference the string and print it, both give us essentially the same result.

In contrast to the same string s in Python 2.x, in this case s is already a Unicode string, and all strings in Python 3.x are automatically Unicode. The visible difference is that s wasn't changed after we instantiated it: it is displayed exactly as we typed it.

Although our string value contains a non-ASCII character, it isn't very far off from the ASCII character set, aka the Basic Latin set (in fact it's part of the supplemental set to Basic Latin). What would happen if we had a character that is not only non-ASCII but also non-Latin, such as "字"? If we try it, we see that it doesn't matter whether a string contains only Latin characters or not, because all strings in Python 3.x behave this way (and unlike in Python 2.x you can type any character into the IDLE window!).

If you have dealt with encoding and decoding strings in Python 2.x, you know that they can be a lot more troublesome to deal with, and that Python 3.x makes it much less painful. However, if we don't need to use the unicode, encode, or decode methods, or include multiple backslash escapes in our string variables to use them immediately, then what need do we have to encode or decode our Python 3.x strings? Before answering that question, we'll first look at b'...' (bytes) objects in Python 3.x in contrast to the same in Python 2.x.

In Python 2.x, prefixing a string literal with a "b" (or "B") is legal syntax, but it does nothing special. In Python 3.x, however, this prefix indicates that the literal is a bytes object, which differs from the normal string (which as we know is by default a Unicode string), and the 'b' prefix is preserved when the object is displayed.

The thing about bytes objects is that they are actually sequences of integers (each in the range 0 to 255), though we see them displayed as ASCII characters. How or why they are sequences of integers is not of great importance to us at this point. What is important is that we see them as a string of ASCII characters, and that a bytes literal can only contain ASCII characters. That is why a bytes literal containing "字" (or any other non-ASCII character) won't work:

SyntaxError: bytes can only contain ASCII literal characters.

# Converting Python Strings to Bytes, and Bytes to Strings

Now, to see how bytes objects relate to strings, let's first look at how to turn a string into a bytes object and vice versa. If we want to turn our nonlat string from before into a bytes object, we can use the bytes constructor; however, if we pass the string as the sole argument we'll get this error:

TypeError: string argument without an encoding

As we can see, we need to include an encoding with the string. Let's use a common one, the UTF-8 encoding. Now we have our bytes object, encoded in UTF-8, but what exactly does that mean? It means that the single character contained in our nonlat variable was translated into a sequence of bytes that represents "字" in UTF-8. In other words, it was encoded.

Does this mean that if we call the encode method on nonlat, we'll get the same result? Indeed we do, and we don't have to give the encoding in this case, because the encode method in Python 3.x uses the UTF-8 encoding by default. If we changed it to UTF-16, we'd have a different result. Though both calls turn the same string into bytes, the bytes they produce depend on the encoding, or codec.

Since we can encode strings to make bytes, we can also decode bytes to make strings, but when decoding a bytes object we must know the correct codec to use to get the correct result. For example, if we try to use UTF-8 to decode a UTF-16-encoded version of nonlat, the decode call fails with a UnicodeDecodeError.
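The string and bytes conversions this article walks through can be sketched in one short script. This is a minimal sketch rather than the article's original listings: the value of s is an assumed example word, nonlat is taken to be the single character "字" mentioned in the text, and all other variable names are illustrative.

```python
# Sketch of str <-> bytes conversions in Python 3.x.
# `s` and `nonlat` follow the article's names; the value of `s` is assumed.
s = "Flügel"   # assumed example word containing "ü"
nonlat = "字"   # the non-Latin character mentioned in the article

# Every Python 3.x string is Unicode, so both are plain str objects.
both_are_str = isinstance(s, str) and isinstance(nonlat, str)

# A bytes object is a sequence of integers, displayed as ASCII characters.
ascii_bytes = b"abc"
first_values = list(ascii_bytes)   # [97, 98, 99]

# A bytes literal may only contain ASCII characters. compile() is used here
# so the SyntaxError can be caught instead of stopping the whole script.
try:
    compile("b'字'", "<demo>", "eval")
    literal_rejected = False
except SyntaxError:
    literal_rejected = True

# bytes() called with only a str argument raises TypeError.
try:
    bytes(nonlat)
    constructor_error = None
except TypeError as exc:
    constructor_error = str(exc)

# Supplying an encoding works; str.encode() defaults to UTF-8.
utf8_from_constructor = bytes(nonlat, "utf-8")   # b'\xe5\xad\x97'
utf8_from_encode = nonlat.encode()               # same bytes as above
utf16_bytes = nonlat.encode("utf-16")            # different bytes, starts with a BOM

# Decoding only works reliably with the codec that was used to encode.
round_trip = utf16_bytes.decode("utf-16")
try:
    utf16_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(utf8_from_constructor, utf8_from_encode, utf16_bytes, round_trip)
```

Catching the exceptions keeps the script runnable from top to bottom; in an interactive session the same calls would simply display the tracebacks quoted in the text.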