escape unicode characters in filepath
-
Hi there,
I'm currently having issues converting unicode characters on Windows...
I'm utilizing `AcceptDragObject` in a `c4d.gui.TreeViewFunctions`. I want to act on a given folder that is dragged into my treeview. `if dragtype == c4d.DRAGTYPE_FILENAME_OTHER` gives me the direct filepath. For example, I have a folder called `test_with_äüö` on my Desktop.
My main issue is that I need to work with the variable that comes in alongside `dragtype`, called `dragobject`, and can't convert the direct result! Here's a simple test script to play around with:

    import c4d
    import os

    def _direct_conversion():
        return r"C:\Users\lasse\Desktop\test_with_äüö".decode('utf-8')

    def _from_variable():
        dragobject = "C:\Users\lasse\Desktop\test_with_äüö"
        s = r"%s" % (dragobject)
        s = s.decode('utf-8')
        return s

    def main():
        s = _direct_conversion()
        s = _from_variable()
        print "os.path.isdir:", os.path.isdir(s)

    if __name__=='__main__':
        main()

The bad thing is that `_direct_conversion()` gives me the correct result while `_from_variable()` does not.
I'm probably totally overthinking this, but any ideas how to solve that "simple" problem are welcome!
Cheers,
Lasse -
I'd say you need to find out what format the dragobject is actually in when you get it from the function, and whether os.path.isdir accepts that format.
In your example, first you are using a literal that contains a \t symbol which inserts a tab into your string.
Then in `s = r"%s" % (dragobject)` you are using a raw string as the format source, but that doesn't influence `dragobject` at all: `r` only affects the string before formatting; the actual evaluation of `%` happens in the next step and leaves `dragobject` unaffected (including the tab).
In the utf-8 decoding, you get the original string back in unicode: C4D stores its script files in utf-8 already, so any literal containing unicode characters will be used as-is.
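To make the two points above concrete, here is a small sketch with a made-up path. It is written to run under both Python 2.7 and Python 3, and it only illustrates how literals are parsed; it says nothing about what the drag callback actually delivers.

```python
# Hypothetical path, used only for illustration.
plain = "K:\tklm"    # in a normal literal, "\t" is parsed as one tab character
raw = r"K:\tklm"     # a raw literal keeps the backslash and the "t"

# The r prefix applies to the "%s" literal only, not to the inserted value.
formatted = r"%s" % (plain,)

print("lengths: %d vs %d" % (len(plain), len(raw)))
print("tab in plain: %s, tab in raw: %s" % ("\t" in plain, "\t" in raw))
print("formatting changed nothing: %s" % (formatted == plain))
```

The tab is already part of `plain` once the literal has been parsed, so nothing done to the string afterwards can bring the backslash back.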
On my system, the only issue `isdir` has is with the tab character:

    import c4d, os
    from c4d import gui

    def check(s):
        print s, len(s), type(s)
        s = r"%s" % (s) # no effect
        print s, len(s), type(s)
        s = s.decode('utf-8')
        print s, len(s), type(s)
        print "os.path.isdir:", os.path.isdir(s)

    def main():
        dragobject = "K:\klm"
        print "Path: klm:"
        check(dragobject)
        dragobject = "K:\klmäöü"
        print "Path: klmäöü:"
        check(dragobject)
        dragobject = "K:\tklmäöü"
        print "Path: tklmäöü:"
        check(dragobject)
        dragobject = r"K:\tklmäöü"
        print "Path: raw tklmäöü:"
        check(dragobject)

    # Execute main()
    if __name__=='__main__':
        main()

results in

    Path: klm:
    K:\klm 6 <type 'str'>
    K:\klm 6 <type 'str'>
    K:\klm 6 <type 'unicode'>
    os.path.isdir: True
    Path: klmäöü:
    K:\klmäöü 12 <type 'str'>
    K:\klmäöü 12 <type 'str'>
    K:\klmäöü 9 <type 'unicode'>
    os.path.isdir: True
    Path: tklmäöü:
    K: klmäöü 12 <type 'str'>
    K: klmäöü 12 <type 'str'>
    K: klmäöü 9 <type 'unicode'>
    os.path.isdir: False
    Path: raw tklmäöü:
    K:\tklmäöü 13 <type 'str'>
    K:\tklmäöü 13 <type 'str'>
    K:\tklmäöü 10 <type 'unicode'>
    os.path.isdir: True

(all directories used here actually exist)
However, I am aware that this is not the root problem, as you do not construct `dragobject` through a literal when getting it from a function. I would suggest you check the type and bytewise encoding of the value you receive, and adapt the decoding accordingly. -
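One way to run the suggested check is sketched below. `describe` is a hypothetical debugging helper, and the sample value is only a stand-in that assumes a UTF-8 encoded byte string; the real value from the drag callback may differ, which is exactly what the check is meant to reveal.

```python
def describe(value):
    """Return type name, length and repr of a value (hypothetical helper)."""
    return type(value).__name__, len(value), repr(value)

# Stand-in for the received value. Assumption: UTF-8 encoded bytes.
sample = u"test_with_\xe4\xfc\xf6".encode("utf-8")

print("%s, length %d, %s" % describe(sample))
print("%s, length %d, %s" % describe(sample.decode("utf-8")))
```

The `repr` output shows the individual bytes of non-ASCII characters, and a length that shrinks after decoding indicates a multi-byte encoding such as UTF-8.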
Yeah, the problem lies in the escape characters...
Sadly, the function only returns single backslashes, as in `C:\Users\lasse\Desktop\test_with_äüö`, so `\t` will become a tab character. There wouldn't be a problem if the returned path used two backslashes `\\` or even forward slashes `/`... That might be worth a bug report!?
Thankfully, someone on Stack Overflow had the same issue and came up with a function to convert this "bad" path...

    backslash_map = {
        '\a': r'\a', '\b': r'\b', '\f': r'\f',
        '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v'
    }

    def reconstruct_broken_string(s):
        for key, value in backslash_map.items():
            s = s.replace(key, value)
        return s

So in my example I can do:

    dragobject = "C:\Users\lasse\Desktop\test_with_äüö"
    s = reconstruct_broken_string(dragobject).decode('utf-8')
    print os.path.isdir(s) # returns True

That is overly complicated and somewhat convoluted, but I haven't found any other way.
Cheers,
Lasse -
@lasselauch said in escape unicode characters in filepath:

> Yeah, the problem lies in the escape characters... Sadly, the function only returns single backslashes, as in `C:\Users\lasse\Desktop\test_with_äüö`, so `\t` will become a tab character. There wouldn't be a problem if the returned path used two backslashes `\\` or even forward slashes `/`... That might be worth a bug report!?

I'm actually not sure what the problem is, then. If the function returns the string "as is" with backslashes and no conversion, then this is practically the raw string. Converting it to Unicode should not interpret the backslashes. Isn't it effectively the same as your original function `_direct_conversion()`, which works fine with `isdir()`?
The problem in your sample code only happens because you use a literal to create the string. But that is not what you do with the treeview: you are receiving the string from AcceptDragObject, and you say above that what you get is the raw string.
So, I'm a bit at a loss what's actually not working here. I suppose you may need some conversion to display the string, but you have not mentioned that yet.
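The point that decoding does not interpret backslashes can be checked with a short sketch. The byte string is assembled from a byte value rather than from an escape in a literal, to mimic a value that arrives at runtime; the folder name is made up and UTF-8 is an assumption.

```python
# 92 is the byte value of the backslash, so no literal escape is involved.
backslash = bytes(bytearray([92]))
received = b"C:" + backslash + b"test_with_" + u"\xe4\xfc\xf6".encode("utf-8")

decoded = received.decode("utf-8")

print("tab introduced by decoding: %s" % (u"\t" in decoded))
print("backslash followed by t kept: %s" % (decoded[2:4] == u"\\t"))
```

Escape sequences are resolved when Python parses a literal in source code, which is why a path received from a function call is not affected.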
-
The path from `AcceptDragObject`'s `dragobject` is NOT working with `os.path.isdir()`. That is on Windows when using unicode characters, e.g. `äöü`, in your filepath... That's my main problem here... -
Okay, it seems I just need to use `dragobject = dragobject.decode('utf-8')` then.
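A minimal sketch of that fix, wrapped in a hypothetical helper named `to_unicode` so that values which are already text pass through unchanged. UTF-8 is an assumption based on the result reported in this thread, and the folder name is a stand-in.

```python
import os

def to_unicode(value, encoding="utf-8"):
    """Decode byte strings; return text values unchanged (hypothetical helper)."""
    if isinstance(value, bytes):   # on Python 2.7, bytes is an alias of str
        return value.decode(encoding)
    return value

# Stand-in for the dragged path. Assumption: UTF-8 encoded bytes.
dragobject = u"test_with_\xe4\xfc\xf6".encode("utf-8")
path = to_unicode(dragobject)

print("is a directory: %s" % os.path.isdir(path))
```

Checking the type first avoids decoding a value twice, which on Python 2.7 would trigger an implicit ASCII encode and fail on non-ASCII characters.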
Sorry for all the confusion and thanks for the help, @Cairyn!
Cheers,
Lasse -
Hi @lasselauch, as a rule of thumb with Python 2.7, always store data as a Unicode string.
Control your IO: for each input, make sure you know the encoding in all cases, and always call `.decode('utf-8')` so you are sure to store a Unicode value and get an error at loading time if something went wrong.
Then output the content according to your needs: some consumers need ASCII, some can work with Unicode. Do the conversion on the fly and don't touch your stored Unicode data.
For francophone readers, there is this fabulous article about Unicode encoding in Python. For English, I guess the best I found on this topic is A Guide to Unicode, UTF-8, and Strings in Python.
Cheers,
Maxime.
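The rule of thumb above (decode at input, keep Unicode internally, encode at output) could be sketched roughly as follows. `load_text` and `dump_text` are hypothetical names, and UTF-8 as the input encoding is an assumption.

```python
def load_text(raw_bytes, encoding="utf-8"):
    """Decode at the input boundary; raises UnicodeDecodeError on bad bytes."""
    return raw_bytes.decode(encoding)

def dump_text(text, encoding="utf-8"):
    """Encode at the output boundary; the stored text is left untouched."""
    return text.encode(encoding)

stored = load_text(u"\xe4\xf6\xfc".encode("utf-8"))   # kept as Unicode

utf8_out = dump_text(stored)                   # for consumers that take UTF-8
ascii_out = stored.encode("ascii", "replace")  # on-the-fly ASCII fallback

# Invalid input fails immediately at loading time instead of later on.
try:
    load_text(b"\xff\xfe")
    failed = False
except UnicodeDecodeError:
    failed = True

print("bad input rejected at load time: %s" % failed)
```

Because `stored` is never modified, each output format is derived from the same Unicode value whenever it is needed.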