Handling Windows Paths with an Umlaut in Python - Ä
-
Hi, I encountered a problem using paths containing umlauts in my code. I'm trying to rename customer texture files which have umlauts in their names.
I traverse the folder and join the filenames into a path, which does not produce any error.
originalFilePathAsciiiUTF8Mix = os.path.join(srcFolder, 'tex', originalFileName)
However, when I try to use the concatenated path with file.open or shutil.copy2, the path is not found, and the path displayed in the error message seems to be a strange mixture of properly escaped ASCII symbols and unescaped UTF-8 elements.
I printed out the path as hex values: most of it looks like a classic ASCII string (I am not clear on which codepage), with raw UTF-8 values in the middle.
Example:
originalFilePathAsciiLatin1 -> |C=43|:=3a|\=5c|T=54|e=65|s=73|t=74|\=5c|I=49|n=6e|p=70|u=75|t=74|\=5c|4=34|3=33|4=34|6=36|9=39|\=5c|4=34|3=33|4=34|6=36|9=39|\=5c|t=74|e=65|x=78|\=5c|k=6b|i=69|e=65|f=66|e=65|r=72|_=5f|g=67|e=65|f=66|l=6c|=c3|=a4|m=6d|m=6d|t=74|_=5f|1=31|.=2e|j=6a|p=70|e=65|g=67
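The two unlabeled bytes in the dump, c3 a4, are the UTF-8 encoding of ä, so the string reads as plain UTF-8 rather than a mix of encodings. A minimal sketch that rebuilds the file name portion of the dump and decodes it follows. It is written in Python 3 syntax, where byte strings and text are separate types (in Python 2.7 the `str` holding the path plays the role of `bytes` here), and `hex_dump` is a hypothetical helper, not part of any library:

```python
# File name portion of the dump above, as raw bytes.
raw_name = b"kiefer_gefl\xc3\xa4mmt_1.jpeg"


def hex_dump(data):
    """Return the bytes as space-separated two-digit hex values (hypothetical helper)."""
    return " ".join("{:02x}".format(b) for b in bytearray(data))


# Decoding as UTF-8 turns the byte pair c3 a4 into the single character ä.
text_name = raw_name.decode("utf-8")

# Reading the same bytes as cp1252 yields two characters instead of one,
# which is the kind of garbling a wrong-codepage lookup produces.
misread_name = raw_name.decode("cp1252")

print(hex_dump(raw_name))
print(repr(text_name))
print(repr(misread_name))
```

If the error message shows two odd characters where the ä should be, that would be consistent with the UTF-8 bytes being interpreted in a Windows codepage somewhere along the way.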
So I've read into encodings and UTF-8 in Python 2.7 and boy, what a rabbit hole.
My question is: how can I convert this "hybrid" into a valid UTF-8 representation, and finally into a path that the services used by shutil and os.copy can handle?
originalFilePath = originalFilePathAsciiiUTF8Mix.decode('utf-8').encode('cp1250')
os.copy(originalFilePath, sanitizedFilePathName)
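To make the conversion in question concrete, here is a small sketch of what the decode and encode steps do to the example path. It uses Python 3 syntax so it can be run as-is; in Python 2.7 the decoded value would be a `unicode` object. The variable names are illustrative only, and the encode calls are there just to inspect the resulting bytes, not as a recommended way to build a path:

```python
# Path from the hex dump, as the UTF-8 byte string the question describes.
raw_path = b"C:\\Test\\Input\\43469\\43469\\tex\\kiefer_gefl\xc3\xa4mmt_1.jpeg"

# Decode once to get a text path; the two bytes c3 a4 become one character.
text_path = raw_path.decode("utf-8")

# Encoding the text into either Windows codepage gives a single byte for ä.
as_cp1250 = text_path.encode("cp1250")
as_cp1252 = text_path.encode("cp1252")

print(repr(text_path))
print(as_cp1250 == as_cp1252)
```

For this particular file name, cp1250 and cp1252 produce identical bytes, since both place ä at e4; the two codepages differ for other characters.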
-
Hi @FSS, the first point is that this is a bit off topic, as this question has nothing to do with Cinema 4D. Moreover, even if it did apply, this is R21, which is unsupported: we only support the last two versions, which both use Python 3 and have solved all these issues.
That said, I don't really understand why you try to encode back to cp1250; all these os / shutil functions are designed to support UTF-8 characters and will work correctly. Find below code that works correctly.
# coding: utf-8
import c4d
import os
import shutil

# Main function
def main():
    #__file__ == C:\Users\m_adam\Desktop\Äsomething\untiÄtled.py
    srcFolder, fileName = os.path.split(__file__)
    fileName = fileName.replace("Ä", "A")
    newFile = os.path.join(srcFolder, 'tex', fileName)
    shutil.copy(__file__.decode("utf-8"), newFile.decode("utf-8"))

# Execute main()
if __name__=='__main__':
    main()
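For comparison, a rough Python 3 sketch of the same idea, where `str` paths are already Unicode text and no decode step is needed. The function name `copy_with_ascii_name`, the `UMLAUT_MAP` table, and the temporary folder layout are assumptions made for illustration; they are not part of Cinema 4D or of the code above:

```python
import os
import shutil
import tempfile

# Hypothetical replacement table; extend it for other characters as needed.
UMLAUT_MAP = {"Ä": "A", "ä": "a"}


def copy_with_ascii_name(src_path, dst_folder):
    """Copy src_path into dst_folder under a name with umlauts replaced."""
    file_name = os.path.basename(src_path)
    for umlaut, plain in UMLAUT_MAP.items():
        file_name = file_name.replace(umlaut, plain)
    os.makedirs(dst_folder, exist_ok=True)
    dst_path = os.path.join(dst_folder, file_name)
    shutil.copy2(src_path, dst_path)
    return dst_path


def main():
    # Work in a throwaway directory so the sketch does not touch real files.
    work_dir = tempfile.mkdtemp()
    try:
        src_folder = os.path.join(work_dir, "Äsomething")
        os.makedirs(src_folder)
        src_path = os.path.join(src_folder, "untiÄtled.txt")
        with open(src_path, "w", encoding="utf-8") as handle:
            handle.write("placeholder")
        print(copy_with_ascii_name(src_path, os.path.join(src_folder, "tex")))
    finally:
        shutil.rmtree(work_dir)


if __name__ == "__main__":
    main()
```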
Cheers,
Maxime.
-
Eh, this was the first approach I tried; it will map the decoded UTF-8 to Windows codepage 1252. (Sorry about the wrong one in the code above.) I tried to change the system setting to UTF-8, but it would not accept that. Thanks for your help, Maxime.