Python os.path.expanduser on Windows with Japanese users bug
Two really good developers, Alecu and Diego, have discovered a very interesting bug in the os.path.expanduser function in Python. If you have a user on your Windows machine whose name uses Japanese characters, like “雄鳥お人好し”, you will see the following on your system:
- The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
- cmd.exe will show: “C:\Users\??????”
- All the environment variables will be wrong, which means they will look like the mangled path shown in cmd.exe
The above is clearly a problem, especially when the implementation of os.path.expanduser on Windows is:
    def expanduser(path):
        """Expand ~ and ~user constructs.

        If user or $HOME is unknown, do nothing."""
        if path[:1] != '~':
            return path
        i, n = 1, len(path)
        while i < n and path[i] not in '/\\':
            i = i + 1

        if 'HOME' in os.environ:
            userhome = os.environ['HOME']
        elif 'USERPROFILE' in os.environ:
            userhome = os.environ['USERPROFILE']
        elif not 'HOMEPATH' in os.environ:
            return path
        else:
            try:
                drive = os.environ['HOMEDRIVE']
            except KeyError:
                drive = ''
            userhome = join(drive, os.environ['HOMEPATH'])

        if i != 1: #~user
            userhome = join(dirname(userhome), path[1:i])

        return userhome + path[i:]
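To make the failure concrete, here is a minimal sketch that simulates it on any platform. It rests on two assumptions that are not in the original report: that the mangling is equivalent to encoding the user name to an ANSI code page that cannot represent it (cp1252 is used here purely as an example), and that USERPROFILE is the variable expanduser ends up reading. The environment is patched only for the duration of the call.

```python
import ntpath
import os
from unittest import mock

USER_NAME = "雄鳥お人好し"

# Assumption: the ANSI code page is cp1252. Characters that the code page
# cannot represent are replaced with "?", one per character.
mangled_name = USER_NAME.encode("cp1252", errors="replace").decode("cp1252")

# Simulate the environment a process would see: USERPROFILE holds the
# mangled path, and no other variable (such as HOME) is set.
fake_env = {"USERPROFILE": "C:\\Users\\" + mangled_name}
with mock.patch.dict(os.environ, fake_env, clear=True):
    expanded = ntpath.expanduser("~\\Documents")

print(expanded)  # C:\Users\??????\Documents
```

Because expanduser simply concatenates whatever the environment variable holds, the resulting path points at a directory that does not exist.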
For the time being my proposed fix for Ubuntu One is to do the following:
    import ctypes
    from ctypes import windll, wintypes


    class GUID(ctypes.Structure):
        """A GUID laid out the way the Windows shell API expects it."""

        _fields_ = [
            ('Data1', wintypes.DWORD),
            ('Data2', wintypes.WORD),
            ('Data3', wintypes.WORD),
            ('Data4', wintypes.BYTE * 8),
        ]

        def __init__(self, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8):
            """Create a new GUID."""
            self.Data1 = l
            self.Data2 = w1
            self.Data3 = w2
            self.Data4[:] = (b1, b2, b3, b4, b5, b6, b7, b8)

        def __repr__(self):
            b1, b2, b3, b4, b5, b6, b7, b8 = self.Data4
            return 'GUID(%x-%x-%x-%x%x%x%x%x%x%x%x)' % (
                self.Data1, self.Data2, self.Data3,
                b1, b2, b3, b4, b5, b6, b7, b8)


    # constants to be used according to the version of shell32
    CSIDL_PROFILE = 40
    FOLDERID_Profile = GUID(0x5E6C858F, 0x0E22, 0x4760,
                            0x9A, 0xFE, 0xEA, 0x33, 0x17, 0xB6, 0x71, 0x73)


    def expand_user():
        """Return the profile directory of the current user as unicode."""
        # get the function that we can find from Vista up, not the one in XP
        get_folder_path = getattr(windll.shell32, 'SHGetKnownFolderPath', None)
        if get_folder_path is not None:
            # ok, we can use the new function, which is recommended by MSDN
            ptr = ctypes.c_wchar_p()
            # a NULL token (None) means "the current user"
            get_folder_path(ctypes.byref(FOLDERID_Profile), 0, None,
                            ctypes.byref(ptr))
            path = ptr.value
            # the shell allocates the buffer, so the caller has to free it
            windll.ole32.CoTaskMemFree(ptr)
            return path
        else:
            # use the deprecated one found in XP and on for compatibility reasons
            get_folder_path = getattr(windll.shell32, 'SHGetSpecialFolderPathW',
                                      None)
            buf = ctypes.create_unicode_buffer(300)
            get_folder_path(None, buf, CSIDL_PROFILE, False)
            return buf.value
The above code ensures that we only use SHGetSpecialFolderPathW when SHGetKnownFolderPath is not available on the system. The reasoning is that SHGetSpecialFolderPathW is deprecated and new applications are encouraged to use SHGetKnownFolderPath.
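Note that the function above only returns the home directory; it does not expand a path the way os.path.expanduser does. A minimal sketch of how the two could be combined follows. The helper name expand_tilde and its get_home parameter are hypothetical and not part of the proposed fix; the home lookup is passed in as a callable so the example also runs on machines without shell32, and "~user" forms are deliberately left untouched.

```python
def expand_tilde(path, get_home):
    """Replace a leading "~" in path with whatever get_home() returns.

    Hypothetical helper: get_home is any callable returning the home
    directory as a string, for example the shell32-based lookup on Windows.
    """
    if path[:1] != '~':
        return path
    # Only handle "~" on its own or followed by a separator; "~user" is
    # returned unchanged because this sketch cannot resolve other users.
    if len(path) > 1 and path[1] not in '/\\':
        return path
    return get_home() + path[1:]


def fake_home():
    # Stand-in for the real lookup so the example runs anywhere.
    return 'C:\\Users\\雄鳥お人好し'


print(expand_tilde('~\\Documents', fake_home))
```

On Windows, the shell32-based function would be passed in place of fake_home.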
A much better solution is to patch ntpath.py so that it does something like what I propose for Ubuntu One. Does anyone know if this is fixed in Python 3? Shall I propose a fix?
PS: For reference, I got the GUID value from here.