Python os.path.expanduser on Windows with Japanese users bug

by mandel on October 18th, 2011

Two really good developers, Alecu and Diego, have discovered a very interesting bug in the os.path.expanduser function in Python. If you have a user on your Windows machine with a name that uses Japanese characters, like “雄鳥お人好し”, you will see the following on your system:

  • The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
  • cmd.exe will show: “C:\Users\??????”
  • All the environment variables that contain the user name will be wrong: they hold the same mangled value that cmd.exe shows
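
The question marks are what you would expect from a lossy conversion to a narrow code page. As a small illustration (assuming cp1252 as the ANSI code page, which is only a guess at the affected machine's configuration), encoding the user name with replacement produces exactly six of them:

```python
# Illustration only: cp1252 is an assumed ANSI code page, not a value taken
# from the machine where the bug was found.
name = "雄鳥お人好し"

# Characters that the code page cannot represent are replaced with "?".
lossy = name.encode("cp1252", errors="replace")

print(lossy)                   # b'??????'
print(lossy.decode("cp1252"))  # ??????
```

Once the name has been through a conversion like this, the original characters cannot be recovered from the result.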

The above is clearly a problem, especially when the implementation of os.path.expanduser on Windows is:

def expanduser(path):
    """Expand ~ and ~user constructs.
 
    If user or $HOME is unknown, do nothing."""
    if path[:1] != '~':
        return path
    i, n = 1, len(path)
    while i < n and path[i] not in '/\\':
        i = i + 1
 
    if 'HOME' in os.environ:
        userhome = os.environ['HOME']
    elif 'USERPROFILE' in os.environ:
        userhome = os.environ['USERPROFILE']
    elif not 'HOMEPATH' in os.environ:
        return path
    else:
        try:
            drive = os.environ['HOMEDRIVE']
        except KeyError:
            drive = ''
        userhome = join(drive, os.environ['HOMEPATH'])
 
    if i != 1: #~user
        userhome = join(dirname(userhome), path[1:i])
 
    return userhome + path[i:]
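
Since the function only concatenates whatever the environment holds, a mangled variable goes straight into the result. Here is a minimal sketch of that effect; the mangled value is simulated rather than captured from a real machine, and it relies on ntpath being importable on any platform. Note that recent Python versions no longer consult HOME on Windows, so the sketch sets only USERPROFILE:

```python
import ntpath
import os
from unittest import mock

# Simulated value: what the environment might hold for the user
# "雄鳥お人好し" after a lossy conversion (an assumption for illustration).
mangled_profile = 'C:\\Users\\??????'

# Temporarily replace the environment so only USERPROFILE is visible.
with mock.patch.dict(os.environ, {'USERPROFILE': mangled_profile}, clear=True):
    expanded = ntpath.expanduser('~\\Documents')

print(expanded)  # C:\Users\??????\Documents
```

The expanded path points at a directory that does not exist, which is why anything built on expanduser breaks for such users.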

For the time being my proposed fix for Ubuntu One is to do the following:

import ctypes
from ctypes import windll, wintypes
 
class GUID(ctypes.Structure):
    _fields_ = [
         ('Data1', wintypes.DWORD),
         ('Data2', wintypes.WORD),
         ('Data3', wintypes.WORD),
         ('Data4', wintypes.BYTE * 8)
    ]
    def __init__(self, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8):
        """Create a new GUID."""
        self.Data1 = l
        self.Data2 = w1
        self.Data3 = w2
        self.Data4[:] = (b1, b2, b3, b4, b5, b6, b7, b8)
 
    def __repr__(self):
        b1, b2, b3, b4, b5, b6, b7, b8 = self.Data4
        # zero-pad every field so the output matches the canonical GUID layout
        return 'GUID(%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x)' % (
                   self.Data1, self.Data2, self.Data3, b1, b2, b3, b4, b5, b6, b7, b8)
 
# constants to be used according to the version on shell32
CSIDL_PROFILE = 40
FOLDERID_Profile = GUID(0x5E6C858F, 0x0E22, 0x4760, 0x9A, 0xFE, 0xEA, 0x33, 0x17, 0xB6, 0x71, 0x73)
 
def expand_user():
    """Return the profile directory of the current user as unicode."""
    # get the function that we can find from Vista up, not the one in XP
    get_folder_path = getattr(windll.shell32, 'SHGetKnownFolderPath', None)

    if get_folder_path is not None:
        # ok, we can use the new function, which is recommended by MSDN
        ptr = ctypes.c_wchar_p()
        get_folder_path(ctypes.byref(FOLDERID_Profile), 0, None, ctypes.byref(ptr))
        try:
            return ptr.value
        finally:
            # the caller owns the returned buffer and has to release it
            windll.ole32.CoTaskMemFree(ptr)
    else:
        # use the deprecated one found in XP and on for compatibility reasons
        get_folder_path = getattr(windll.shell32, 'SHGetSpecialFolderPathW', None)
        buf = ctypes.create_unicode_buffer(300)
        get_folder_path(None, buf, CSIDL_PROFILE, False)
        return buf.value

The above code ensures that we only use SHGetSpecialFolderPathW when SHGetKnownFolderPath is not available on the system. The reasoning is that SHGetSpecialFolderPathW is deprecated and new applications are encouraged to use SHGetKnownFolderPath.

A much better solution is to patch ntpath.py so that it does something like what I propose for Ubuntu One. Does anyone know if this is fixed in Python 3? Shall I propose a fix?
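
As a rough idea of what such a patch could look like, here is a sketch that keeps the tilde handling from the stdlib listing but asks a callable for the home directory instead of reading the environment. The names expanduser_with and get_home are hypothetical; on Windows the callable could be the expand_user function shown earlier.

```python
import ntpath


def expanduser_with(path, get_home):
    """Expand a leading ~ or ~user using get_home() instead of env variables.

    Hypothetical helper: get_home is any callable returning the profile
    directory of the current user as a unicode string.
    """
    if not path.startswith('~'):
        return path
    # find the end of the "~" or "~user" prefix
    i, n = 1, len(path)
    while i < n and path[i] not in '/\\':
        i += 1

    userhome = get_home()
    if i != 1:  # ~user: assume the other profile lives next to ours
        userhome = ntpath.join(ntpath.dirname(userhome), path[1:i])

    return userhome + path[i:]


# Usage with a stand-in for the shell32 lookup:
home = 'C:\\Users\\雄鳥お人好し'
print(expanduser_with('~\\Documents', lambda: home))
```

Passing the lookup in as a parameter keeps the path logic testable on machines that do not have shell32.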

PS: For reference, I got the GUID value from here.