@nkeim
Created July 22, 2015 22:35
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import IPython"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'3.2.0-dev'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"IPython.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Steps to reproduce\n",
"\n",
"Browser: Safari 8.0.7"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/nfshome/nkeim/test_jupyter_bug/test_jupyter_utf8_bug\n"
]
}
],
"source": [
"!mkdir -p test_jupyter_utf8_bug\n",
"%cd test_jupyter_utf8_bug"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need an \"extended\" character for testing. µ (\"MICRO SIGN\") is useful in the sciences, and it can be readily made on a Mac by pressing Option-m."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"181"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ord(u'µ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we put a µ in the name of a file. We do *not* use a Unicode literal, so the µ is stored in a byte string as a 2-byte UTF-8 sequence, which Python 2 passes to the OS unchanged. Note that this fails in Python 3."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('1 µm.txt', 'w') as f:\n",
"    f.write('')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['\\xc2', '\\xb5']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The byte string holds the 2-byte UTF-8 encoding of µ\n",
"list('µ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a few other files for good measure:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!touch '0 mm.txt' '2 nm.txt'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we invoke `ls`, the UTF-8 sequence is handled gracefully (the two bytes appear as `??`, and the rest of the listing is intact):"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 mm.txt 1 ??m.txt 2 nm.txt \u001b[0m\u001b[01;34mtest_jupyter_utf8_bug\u001b[0m/\r\n"
]
}
],
"source": [
"ls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Incidentally, some software, such as the Mac Terminal.app with an appropriate font, renders the µ correctly.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem\n",
"\n",
"If you now return to the dashboard (the tree listing), you'll see that the file listing ends abruptly right where the file with the µ should appear. Files below it are not listed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}