[toc]
Python 3 encoding
Background
A request sent with requests raised the following exception:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('你好') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
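The failure can be reproduced without any network call, since it comes down to a plain `str.encode('latin-1')`. A minimal sketch; the helper name `try_latin1` is made up for illustration:

```python
def try_latin1(text):
    """Return the Latin-1 bytes for text, or None if it cannot be encoded."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character lies above U+00FF.
        return None


print(try_latin1("café"))  # every character is below U+0100, so this encodes
print(try_latin1("你好"))   # Chinese characters are above U+00FF, so this is None
```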
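Cause
- A str body is encoded as Latin-1 before it is sent. The encode('latin-1') call sits in the standard library's http.client, which requests reaches through urllib3: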
    if isinstance(body, str):
        # RFC 2616 Section 3.7.1 says that text default has a
        # default charset of iso-8859-1.
        body = _encode(body, 'body')

    def _encode(data, name='data'):
        """Call data.encode("latin-1") but show a better error message."""
        try:
            return data.encode("latin-1")
        except UnicodeEncodeError as err:
            raise UnicodeEncodeError(
                err.encoding,
                err.object,
                err.start,
                err.end,
                "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
                "if you want to send it encoded in UTF-8." %
                (name.title(), data[err.start:err.end], name)) from None
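- The data is sent as JSON
A str can be encoded as Latin-1 directly as long as every character is in the range U+0000 to U+00FF. Chinese characters fall outside that range, so a string containing them cannot be encoded.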
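- The ensure_ascii parameter of json.dumps
With ensure_ascii=True (the default), non-ASCII characters are escaped as \uXXXX sequences. With ensure_ascii=False they are kept as-is. From the docstring:
If ``ensure_ascii`` is false, then the return value can contain non-ASCII characters if they appear in strings contained in ``obj``. Otherwise, all such characters are escaped in JSON strings.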
    In [26]: json.dumps('你好')
    Out[26]: '"\\u4f60\\u597d"'

    In [27]: json.dumps('你好', ensure_ascii=False)
    Out[27]: '"你好"'
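So for a value a that contains Chinese (here a = '你好'), encoding the result gives:
    # ensure_ascii=False: the Chinese characters are kept, so Latin-1 encoding fails
    json.dumps(a, ensure_ascii=False).encode('latin-1')
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-2: ordinal not in range(256)

    # ensure_ascii=True (default): the output is pure ASCII, so Latin-1 encoding succeeds
    json.dumps(a).encode('latin-1')
    b'"\\u4f60\\u597d"'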
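The two cases can be checked side by side. The sketch below assumes a = '你好' and uses only the standard library. It shows that the escaped form is pure ASCII, so the choice between Latin-1 and UTF-8 makes no difference to it, and that both forms parse back to the same value:

```python
import json

a = "你好"
escaped = json.dumps(a)                  # ensure_ascii=True (default)
raw = json.dumps(a, ensure_ascii=False)  # keeps the characters as-is

# The escaped form is pure ASCII, so Latin-1 and UTF-8 give identical bytes.
assert escaped.isascii()
assert escaped.encode("latin-1") == escaped.encode("utf-8")

# Both forms decode back to the same Python value.
assert json.loads(escaped) == json.loads(raw) == a

# The raw form needs an explicit UTF-8 encode before it can be sent as bytes.
raw_bytes = raw.encode("utf-8")
print(len(escaped.encode("latin-1")), len(raw_bytes))  # 14 8
```

The escaped form is larger on the wire (14 bytes against 8 here), which is the usual reason for reaching for ensure_ascii=False in the first place.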
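Conclusion
- Don't set ensure_ascii=False casually when serializing JSON for a request body: the output then contains raw non-ASCII characters.
- A str body sent through requests is encoded as Latin-1 by default. To choose the encoding yourself, pass bytes, for example body.encode('utf-8').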
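If ensure_ascii=False is still wanted, one option is to encode the JSON to UTF-8 bytes yourself, as the error message suggests. A bytes body is not passed through the Latin-1 step shown earlier. A sketch, in which `to_json_body`, the header value, and `url` are illustrative assumptions rather than part of the original code:

```python
import json


def to_json_body(payload):
    """Serialize payload to UTF-8 encoded JSON bytes (hypothetical helper)."""
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


# Declaring the charset lets the server decode the bytes the same way.
headers = {"Content-Type": "application/json; charset=utf-8"}
body = to_json_body({"greeting": "你好"})

# Assumed usage with requests (url is a placeholder, not from the original):
# requests.post(url, data=body, headers=headers)
```

Passing the payload through the `json=` argument of requests should also avoid the problem, since requests then does the serialization itself.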