Encoding exception when using requests in Python 3

Python 3 encoding

Background

Sending a request with requests raised the following exception:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('你好') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
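The error can be reproduced without a running server. The following is a minimal sketch, assuming CPython's http.client (which requests relies on through urllib3) encodes a str body before it opens any connection. The function name reproduce_latin1_error and the host and port are placeholders for illustration; under that assumption the host is never contacted for a non-Latin-1 body.

```python
import http.client


def reproduce_latin1_error(body):
    """Return the UnicodeEncodeError message for a str body, or None if none is raised."""
    # Placeholder host/port: the encoding step is assumed to run before any
    # socket is opened, so nothing needs to be listening here.
    conn = http.client.HTTPConnection("localhost", 9, timeout=1)
    try:
        conn.request("POST", "/", body=body)
    except UnicodeEncodeError as err:
        return str(err)
    except OSError:
        # Reached only when the body was encodable and the connection attempt failed.
        return None
    finally:
        conn.close()
    return None
```

Calling reproduce_latin1_error('你好') should return a message of the same form as the exception above.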

Cause

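  • By default, a str body sent through requests is encoded as Latin-1: encode('latin-1') is called on it before the request goes out. The code below comes from the standard library's http.client, which requests uses underneath via urllib3.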
        if isinstance(body, str):
            # RFC 2616 Section 3.7.1 says that text default has a
            # default charset of iso-8859-1.
            body = _encode(body, 'body')

def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
        return data.encode("latin-1")
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None
  • The data is transmitted as JSON

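A Unicode string can be encoded to Latin-1 directly only if every character is within the Latin-1 range (code points U+0000 to U+00FF); a string containing Chinese characters cannot be encoded.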
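To check whether a given string would survive the Latin-1 step, something like the sketch below can help. The helper names fits_latin1 and first_non_latin1 are made up for this illustration and are not part of requests or the standard library.

```python
def fits_latin1(text):
    """Return True if text can be encoded as Latin-1 (all code points <= 0xFF)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def first_non_latin1(text):
    """Return (index, character) of the first character outside Latin-1, or None."""
    for index, char in enumerate(text):
        if ord(char) > 0xFF:
            return index, char
    return None
```

For example, fits_latin1('café') is True because 'é' is U+00E9, while fits_latin1('你好') is False.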

  • The ensure_ascii parameter of json.dumps

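When this parameter is set to False, non-ASCII characters are kept as-is in the output; with the default value True they are escaped as \uXXXX sequences.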

If ``ensure_ascii`` is false, then the return value can contain non-ASCII characters if they appear in strings contained in ``obj``. Otherwise, all such characters are escaped in JSON strings.
json.dumps('你好')
Out[26]: '"\\u4f60\\u597d"'
json.dumps('你好', ensure_ascii=False)
Out[27]: '"你好"'

So for a string a that contains Chinese characters (a = '你好' here), the encoding results are as follows:

# ensure_ascii=False
json.dumps(a, ensure_ascii=False).encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-2: ordinal not in range(256)

# ensure_ascii=True (the default)
json.dumps(a).encode('latin-1')
b'"\\u4f60\\u597d"'
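It may help to see that ensure_ascii only changes how the text is written, not the data it represents. The sketch below serializes the same object both ways and compares the results; compare_ensure_ascii is a hypothetical helper name, and str.isascii requires Python 3.7 or later.

```python
import json


def compare_ensure_ascii(obj):
    """Serialize obj with and without ensure_ascii and report how the results differ."""
    escaped = json.dumps(obj)                      # ensure_ascii=True (default)
    literal = json.dumps(obj, ensure_ascii=False)  # non-ASCII characters kept as-is
    return {
        "escaped_is_ascii": escaped.isascii(),
        "literal_is_ascii": literal.isascii(),
        # Both forms decode back to the same Python object.
        "same_after_loads": json.loads(escaped) == json.loads(literal) == obj,
        # The escaped form is pure ASCII, so Latin-1 encoding always succeeds.
        "escaped_latin1_bytes": len(escaped.encode("latin-1")),
        # The literal form needs an encoding that covers its characters, e.g. UTF-8.
        "literal_utf8_bytes": len(literal.encode("utf-8")),
    }
```

The escaped form is safe for the Latin-1 step at the cost of a longer body; the literal form is shorter but has to be encoded with something like UTF-8.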

Conclusion

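  • When serializing with json, don't set ensure_ascii=False casually; if you do, encode the result to UTF-8 yourself before sending it
  • When sending a request with requests, a str body is encoded as Latin-1 by default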
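If non-ASCII characters should stay unescaped in the body, one option is to follow the hint in the error message and encode to UTF-8 explicitly. The sketch below is one possible way to do that; build_utf8_json_request is a hypothetical helper, and the charset in the Content-Type header is an assumption about what the receiving server expects.

```python
import json


def build_utf8_json_request(payload):
    """Return keyword arguments that carry payload as a UTF-8 encoded JSON body."""
    # Encoding to bytes here means no str body is left for the Latin-1 step.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    # Assumed header: tells the server which charset the body uses.
    headers = {"Content-Type": "application/json; charset=utf-8"}
    return {"data": body, "headers": headers}
```

The returned dict is meant to be unpacked into a call such as requests.post(url, **build_utf8_json_request(payload)), where url stands for whatever endpoint is being called.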

Author: 朝圣

Article link: www.zh-noone.cn/2019/12/python3编码

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 3.0. Please credit the source when reposting!
