Serialization for Network Transmission (Covering the Mainstream Serialization Formats)

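Serialization
How to Choose a Serialization Protocol
Common Serialization Formats

Native JDK Serialization
JSON Serialization
Protobuf
Thrift
Hessian
Other Less Common Serialization Formats
Summary
References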

Serialization

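First, let's briefly discuss why serialization is needed at all. In network communication, data travels over the wire as binary data (a sequence of bytes), but in Java the parameters and return values of our methods are objects, and an object cannot be sent over the network directly. That is where serialization comes in: an object is serialized into binary data, which is then transmitted over the network. Unlike many other algorithms, however, serialization must be reversible, because once the data has arrived, the receiving end has to deserialize it back into an object before it can process the request. The process is illustrated in the figure below: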

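In short, only binary data can be transmitted over the network, while our programs work with objects; that is why we need serialization and deserialization. Practically any network transmission involves serialization.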

A more formal definition:

Serialization: writing an object to an IO stream
Deserialization: restoring an object from an IO stream

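How to Choose a Serialization Protocol

There is no best protocol, only the most suitable one. When choosing a serialization protocol, we can evaluate it against the following criteria:

Size of the serialized data: in principle, the smaller the serialized data, the more efficient the transmission, and therefore the better the performance
Readability of the serialized data: in theory, the more readable the data, the easier it is to maintain, but the lower the performance is likely to be

So performance and readability are often a trade-off: you rarely get both.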

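Common Serialization Formats

Native JDK Serialization

Native JDK serialization is very simple: a class only needs to implement the Serializable interface.

Basic Usage

Let's look at a simple example.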

import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.*;

@Data
@NoArgsConstructor // no-arg constructor, convenient for JSON libraries that instantiate the class reflectively
public class Student implements Serializable {

    // explicit version id, so compatible changes to the class don't break deserialization
    private static final long serialVersionUID = 1L;

    private String name;
    private Integer age;

    public Student(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }

    public static void main(String[] args) throws Exception {
        // target file: ~/Desktop/student.dat (note the path separator between directory and file name)
        String home = System.getProperty("user.home");
        File file = new File(home + File.separator + "Desktop", "student.dat");
        Student student = new Student("小奏技术", 18);

        // serialize: write the object to the file
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(student);
            oos.flush();
        }

        // deserialize: read the object back from the file
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            Student readStudent = (Student) ois.readObject();
            System.out.println(readStudent);
        }
    }
}

Running it prints:

Student{name='小奏技术', age=18}

Pros

Very simple to use

Cons

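Not cross-language: many systems today are complex and written in several languages, but Java serialization is only supported by frameworks implemented in Java. Most other languages neither use Java's serialization framework nor implement its protocol, so when two applications written in different languages talk to each other, Java serialization cannot be used to serialize and deserialize the objects they exchange.
Vulnerable to attacks: Oracle's Secure Coding Guidelines for Java SE state that "Deserialization of untrusted data is inherently dangerous and should be avoided." In other words, Java serialization is not safe. Objects are deserialized by calling readObject() on an ObjectInputStream, and this method is effectively a magic constructor that can instantiate objects of almost any class on the classpath, as long as the class implements the Serializable interface. This means that while a byte stream is being deserialized, code from any of those classes may be executed, which is very dangerous.
Large serialized output: the size of the serialized binary stream reflects the performance of a serialization format. The larger the serialized byte array, the more storage space it takes and the higher the hardware cost. For network transmission it also consumes more bandwidth, which hurts system throughput.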
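To put the "large output" point into numbers, here is a minimal sketch that serializes a Student-like object with ObjectOutputStream and compares the byte count with a hand-written DataOutputStream encoding that stores only the field values. The class name JdkSizeDemo and the sample values are illustrative assumptions. The JDK stream is larger mainly because it also records class descriptors (class names, serialVersionUID, field names and types, including those of java.lang.Integer and its superclass).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JdkSizeDemo {

    // Mirrors the article's Student: a name and an age
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Bytes produced by native JDK serialization (values plus class metadata)
    static int jdkSize(Student s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    // Bytes produced by writing only the field values, without any metadata
    static int manualSize(Student s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s.name); // 2-byte length prefix + modified UTF-8 bytes
            out.writeInt(s.age);  // always 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        Student s = new Student("Tom", 18);
        System.out.println("JDK serialization:     " + jdkSize(s) + " bytes");
        System.out.println("hand-written encoding: " + manualSize(s) + " bytes"); // 9 bytes
    }
}
```

The exact JDK figure depends on class and package names, but it is always much larger than the 9 bytes of raw field data, because the metadata travels with every serialized stream.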

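JSON Serialization

JSON is the serialization format we know best and use most. It stores data as key-value pairs and is a text-based format; it does not carry the type information of the original object. The mainstream JSON serialization libraries today include:

fastjson, open-sourced by Alibaba
fastjson2, Alibaba's rewrite of fastjson
Gson, open-sourced by Google
Jackson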

Basic Usage

Usage is also very simple. With fastjson:

Serialization

import com.alibaba.fastjson.JSON;

Student student = new Student("小奏技术", 18);
String jsonString = JSON.toJSONString(student);

Deserialization

Student student1 = JSON.parseObject(jsonString, Student.class);

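Pros

Excellent readability
Decent performance

"Decent" here means good enough for most cases; if you need higher performance, you should choose another serialization format.

Cons

JSON carries no type information, so for a strongly typed language like Java the library still has to inject properties via reflection, which costs a fair amount of performance
JSON serialization also has a fairly large space overhead: field names are repeated in every record and numbers are stored as text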
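To make these two cons concrete, here is a toy, stdlib-only serializer that walks an object's fields with reflection and writes every field name into the output next to its value. It is an illustrative assumption, not how fastjson, Gson or Jackson are actually implemented (real libraries cache metadata, and some generate bytecode to avoid per-call reflection); ToyJson and its helpers are hypothetical names, and it only handles flat objects with String, Number and Boolean fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

public class ToyJson {

    // Sample class with the same fields as the article's Student
    static class Student {
        private String name;
        private Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serializes the instance fields of a flat object using reflection
    static String toJson(Object obj) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            f.setAccessible(true); // reflection lets us read private fields
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            // the field name is written out for every value: this is the space overhead
            joiner.add("\"" + f.getName() + "\":" + toJsonValue(value));
        }
        return joiner.toString();
    }

    private static String toJsonValue(Object v) {
        if (v == null) {
            return "null";
        }
        if (v instanceof Number || v instanceof Boolean) {
            return v.toString();
        }
        // naive escaping: only backslashes and double quotes
        String s = v.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Student("Tom", 18)));
    }
}
```

Note that the number 18 is written as the text "18", and the keys "name" and "age" appear in the output itself; deserialization has to do the reverse, mapping each key back to a field by name.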

Protobuf

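Protobuf is a data interchange format from Google. Like JSON or XML, it is a data transmission format and specification used for communication between applications or processes, and it supports mainstream programming languages such as Java, Python, C++ and Go. To use Protobuf, you first define your messages in an IDL (Interface Description Language) and then run the IDL compiler for each target language to generate the serialization classes.

Basic Usage

1. Add the Maven dependency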

<properties>
    <protobuf.version>3.13.0</protobuf.version>
</properties>

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>${protobuf.version}</version>
</dependency>

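2. Add the Maven protobuf code-generation plugin

<build>
    <extensions>
        <!-- provides ${os.detected.classifier}, used to download the protoc binary for the current OS -->
        <extension>
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <version>1.6.2</version>
        </extension>
    </extensions>
    <plugins>
        <!--protobuf-->
        <plugin>
            <groupId>org.xolstice.maven.plugins</groupId>
            <artifactId>protobuf-maven-plugin</artifactId>
            <version>0.6.1</version>
            <configuration>
                <protocArtifact>com.google.protobuf:protoc:${protobuf.version}:exe:${os.detected.classifier}</protocArtifact>
                <protoSourceRoot>src/main/resources/proto</protoSourceRoot>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <!-- generates plain protobuf classes; compile-custom is only needed for gRPC stubs -->
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>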

3. Write the proto file

syntax = "proto3";  // use proto3 syntax

option java_package = "com.xiaozou";              // package of the generated Java code
option java_outer_classname = "StudentProtobuf";  // name of the generated outer class

message StudentMsg {
  // type name = field number
  int32 age = 1;
  string name = 2;
}

message Result {
}

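4. Generate the Java code

Here we simply run the protobuf plugin, for example with mvn protobuf:compile (a regular mvn compile also triggers it, since the compile goal is bound to the build lifecycle).

The generated code is placed under the target directory (by default target/generated-sources/protobuf/java).

Let's take a quick look at its contents:

5. Test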

import com.xiaozou.StudentProtobuf;
import org.junit.Test;

import java.util.Arrays;

@Test
public void testProtobuf() throws Exception {
    // build a message with the generated builder
    StudentProtobuf.StudentMsg.Builder builder = StudentProtobuf.StudentMsg.newBuilder();
    builder.setAge(12);
    builder.setName("小奏技术");
    StudentProtobuf.StudentMsg studentMsg = builder.build();

    // serialize to a byte array
    byte[] bytes = studentMsg.toByteArray();
    System.out.println("-----Student Byte start -------");
    System.out.println(Arrays.toString(bytes));
    System.out.println("-----Student Byte end -------");

    // deserialize the byte array back into a Java object
    StudentProtobuf.StudentMsg p2 = StudentProtobuf.StudentMsg.parseFrom(bytes);
    System.out.println("student age:" + p2.getAge());
    System.out.println("student name:" + p2.getName());
}

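Pros

Very fast serialization and deserialization: no reflection is needed to discover types, so it is highly efficient
Language-neutral, platform-neutral and easy to extend

Cons

Poor readability: the output is binary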
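Why is Protobuf both compact and hard to read? On the wire a message contains no field names: each field is written as a tag, computed as (field_number << 3) | wire_type, followed by its value, a varint for integers or a length prefix plus raw bytes for strings. The sketch below hand-encodes the StudentMsg message from the proto file above using only the JDK, purely to show that layout; ProtoWireSketch is a hypothetical name, and in real code you would simply call the generated toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ProtoWireSketch {

    // wire types from the protobuf encoding rules
    private static final int WIRE_VARINT = 0;
    private static final int WIRE_LEN = 2;

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // tag = (field_number << 3) | wire_type
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Hand-encodes StudentMsg { int32 age = 1; string name = 2; }
    static byte[] encodeStudent(int age, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (age != 0) {                    // proto3 skips fields that hold the default value
            writeTag(out, 1, WIRE_VARINT);
            writeVarint(out, age);         // int32 is sign-extended to 64 bits, as protobuf does
        }
        if (!name.isEmpty()) {
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            writeTag(out, 2, WIRE_LEN);
            writeVarint(out, utf8.length); // length prefix
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeStudent(12, "Tom")) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints: 08 0c 12 03 54 6f 6d
    }
}
```

Seven bytes carry the whole message: 0x08 is the tag of field 1 (varint), 0x0c is 12, 0x12 is the tag of field 2 (length-delimited) and 0x03 is the string length. Since protobuf-java writes fields in field-number order, the generated StudentMsg should produce the same bytes for these values, and the receiver needs the .proto definition to know that field 1 means age.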

Thrift

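Thrift is a high-performance, lightweight RPC framework open-sourced by Facebook, created to meet the needs of high-volume, distributed, cross-language and cross-platform data communication. Note that Thrift is not just a serialization protocol but a complete RPC framework. Compared with JSON and XML, Thrift offers much better space efficiency and parsing performance, which makes it an excellent RPC solution for distributed systems with high performance requirements. However, Thrift's serialization is embedded in the Thrift framework and tightly coupled to its own protocol and transport layers, which makes it hard to use together with other transport protocols such as HTTP. Thrift has since been donated to Apache.

Because of these characteristics, Thrift is not widely used in other frameworks or scenarios, so we won't go into detail here; it is enough to be aware of it.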

Hessian

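Hessian is a dynamically typed, binary, compact serialization format that is portable across languages. Its performance is generally much better than that of JSON or XML.

Basic Usage

Add the dependency: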

<dependency>
    <groupId>com.caucho</groupId>
    <artifactId>hessian</artifactId>
    <version>4.0.65</version>
</dependency>

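import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import org.junit.Test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

@Test
public void testHessian() throws Exception {
    // Student must implement Serializable: Hessian's default SerializerFactory rejects other classes
    Student student = new Student("小奏技术", 18);
    // convert the student object into a byte array
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Hessian2Output output = new Hessian2Output(bos);
    output.writeObject(student);
    output.flushBuffer(); // push Hessian's internal buffer into bos
    byte[] data = bos.toByteArray();
    bos.close();
    // convert the byte array back into a student object
    ByteArrayInputStream bis = new ByteArrayInputStream(data);
    Hessian2Input input = new Hessian2Input(bis);
    Student deStudent = (Student) input.readObject();
    input.close();
    System.out.println(deStudent);
}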
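Part of Hessian's compactness comes from variable-length encodings. Based on the Hessian 2.0 serialization specification, a 32-bit int takes 1, 2, 3 or 5 bytes depending on its magnitude: values in [-16, 47] fit in a single byte (0x90 + value), values in [-2048, 2047] take two bytes, values in [-262144, 262143] take three, and anything else is written as 'I' followed by four big-endian bytes. The stdlib-only sketch below reproduces just this rule; HessianIntSketch is a hypothetical name, it is not the com.caucho implementation, and it is worth checking against the spec before relying on it.

```java
public class HessianIntSketch {

    // Encodes an int following the Hessian 2.0 compact int rules
    static byte[] encodeInt(int v) {
        if (v >= -0x10 && v <= 0x2f) {
            // one byte: code = 0x90 + v, so the code falls in 0x80..0xbf
            return new byte[] {(byte) (0x90 + v)};
        }
        if (v >= -0x800 && v <= 0x7ff) {
            // two bytes: code = 0xc8 + (v >> 8), followed by the low byte
            return new byte[] {(byte) (0xc8 + (v >> 8)), (byte) v};
        }
        if (v >= -0x40000 && v <= 0x3ffff) {
            // three bytes: code = 0xd4 + (v >> 16), followed by the two low bytes
            return new byte[] {(byte) (0xd4 + (v >> 16)), (byte) (v >> 8), (byte) v};
        }
        // five bytes: 'I' followed by the int in big-endian order
        return new byte[] {'I', (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v};
    }

    public static void main(String[] args) {
        // prints 1, 1, 2, 3 and 5 byte(s) respectively
        for (int v : new int[] {0, 18, 48, 100000, 1 << 20}) {
            System.out.println(v + " -> " + encodeInt(v).length + " byte(s)");
        }
    }
}
```

Small values such as the age 18 in the example above therefore cost a single byte, whereas JDK serialization always spends four bytes on the int value alone, plus its class metadata.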