Seldon has collaborated with the NVIDIA Triton Server Project and the KServe Project to create a new ML inference protocol. The core idea behind this joint effort is that this new protocol will become the standard inference protocol and will be used across multiple inference services.

For details, see: [Predict Protocol - Version 2][2]

Benchmark solution - Run computer vision inference on large videos

Real-time Inference

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements.

TensorRT through NVIDIA Triton

TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

Tutorial: https://github.com/NVIDIA/TensorRT/blob/main/quickstart/SemanticSegmentation/tutorial-runtime.ipynb

Load TensorRT Engine
Run inference

TensorRT to Triton: https://github.com/NVIDIA/TensorRT/tree/main/quickstart/deploy_to_triton

Asynchronous inference

An Example for SageMaker: Amazon SageMaker Processing Video2frame Model Inference [3]

end-to-end flow with Asynchronous inference endpoint

Prerequisites

Hands-on steps: https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-create-endpoint-prerequisites.html

Create an IAM role for Amazon SageMaker.
Add Amazon SageMaker, Amazon S3 and Amazon SNS Permissions to your IAM Role.
Upload your inference data (e.g., machine learning model, sample data) to Amazon S3.
Select a prebuilt Docker inference image or create your own Inference Docker Image.
- Use Your Own Inference Code
Create an Amazon SNS topic (optional)
- Check Your S3 Bucket: https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-check-predictions.html

Use Your Own Inference Code with Hosting Services

https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-main.html

This page describes how Amazon SageMaker interacts with a Docker container that runs your own inference code for hosting services.

How SageMaker Loads Your Model Artifacts

How Containers Serve Requests:

Containers need to implement a web server that responds to /invocations and /ping on port 8080.
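To make the container contract concrete, here is a minimal sketch of such a web server using only the Python standard library. The `predict` function and the JSON echo payload are hypothetical placeholders; a real container would load the model artifacts and would normally run behind a production-grade server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    # Hypothetical stand-in for real model inference.
    return {"echo": payload}


def route(method, path, body):
    """Map a request to (status_code, response_bytes)."""
    if method == "GET" and path == "/ping":
        return 200, b""  # health check: an empty 200 response means healthy
    if method == "POST" and path == "/invocations":
        try:
            payload = json.loads(body or b"{}")
        except ValueError:
            return 400, b'{"error": "invalid JSON"}'
        return 200, json.dumps(predict(payload)).encode()
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, data):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._send(*route("GET", self.path, b""))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._send(*route("POST", self.path, self.rfile.read(length)))


def serve(port=8080):
    # Blocks forever; port 8080 is the port named in the text above.
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server; the routing logic lives in `route` so it can be exercised without opening a socket.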

How Your Container Should Respond to Inference Requests:

A customer’s model containers must respond to requests within 60 seconds.

Adapting Your Own Inference Container

https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html

The following example walks through a custom inference process on SageMaker in detail: https://sagemaker-examples.readthedocs.io/en/latest/frameworks/pytorch/get_started_mnist_deploy.html

Create

Create a model in SageMaker with CreateModel.
Create an endpoint configuration with CreateEndpointConfig.
Create an HTTPS endpoint with CreateEndpoint.

Prebuilt SageMaker Docker Images for Deep Learning

https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

https://aws.amazon.com/cn/blogs/machine-learning/bring-your-own-pre-trained-mxnet-or-tensorflow-models-into-amazon-sagemaker/

Entire process:

Step 1: Model definitions are written in a framework of choice.
Step 2: The model is trained in that framework.
Step 3: The model is exported and model artifacts that can be understood by Amazon SageMaker are created.
Step 4: Model artifacts are uploaded to an Amazon S3 bucket.
Step 5: Using the model definitions, artifacts, and the Amazon SageMaker Python SDK, a SageMaker model is created.
Step 6: The SageMaker model is deployed as an endpoint.

TensorFlow BYOM: https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/tensorflow_iris_byom

MXNet BYOM: https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/mxnet_mnist_byom

Pytorch: https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/pytorch_bring_your_own_gan

Batch Transform

Serverless Inference

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless Inference is ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts.

Benchmark solution - Machine Learning Platform for AI - EAS

One-click deployment
High performance
Blue-green deployment
Elastic scaling
Compilation optimization

References

·End·

1.2 - Cloud Native Patterns

As cloud-native infrastructure has become mainstream, it has changed how the applications built on top of it are developed:

Refs

Cloud Native Pattern: https://github.com/ContainerSolutions/cloud-native-patterns
Cornelia Davis: Cloud Native Patterns: Designing Change-Tolerant Software
Pini Reznik, Jamie Dobson & Michelle Gienow: Cloud Native Transformation: Practical Patterns for Innovation

·End·

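1.3 - Message Queue Resources

A collection of material on message queues (MQ), including RabbitMQ, Kafka, and Pulsar.

Basic concepts

Pattern	Description
Simple prototype	The producer puts messages onto the queue and the consumer takes them off, one at a time. The drawback is that the maximum consumption rate within a LAN is 2000 QPS.
Batch processing	Several messages are fetched and processed at a time.
Multiple consumers - push mode	Push mode adapts poorly to consumers with different consumption rates, because the send rate is decided by the broker. Push aims to deliver messages as fast as possible, which easily overwhelms a consumer; the typical symptoms are denial of service and network congestion. The central node has to listen for each consumer's ACK to determine whether a message was processed successfully, and handle the current message accordingly.
Multiple consumers - pull mode	In pull mode each consumer fetches messages at a rate that matches its own capacity. The benefit is that the consumer does not need to return an ACK: when a consumer asks for the next message, the previous one can be considered processed. There are no timeouts to handle either; a consumer can take as long as it needs. Open question: how do we make sure multiple consumers do not process the same message twice?
Multiple queues - pull mode	The Kafka model.

For details, see: Stream computing - Kafka (1)

Case studies

Inside Zhihu's high-performance long-connection gateway serving tens of millions of connections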

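How we designed the communication protocol
- Decoupling from business logic
  - Based on the classic publish-subscribe pattern
  - Messages are transmitted as pure binary data, so the gateway does not need to know each business's protocol specification or serialization format.
- Permission design
  - Callback-based authentication
  - Topic template variables
- Message reliability guarantees
  1. Acknowledgement and retransmission
  2. Receiving and sending through a message queue
How we designed the system architecture
How we built the long-connection gateway
- Access layer
  - Load balancing
    - Why not IP hash: the distribution is uneven and it cannot identify a client accurately.
    - Layer-7 load balancing using Nginx's preread mechanism, with consistent hashing on a unique client identifier.
- Subscription
- Publishing
- Sessions
- Persistence
- Sliding window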
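The load-balancing item above relies on consistent hashing keyed by a unique client identifier. The following is a minimal sketch of that idea, not the gateway's actual implementation: the node names, the number of virtual nodes (`replicas`), and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps client identifiers to nodes; removing a node only remaps its own keys."""

    def __init__(self, nodes, replicas=100):
        ring = []
        for node in nodes:
            # Each node gets several virtual points to even out the distribution.
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._nodes = [node for _, node in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, client_id):
        # First virtual point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(client_id)) % len(self._points)
        return self._nodes[idx]
```

Unlike IP hash, the key here is whatever identifier the client presents, so two clients behind the same NAT address can still land on different gateway nodes.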

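Real-time data warehouse for news feeds based on Flink

How to integrate Kafka and Flink: real-time ETL and data streams with Message Queue for Apache Kafka and Realtime Compute for Apache Flink.

References

[1] Alicloud: Real-time data warehouse for news feeds based on Flink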

·End·

1.4 - Prometheus Practise

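Background

Why run our own Prometheus when the cloud providers' offerings are already quite good?

Glossary

I am new to this, so there is still a lot of unfamiliar vocabulary.

Term	Explanation
Exporter	Exposes an endpoint for data collection to the Prometheus server; Prometheus polls these exporters to collect and store the data.
ServiceMonitor	A ServiceMonitor describes the set of targets to be monitored by Prometheus.
Prometheus Operator	Makes it simple to run Prometheus on Kubernetes while preserving Kubernetes-native configuration options. See: Prometheus Operator design.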

Operator

An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user.

It builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks.

An Operator is software that encodes this domain knowledge and extends the Kubernetes API through the third party resources mechanism, enabling users to create, configure, and manage applications.

The difference between an Operator and a Controller:
A Controller with the following characteristics qualifies as an Operator:
1. Contains workload-specific knowledge
2. Manages workload lifecycle
3. Offers a CRD

Prometheus

Monitoring Stacks consist of a collector, a time-series database to store metrics, and a visualization layer.

A popular open-source stack is Prometheus, used along with Grafana as the visualization tool to create rich dashboards.

Reference Prometheus architecture.

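Device plugin

Kubernetes provides a device plugin framework that you can use to advertise hardware resources to the kubelet. The benefits are:

Vendors can implement a device plugin and deploy it manually or as a DaemonSet, without customizing the code of Kubernetes itself. Target devices include GPUs, high-performance NICs, FPGAs, and InfiniBand adapters.

GPU Telemetry using dcgm-exporter in Kubernetes

The diagram illustrates one point: the pod-resources socket exposed for device-plugin monitoring is used to determine which devices are associated with a given pod.

Metrics

GPU Grafana dashboard configuration: GPU NODES V2

Metric	Description
DCGM_FI_DEV_FB_USED	GPU memory used
DCGM_FI_DEV_FB_FREE	GPU memory free
DCGM_FI_DEV_GPU_UTIL	GPU utilization
DCGM_MEM_COPY_UTILIZATION	See the memory utilization comparison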

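Getting started

Using dcgmproftester

Generating a load

Reflections

Kubernetes CRDs are everywhere.

FAQ

Why does the DCGM_FI_DEV_FB_FREE value not match the actual memory?
- A: The GPU is a T4; the exact instance type is ecs.gn6i-c16g1.4xlarge, with 16 GB of GPU memory and 62 GB of system memory. The DCGM_FI_DEV_FB_USED metric reported by NVIDIA dcgm-exporter measures GPU memory, whereas system memory is monitored through the node_memory_MemTotal_bytes, node_memory_Buffers_bytes, node_memory_Cached_bytes, and node_memory_MemFree_bytes metrics (from GPU NODES V2).
- A: When memory comes up in a GPU environment, make clear whether GPU memory or system memory is meant.
DCGM_MEM_COPY_UTILIZATION: memory utilization; see the memory utilization comparison.
- Utilization = (time during the past sample period in which global (device) memory was being read or written) / (length of the sample period) * 100%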

References

Monitoring GPUs in Kubernetes with DCGM

Introducing Operators: Putting Operational Knowledge into Software

Prometheus Operator User Guide

Controllers and Operators, https://octetz.com/docs/2019/2019-10-13-controllers-and-operators/

Prometheus Book (Chinese)

NVIDIA GPU monitoring tools on Linux

Integrating GPU Telemetry into Kubernetes

Prometheus dynamic target discovery and applying relabeling

GPU-Nodes-Metrics 12027 setup

·End·

1.5 - Notes for Patterns of Enterprise Application

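Data source architectural patterns

Table Data Gateway

A Table Data Gateway holds all the SQL for accessing a single table or view: selects, inserts, updates, and deletes. Other code calls its methods for all interaction with the database.

Each method maps its input parameters into a SQL call and executes it against a database connection. Because a Table Data Gateway only reads and writes data, it is usually stateless.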
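As an illustration, here is a small Table Data Gateway sketch using the standard-library `sqlite3` module. The `person` table, its columns, and the method names are hypothetical; they are not taken from the book.

```python
import sqlite3


class PersonGateway:
    """Holds all SQL for the (hypothetical) person table."""

    def __init__(self, conn):
        self._conn = conn  # the only thing the gateway keeps is the connection

    def insert(self, name, city):
        cur = self._conn.execute(
            "INSERT INTO person (name, city) VALUES (?, ?)", (name, city))
        return cur.lastrowid

    def find(self, person_id):
        return self._conn.execute(
            "SELECT id, name, city FROM person WHERE id = ?",
            (person_id,)).fetchone()

    def update_city(self, person_id, city):
        self._conn.execute(
            "UPDATE person SET city = ? WHERE id = ?", (city, person_id))

    def delete(self, person_id):
        self._conn.execute("DELETE FROM person WHERE id = ?", (person_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
gateway = PersonGateway(conn)
alice_id = gateway.insert("Alice", "Paris")
gateway.update_city(alice_id, "Berlin")
```

Callers never see SQL: each method takes plain parameters and returns plain rows, which is what keeps the gateway stateless.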

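Row Data Gateway

An object that acts as a gateway to a single record in a data source. There is one instance per row.

Active Record

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

In essence it is a Domain Model.

Pros

Easy to build and easy to understand.

Cons

Requires the object design to be tightly coupled to the database design.
Becomes a poor fit when the business logic is complex.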

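Data Mapper

A layer of mappers that moves data between objects and a database while keeping them independent of each other and of the mapper itself.

Objects and relational databases use different mechanisms for structuring data. Many parts of an object, such as collections and inheritance, do not exist in relational databases.

Handling finders

Separated Interface solves this dilemma, that is, the dependency from domain objects to the Data Mapper. Put every finder method the domain code needs into an interface class that can live in the domain package.

Define the interface in one package and implement it in another, separate package.

Mapping data to domain fields

Mappers need access to the fields (attributes) of the domain objects. This is often a problem, because it requires public methods that exist only to support the mappers and that the domain logic itself does not need.

Metadata-based mapping

The information about how fields in domain objects map to database columns has to be kept somewhere. There are two options:

Implement it with explicit code.
Store the metadata as data, in a class or in a separate file. This is Metadata Mapping.
- The advantage is that all variation in the mappers is handled through data, without more hand-written source code, by using either code generation or reflective programming.

Pros

Decouples the database from the domain objects.

Cons

Introduces an extra layer.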

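Distribution patterns

Remote Facade

Provides a coarse-grained facade over fine-grained objects to improve efficiency over a network.

A Remote Facade is a coarse-grained facade built on top of many fine-grained objects. None of the fine-grained objects has a remote interface, and the Remote Facade contains no domain logic. All the Remote Facade does is translate coarse-grained methods onto the underlying fine-grained objects.

It is a thin skin between the coarse-grained and the fine-grained objects.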

@startuml
participant a as "an address facade"
participant b  as "an address" 
[-> a: getAddressData 
a -> b: getCity 
a -> b: getState
a -> b: getZip
@enduml

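What a Remote Facade does:

Provides a coarse-grained interface
Provides security checks
Transaction control: it starts a transaction, does all the work, and then commits.

A Remote Facade contains no domain logic.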
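The address example from the sequence diagram can be sketched in a few lines. The class and method names below mirror the diagram (`getAddressData`, `getCity`, `getState`, `getZip`) but are otherwise hypothetical, and the in-memory dictionary stands in for whatever lookup a real facade would use.

```python
class Address:
    """Fine-grained domain object; it has no remote interface of its own."""

    def __init__(self, city, state, zip_code):
        self._city = city
        self._state = state
        self._zip = zip_code

    def get_city(self):
        return self._city

    def get_state(self):
        return self._state

    def get_zip(self):
        return self._zip


class AddressFacade:
    """Coarse-grained entry point: one remote call instead of three."""

    def __init__(self, addresses):
        self._addresses = addresses  # address id -> Address

    def get_address_data(self, address_id):
        address = self._addresses[address_id]
        # No domain logic here: only delegation and bundling of the results.
        return {
            "city": address.get_city(),
            "state": address.get_state(),
            "zip": address.get_zip(),
        }


facade = AddressFacade({1: Address("Boston", "MA", "02108")})
address_data = facade.get_address_data(1)
```

The returned dictionary plays the role of a simple data carrier, so the client pays for one round trip rather than one per attribute.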

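Data Transfer Object

An object that carries data between processes in order to reduce the number of method calls.

The value of a Data Transfer Object is that it lets you move several pieces of information in a single call.

A Data Transfer Object usually carries more than just a single server object.

You usually cannot transfer objects from the Domain Model directly, because:

The objects are usually connected in a complex web that is difficult, if not impossible, to serialize.
You usually do not want the domain object classes on the client; instead, transfer a simplified form of the data from the domain objects.

Common forms of a Data Transfer Object:

A Record Set, that is, a set of tabular records.
A collection data structure.

When to use it

Use a Data Transfer Object whenever you need to transfer multiple items of data between two processes in a single method call.
As a common source of data for various components in different software layers.

FAQ

Should a single Data Transfer Object handle the whole interaction, or should different ones handle different requests?
Should the request and the response each have their own Data Transfer Object, or should a single one serve both?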

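Offline concurrency patterns

Optimistic Offline Lock, Pessimistic Offline Lock, Coarse-Grained Lock, Implicit Lock

Optimistic Offline Lock

Ensures data consistency by checking that no other session has modified a record since the current session read it.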
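One common way to perform that check is a version number stored with each record. The sketch below assumes that approach; `VersionedStore`, its methods, and the in-memory dictionary are hypothetical stand-ins for a real table with a version column.

```python
class ConcurrentModificationError(Exception):
    """Raised when another session changed the record after it was read."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def insert(self, key, value):
        self._rows[key] = (1, value)

    def read(self, key):
        # The session remembers the version it saw.
        return self._rows[key]

    def update(self, key, expected_version, new_value):
        current_version, _ = self._rows[key]
        if current_version != expected_version:
            raise ConcurrentModificationError(
                f"{key}: expected version {expected_version}, found {current_version}")
        self._rows[key] = (current_version + 1, new_value)
```

Two sessions that both read version 1 can both attempt an update, but only the first succeeds; the second sees the bumped version and has to re-read and retry.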

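Pessimistic Offline Lock

Prevents conflicts between concurrent business transactions by allowing only one business transaction at a time to access the data.

How it works

A Pessimistic Offline Lock is implemented in three steps: decide which lock type to use, build a lock manager, and define the procedures by which business transactions use locks.

Lock types:

Exclusive write lock
Exclusive read lock
Read/write lock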

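Distributed locks

A distributed lock is a globally unique resource in a distributed environment. It serializes requests, behaves as a mutex in practice, and helps solve business idempotency problems.

Strong consistency and high availability of the lock service itself are the most basic requirements. Others include automatic lease renewal, automatic release, a high-level abstraction that is easy to integrate, visualization, and manageability.

Redis-based distributed lock^[1]
- Has a single point of failure; once a Redis cluster is involved, the same lock can end up being acquired twice.
- Lease renewal under timeouts: a random value per holder (see also fencing tokens^[2]) solves the problem of a lock being released by another task, but it does not solve the lock being released because it timed out. Redisson addresses this with a watchdog: a background thread periodically checks how long the lock has left before it expires and renews the lease.
- Asynchronous master-replica replication
A reliable solution based on the storage layer, such as ZooKeeper or etcd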
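The random-value and lease-renewal ideas above can be sketched without a Redis server. The class below is an in-memory stand-in, not a Redis client: `acquire` mimics a set-if-absent with expiry, `release` mimics a compare-and-delete, and `renew` is what a watchdog thread would call periodically. All names and the injectable `clock` are assumptions for illustration.

```python
import time
import uuid


class InMemoryLockStore:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # lock name -> (token, expires_at)

    def acquire(self, name, ttl):
        """Return a random token if the lock was free or expired, else None."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None
        token = uuid.uuid4().hex
        self._locks[name] = (token, now + ttl)
        return token

    def renew(self, name, token, ttl):
        """Extend the lease, but only for the current, unexpired holder."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[0] == token and held[1] > now:
            self._locks[name] = (token, now + ttl)
            return True
        return False

    def release(self, name, token):
        """Delete the lock only if the caller's token still owns it."""
        held = self._locks.get(name)
        if held is not None and held[0] == token:
            del self._locks[name]
            return True
        return False
```

The token check in `release` is what prevents a slow task whose lease has expired from deleting a lock that now belongs to someone else. It does nothing about the slow task still believing it holds the lock, which is the gap the renewal watchdog tries to narrow.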

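Session state patterns

Client Session State, Server Session State, Database Session State

When server session state also needs to be persisted, the difference between Server Session State and Database Session State is whether the session data is converted into tabular form.

References

[1] Somersames: Is a Redis-based distributed lock perfect?

[2] Martin Kleppmann: How to do distributed locking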

·End·

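1.6 - All About GPUs

Concepts

MapReduce

A model for offline batch processing, split into two phases: Map (distributing tasks) and Reduce (collecting results). It can process big-data jobs from different data sources in the same way.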
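A toy word count shows the two phases, plus the grouping step between them. This is a single-process sketch meant only to show the shape of the model; the function names are made up for illustration, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (key, value) pair for every word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group values by key so that each key reaches exactly one reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["GPU job queue", "gpu job"])
```

Because `map_phase` looks at one document at a time and `reduce_phase` at one key at a time, both can be run in parallel over different inputs.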

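Depending on the data source and the characteristics of the job, the computing modes are:

Batch computing
Stream computing: processes continuous, large-scale data streams by dividing the unbounded stream into bounded batch subsets of fixed size.
In-memory computing: when intermediate data needs heavy iterative processing, the intermediate data of the Map tasks is kept in memory.
Graph computing: an iterative form of MapReduce over multiple subtasks that depend on each other.
Interactive computing: for complex jobs that are latency-sensitive or combine several computing modes.

Spark and Flink

Big-data processing frameworks that offer batch, stream, in-memory, and graph computing on top of the MapReduce model, to handle big-data jobs with different forms of input.

Flink

Although we do not use Flink, it is worth knowing what Flink can do.

Event-driven applications
Data analytics applications
Data pipeline applications

For details, see Flink use cases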

Spark

BSP

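In parallel computing, the Bulk Synchronous Parallel (BSP) model describes a computation as a sequence of supersteps. In each superstep, every processor performs local computation, then point-to-point data communication, and finally a global synchronization checks whether all processors have finished.

DOT model

Building on BSP, researchers extended it into a computing model for big data.

It uses matrices to formally describe the computation and communication behavior of big-data processing, and divides the computation into three layers: the data layer, the operation layer, and the transformation layer.

DOTA model

Proposed by Xu Zhiwei's team at the Chinese Academy of Sciences. It adds an aggregation layer (A-layer) to the original DOT model to collect and process the intermediate data of the transformation layer and produce the final result.

p-DOT model

Following the BSP approach, this model treats a big-data computing task as a DOT model with p phases. Within the iterated computation phases, the p-DOT model consists of a data layer, a computation layer, and a communication layer.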

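Parallel computing

From the book "Parallel Computing"

MPI

MPI is a language-independent communication protocol that supports efficient, convenient point-to-point, broadcast, and multicast communication.

MPI belongs to layer 5 or above of the OSI reference model, although implementations may cover most of the layers, with sockets and TCP used in the transport layer.

The MPI standard keeps evolving; the MPI-1 model has no shared-memory concept.

MPI has many implementations, for example MPICH and Open MPI.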

MPS

The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA Application Programming Interface (API).

The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) GPUs.

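Using GPU resources in Kubernetes

Can multiple jobs run on a single GPU? Following the approach in the article, this is achievable.

Community discussion

GPU sharing on TKE: judging by the test data and the GIGAStack product, this approach runs in production.

GPU Sharing Scheduler Extender in Kubernetes: GPU sharing on Alibaba Cloud.

GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters
GPU Sharing in Kubernetes: the design of GPU sharing on Alibaba Cloud.

KubeShare - Share GPU between Pods in Kubernetes: a university experiment; it is unclear whether it runs stably in production.

Supporting MIG in Kubernetes: k8s-device-plugin supports MIG (Multi-Instance GPU) starting from version 0.7.0.

GPU virtualization approaches

To allocate GPU resources sensibly, first virtualize the device.

The article "Using and analyzing multi-GPU setups in Kubernetes" first virtualizes the GPU node: it changes the DeviceIDs reported to the kubelet, and when the kubelet calls Allocate() it translates each virtual DeviceID into the corresponding physical DeviceID.

[GPU virtualization technology (4) - GPU slicing virtualization](http://cloud.it168.com/a2018/0611/3208/000003208253.shtml?1) defines slicing along two dimensions:

Time: the GPU is divided into time slices. The compute engine of one physical GPU is shared among several vGPUs, with scheduling time slices typically around 1 ms to 10 ms.
Resources: the GPU's resources are divided, mainly its memory. Because of security isolation requirements, each vGPU has exclusive use of the memory allocated to it and does not share it with other vGPUs.

The article also covers GPU slicing frameworks in more depth.

Non-virtualized GPU approaches

todo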

NVIDIA GPU OPERATOR

The device plugin section of Prometheus Practise (1.4) describes how Kubernetes supports new hardware.

Configuring and managing nodes with these hardware resources requires configuration of multiple software components such as drivers, container runtimes or other libraries, which is difficult and prone to errors.

The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labelling using GFD, DCGM-based monitoring and others.

References

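[1] xtaohub.com: MPI, the framework where you do everything yourself

[2] stackoverflow: How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

[3] Enward: NVIDIA MPS summary

[4] NVIDIA: CUDA_Multi_Process_Service_Overview

NVIDIA GPU OPERATOR

An in-depth look at GPU hardware architecture and how it operates: a very comprehensive introduction

Parallel computing online course; a must-read

Paper: A GPU collaborative computing model for complex big-data applications

Flink use cases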

·End·

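1.7 - PlantUML + ArchiMate Notes

Background

Drawing a diagram that communicates clearly is hard; drawing one that also looks good is harder. A good tool gets you halfway there; the other half is up to the person drawing.

Tools

PlantUML: keeps a record of every change and can also be rendered into good-looking diagrams.
VS Code: its PlantUML extension previews diagrams in real time.
ArchiMate: a mature approach that handles all kinds of architecture diagrams and covers a great many cases.

Details

References

ArchiMate Cookbook (read in the Kami app)
An introduction to the various ArchiMate diagrams, https://www.hosiaisluoma.fi/blog/category/archimate/
"Enterprise Architecture Modeling - The ArchiMate Language" https://www.slideshare.net/zhoujg/archimate
TOGAF study notes, https://www.cnblogs.com/zhoujg/
PlantUML syntax cheat sheet
PlantUML parameter settings manual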
·End·

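1.8 - Reliability Engineering (Availability Engineering)

What exactly does availability engineering mean? When we talk about building for availability, what actually needs to be built?

Industry benchmarks

In 2019 I learned about the reliability engineering being built by the Google SRE team; in the same year Baidu published how its availability engineering was built.

Baidu's availability engineering

Availability problems

Requirements for availability engineering

Technical standards for availability engineering

Ability to prevent failures from occurring
- Program: code, testing, and change standards
- Operation: operating procedures, operation audits
- Running the service: capacity planning, attack protection
- Platform / third-party services: service SLA
- Infrastructure: infrastructure SLA
Ability to prevent failures from spreading
Ability to detect failures quickly
Ability to locate failures and stop the loss
Disaster recovery capability

Rollout

How to roll it out

References

Baidu's service availability engineering, which produced a set of technical standards for availability engineering. https://www.infoq.cn/article/C4PddPgiGNFGTqD6pZAK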

·End·

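1.9 - About CNCF Projects

CNCF Project vs CNCF Member Project: what is the difference?

Projects

OpenTelemetry

OpenTelemetry emerged in response to the following problems, which could not otherwise be solved:

Applications are locked into the instrumentation of a specific solution.
Solution-specific instrumentation is basically impossible for open-source software.

An observability system therefore had to be designed to solve these problems, and its design has to meet some basic requirements^[4]:

Requirement: independent instrumentation, telemetry, and analysis (the Chinese rendering of "instrumentation" as 仪表 always feels odd)
Requirement: zero dependencies
Requirement: strict backward compatibility and long-term support

Concepts

Signal

The different types of telemetry are called signals; the primary signal is tracing.

OpenTelemetry is a cross-cutting concern.

Instrumentation

Rendered as 仪表 in the Chinese edition of "The Future of Observability with OpenTelemetry".

However, the dictionary definition from Google reads: the particular instruments used in a piece of music / measuring instruments regarded collectively.

OTLP

The OpenTelemetry protocol (OTLP) defines the protobuf-based protocol format for tracing, metrics, and logging in OpenTelemetry, for example for tracing.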

Propagators and Context

Propagators: Used to serialize and deserialize specific parts of telemetry data such as span context and Baggage in Spans.

Traces can extend beyond a single process. This requires context propagation, a mechanism where identifiers for a trace are sent to remote processes.

otel.SetTextMapPropagator(propagation.TraceContext{})

TextMapPropagator injects values into and extracts values from carriers as text.

Carrier

A carrier is the medium used by Propagators to read values from and write values to.

In practice with New Relic tracing

OpenTelemetry and New Relic can be combined, with opentelemetry-go as the link between them.

This can also be done through opentelemetry-collector (e.g. binary, sidecar, or daemonset).

In practice with Alibaba Tracing Analysis (ARMS)

What is Tracing Analysis: judging from the architecture, it supports OpenTracing-based SDKs.

Tracing Analysis is compatible with SDKs from various open source communities and supports the OpenTracing standard.

Apache DolphinScheduler

A distributed and easy-to-extend visual workflow scheduler system, undergoing incubation at ASF.

MegaEase

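An open-source, self-controllable, low-cost, highly available cloud native platform

Service orchestration and service governance
Traffic scheduling and traffic management
Application observability & DevOps
Operation and management of key middleware
Scheduling of basic resources

References

OpenTelemetry: a new era of observability
CNCF projects or member projects
OpenTelemetry: Propagators API
Ted Young, translated by Jimmy Song: OpenTelemetry observability guide
Uptrace: OpenTelemetry instrumentations for Go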

·End·

1.10 - Microservice Architecture Bits and Pieces

image via: https://www.infoq.com/presentations/uber-microservices-distributed-tracing/

Pattern: microservice architecture

“An architectural style that structures an application as a set of deployable/executable units, a.k.a. services”

Highly maintainable and testable
Minimal lead time (time from commit to deploy)
Loosely coupled
Independently deployable
Implements a business capability
Owned/developed/tested/deployed by a small team

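Industry architecture best practices

Best practices framework for Oracle Cloud Infrastructure TODO

Industry architecture talks

An ongoing collection of architecture talks from the industry, with analysis and summaries.

Designing loosely coupled services[Slides]

By Chris Richardson. Introduces several types of coupling, their drawbacks, and how to design loosely coupled microservices.

Runtime coupling: the order service cannot respond until the customer service has returned, which reduces availability.

Design-time coupling: when the customer service changes, the order service has to change with it, which reduces development independence.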

Minimizing design time coupling

DRY
Consume as little as possible
Icebergs: expose as little as possible
Using a database-per-service

Reducing runtime coupling

Use resilience patterns for synchronous communication
Self-contained service
Improving availability: replace service with module
Use asynchronous messaging
Improving availability: sagas
Improving availability: move responsibility + CQRS

Avoiding infrastructure coupling

Use private infrastructure: minimizes resource contention and blast radius
Use “Private” message brokers
Fault isolated swim lanes

Patterns

Data Management

Database per Microservice

CQRS

Event Sourcing

Materialized View Patterns

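Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. This can help support efficient querying and data extraction, and improve application performance.

Context and problem

How data is stored is usually driven by the format of the data itself, its size, its integrity requirements, and the kind of store in use. This can have a negative effect on queries, however. For example, querying a subset of the data may require retrieving all the related data, as when querying an order summary for a few customers.

Solution

To support efficient querying, a common solution is to generate a view in advance that materializes the data in a format suited to the required results set.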

Messaging

Design and Implementation

BFF

API Gateway

Strangler

Consumer-Driven Contract Testing

Externalized Configuration

Facilitators

Facilitators^[5] are simply a new type that has access to the type you wished you had generic methods on. For example, if you are designing an ORM framework and want to provide methods for querying tables, you provide an intermediate type (Querier), and this intermediate type lets you write generic querying functions.

Resilience Patterns - Sagas

Saga distributed transactions: a way to manage data consistency across microservices in distributed transaction scenarios.

Choreography
Orchestration.

Resilience Patterns - Circuit Breaker

"Used to limit the amount of requests to a service based on configured thresholds, helping to prevent the service from being overloaded." (Circuit Breaker)

It also monitors how many requests have failed and uses that to stop further requests from reaching the service.

A CircuitBreaker uses a sliding window to store and aggregate the outcomes of calls. The window can be count-based or time-based.
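A count-based sliding window can be sketched with a bounded deque. This is a simplified illustration of the idea, not the API of Resilience4j or any other library: the parameter names are assumptions, and the half-open state is reduced to "start again with an empty window once the wait time has passed".

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, window_size=10, failure_rate_threshold=0.5,
                 open_seconds=30.0, clock=time.monotonic):
        self._outcomes = deque(maxlen=window_size)  # True means the call failed
        self._threshold = failure_rate_threshold
        self._open_seconds = open_seconds
        self._clock = clock
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._open_seconds:
                raise CircuitOpenError("circuit is open")
            # Wait time elapsed: close the circuit and start a fresh window.
            self._opened_at = None
            self._outcomes.clear()
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record(failed=True)
            raise
        self._record(failed=False)
        return result

    def _record(self, failed):
        self._outcomes.append(failed)  # the oldest outcome drops out when full
        window_full = len(self._outcomes) == self._outcomes.maxlen
        if window_full and sum(self._outcomes) / len(self._outcomes) >= self._threshold:
            self._opened_at = self._clock()
```

A time-based window would keep timestamped outcomes and discard those older than the window length instead of relying on `maxlen`.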

Circuit Breakers in Go: an implementation in Go.

Image via: https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker

Resilience Patterns - Bulkhead

"Isolates services and consumers via partitions." This is the Bulkhead pattern. In shipping, a bulkhead is a partition of the ship that, once sealed, protects the rest of the ship.

SemaphoreBulkhead: works well across a variety of threading and I/O models. It is based on a semaphore.
ThreadPoolBulkhead: uses a bounded queue and a fixed thread pool.
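The semaphore variant is small enough to sketch directly. The class below borrows the name for readability but is a hypothetical illustration in Python, not the Resilience4j API: it rejects immediately when the compartment is full instead of waiting.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the maximum number of concurrent calls is already running."""


class SemaphoreBulkhead:
    def __init__(self, max_concurrent_calls):
        self._semaphore = threading.BoundedSemaphore(max_concurrent_calls)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: fail fast rather than queue behind a slow backend.
        if not self._semaphore.acquire(blocking=False):
            raise BulkheadFullError("bulkhead is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._semaphore.release()
```

Giving each backend its own bulkhead instance means a slow backend can exhaust only its own permits, not the threads the rest of the application depends on.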

It prevents cascading failures, but it adds overhead to the application.

image via: https://www.jrebel.com/blog/microservices-resilience-patterns

When to use it:

Isolate resources used to consume a set of backend services, especially if the application can provide some level of functionality even when one of the services is not responding.
Isolate critical consumers from standard consumers
Protect the application from cascading failures

Request_id

Better Logging Approach For Microservices: print the request_id in the logs; it is generated by the party that initiates the request.

Judging from "What is the X-REQUEST-ID http header?", the recommendation is for the client to generate the x-request-id.

Resilience Patterns - RateLimiter

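Basic rate-limiting algorithms

Leaky bucket
- A leaky bucket is usually implemented with a queue. An arriving request is put on the queue if the queue is not full, and a processor takes requests off the head of the queue at a fixed rate. If the request volume is high, the queue fills up and new requests are dropped.

Token bucket
- A bucket with a fixed capacity holds tokens, and tokens are added to it at a fixed rate. The bucket has a maximum size; tokens added beyond that are discarded. When traffic arrives, each request has to obtain a token. If it gets one, it is processed immediately and the token is removed from the bucket. If it cannot, the request is rate limited: it is either dropped or made to wait in a buffer.
- In the long run, the average admitted request rate equals rate, the number of tokens r added to the bucket per second.
- Over any one-second interval, at most M = b + r requests are admitted, where b is the bucket capacity: a burst of b from a full bucket plus the r tokens added during that second.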
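The token bucket described above fits in a few lines. This is a minimal single-threaded sketch; the class name, the lazy refill on each call, and the injectable `clock` parameter are choices made for illustration.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate          # r: tokens added per second
        self._capacity = capacity  # b: maximum number of tokens
        self._tokens = capacity    # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill lazily for the elapsed time; tokens beyond capacity are discarded.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # no token available: the caller drops or delays the request
```

With `rate=2` and `capacity=3`, a full bucket admits a burst of 3 requests at once and 2 more over the following second, which matches the b + r bound.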

参考

What is the X-REQUEST-ID http header? https://stackoverflow.com/questions/25433258/what-is-the-x-request-id-http-header

Resilience4j is a fault tolerance library designed for Java8 and functional programming https://github.com/resilience4j/resilience4j

Azure Cloud Design Patterns, used in the cloud for building reliable, scalable, secure applications https://docs.microsoft.com/en-us/azure/architecture/patterns/index-patterns

·End·

.NET Microservices: Architecture for Containerized .NET Applications (PDF in Kami) https://docs.microsoft.com/en-us/dotnet/architecture/microservices/

[5] JBD: Generics facilitators in Go

1.11 - Big Data Arch

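Background

Industry cases

Tencent Games big-data applications

The value behind big data - Tencent Games big-data applications

Putting big data to work = data + systems + algorithms + application scenarios
Tencent Games' layered user data system
Tencent Games' data processing system architecture

Ele.me data warehouse governance and data applications

The value behind big data - Ele.me data warehouse governance and data applications

Building the data warehouse

Standardization and normalization

A unified log collection framework

Principles

Subject-area partitioning
Data consistency
Dimension building

TODO

Data permission management
Data usage records
Open data platform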

Uber Freight Carrier Metrics With Near-Real-Time Analytics

Uber Final System Design

Introduction
How We Did It
- Backend Requirement
- Potential Solution Considered
- Final System Design
- Data Schema
- Flink stateful Stream Process
- Hybrid Pinot Table
- Golang GRPC Service
Impact
Conclusion

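General solutions

Storage reads/writes

The article "Data Lake series: do you know about the EMRFS S3-optimized committer?" compares it with FileOutputCommitter.

The GitHub repo s3committer also introduces the multi-part upload API, which can address slow uploads of large files.

But for uploading many small files, is it enough to upload them concurrently? No.

Approach: compress the files, upload the archive, and decompress it with AWS Lambda once the upload completes. Streaming gets around the limit of only 500 MB of disk space per instance^[1], but the 15-minute execution limit remains a problem for very large files.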

参考

[1] John Paul Hayes: How to extract a HUGE zip file in an Amazon S3 bucket by using AWS Lambda and Python

·End·

1.12 - RPC vs Http

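While evaluating a 4G SDK design, I found that my colleagues on the embedded side used RPC to talk to the server. That struck me as odd, because our web servers normally communicate over HTTPS.

RPC

Wikipedia's List of network protocols (OSI model)

RPC belongs to the session layer.

HTTP vs RPC

RPC stands for remote procedure call. An RPC protocol usually consists of a transport protocol and a serialization protocol.

Transport protocols: for example, HTTP/2 as used by the well-known gRPC (grpc.io), or custom-framed TCP protocols such as Dubbo's. Serialization protocols: for example, text encodings such as XML and JSON, or binary encodings such as protobuf and Hessian.

HTTP persistent connections

Reference ^[2] explains how httpServer handles persistent connections: httpServer creates a goroutine, or more precisely one goroutine per new TCP connection. See the source code in the article for details.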

Compare gRPC services with HTTP APIs

gRPC is designed for HTTP/2, which has these benefits over HTTP 1.x:

binary framing and compression.
Multiplexing of multiple HTTP/2 calls over a single TCP connection. Multiplexing eliminates head-of-line-blocking

Feature	gRPC	HTTP APIs with JSON
Contract	Required (`.proto`)	Optional (OpenAPI)
Protocol	HTTP/2	HTTP
Payload	Protobuf (small, binary)	JSON (large, human readable)
Prescriptiveness	Strict specification	Loose. Any HTTP is valid.
Streaming	Client, server, bi-directional	Client, server
Browser support	No (requires grpc-web)	Yes
Security	Transport (TLS)	Transport (TLS)
Client code-generation	Yes	OpenAPI + third-party tooling

Table from https://docs.microsoft.com/en-us/aspnet/core/grpc/comparison

Head-of-line Blocking

A detailed description of HOL blocking: https://engineering.cred.club/head-of-line-hol-blocking-in-http-1-and-http-2-50b24e9e3372

Key Points:

Frame
Message
Stream

The HOL Blocking issue is resolved at the HTTP layer in HTTP/2, but it now moves to the TCP layer.

HTTP/3 or QUIC solves HOL Blocking at TCP layer by leveraging UDP instead of TCP as the transport protocol.

References

Why do we still need RPC calls when we have HTTP? https://www.zhihu.com/question/41609070
How Golang's httpServer handles KeepAlive persistent connections https://blog.csdn.net/jeffrey11223/article/details/81222774

·End·

1.13 - GCP Study

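Problems

Scenario 1: A user uploads data from client A to data-processing service X, which processes it and produces a result. The user can then retrieve that result from client B. (Data-processing service X is transparent to this user.)

Scenario 2: On a data operations management platform, requests need to be authorized when a logged-in user accesses each service's data API directly.

Glossary

API surface: the public interface of an API. An API surface consists of the methods, together with the parameters and return types used in those methods.

Service Management: a Google Cloud infrastructure service that creates and manages APIs and services.

Extensible Service Proxy (ESP): an NGINX-based service proxy, similar in approach to an Istio service mesh.

Cloud Endpoints architecture

Cloud Endpoints is an API management system. It provides its features through either the Extensible Service Proxy (ESP) or Endpoints Frameworks.

ESP (Extensible Service Proxy)

endpoint-introduce

Provider side:

Configure Endpoints: describe the API surface in an OpenAPI configuration file and configure Endpoints features such as API keys or authentication rules.
Deploy the Endpoints configuration: once the API is defined, use the Cloud SDK to deploy it to Service Management.
Deploy the API backend: deploy ESP and the API backend to a supported Google Cloud backend, such as Compute Engine. ESP works together with the Endpoints backend services to secure and monitor your API at runtime.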

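Endpoints architecture

Components:

ESP
Service Control
Cloud SDK
Google Cloud Console

K8S ESP

This diagram shows the Endpoints architecture more clearly.

Endpoints Frameworks

If you need to develop a RESTful service on GCP, you can use Endpoints Frameworks. What problem does it solve? It has a built-in API gateway that intercepts all requests and performs any necessary checks (such as authentication) before forwarding them to the API backend. When the backend responds, it gathers and reports telemetry.

Endpoints Frameworks

The following features are all handled in Endpoints Management.

Authentication
API keys
Monitoring
Logging
Quotas
Developer portal
- Example: Google Cloud Endpoints parses an OpenAPI file and renders API pages. Our API platform could borrow this idea.

A Python framework for building RESTful APIs on Google App Engine. Cloud Endpoints for Go is already DEPRECATED, but it is still worth learning from.

Practice

Choose an authentication method and try it out in Endpoints. See: Cloud Endpoints authentication and API keys.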

Identity and security

Authentication

None
API key
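- Identifies your project using a simple API key to check quota and access
- An API key identifies the calling project (the app or site) that is making the call to an API, whereas an authentication token identifies the user (the person) who is using the app or site.
- What API keys are for
  - Project identification - identify the app or project that is calling the API
  - Project authorization - check whether the calling app has been granted access to the API and has enabled the API in its project
OAuth Client ID (authenticating as an end user): requests user consent so your app can access the user's data
- You need to access resources on behalf of an end user of your app, for example when your app needs to access the Google BigQuery datasets that belong to its users.
- You need to authenticate as the user rather than as your app.
- Two purposes
  - User authentication - securely verify that the calling user is who they claim to be
  - User authorization - check whether the user should have permission to make this request
Service account: enables server-to-server, app-level authentication using a robot account

GCP APIs use the OAuth 2.0 protocol to authenticate both user accounts and service accounts. The OAuth 2.0 authentication process determines both the principal and the application.

User accounts are managed as Google Accounts.
Service accounts are managed by Cloud IAM and represent non-human users.

To decide which one fits, see Authentication strategies.

API keys identify projects; authentication identifies users.

API keys are not secure: because clients can usually access the API key, it is easy for someone to steal it. A stolen key has no expiration, so it can be used indefinitely unless the project owner revokes or regenerates it. API keys are less secure than authentication tokens.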

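Practice

Cloud Endpoints quickstart walkthrough. See the example repository for the sample code.

Takeaways

Adding API keys and rate limiting requires no code changes in the backend service.
API monitoring is available in the Google Cloud Console, which is very convenient.

Analysis

For scenario 1

Each client has its own business backend: client A's backend is called "backend A" and client B's is called "backend B". Backends A and B then authenticate to data-processing service X in a server-to-server fashion. The drawback is that authorization is at the application level. For accessing BigQuery datasets across projects, see: Cross project management using service account.

Create Service-Account A for backend A and grant it the create, compute, and view roles on the resources of data-processing service X.
Create Service-Account B for backend B and grant it the view and compute roles on the resources of data-processing service X.
~~The problem is that resources should be isolated between the two service accounts, so this cannot solve the same user getting the data.~~
In the Console, add Service-Account B to backend A and grant it the view and compute roles.

The clients have no business backend of their own in Google Cloud. Firebase handles authentication for mobile users. How can Firebase user authentication be combined with IAM?

The developers of client A create Service-Account A in the unified login service (Cognito) and grant it the create, compute, and view roles on the resources of data-processing service X.
After the user logs in on client A through Cognito, an STS token is generated from Service-Account A and returned to client A.
Client A uses the STS token to access data-processing service X directly, to create, compute, and view resources.
The developers of client B also create Service-Account B in Cognito and grant it the view and compute roles on the resources of data-processing service X.
After the user logs in on client B through Cognito, an STS token is generated from Service-Account B and returned to client B.
Client B uses the STS token to access data-processing service X directly, to create, compute, and view resources.
The resources that users access are isolated. How should this be handled?

For scenario 2

Log in with an OAuth Client ID ("authenticating as an end user"). For how your app verifies user identity, see the Firebase authentication used by Google Cloud. The following steps implement a data operations platform without a backend.

Bring service X online. Service X needs to be connected to the data operations platform; through IAM, grant user A view permission on all resources.
After user A logs in through OAuth 2.0, they are prompted for which services' permissions are needed (auth scopes).
Enter "service X", query its resources, and display them.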

References

Introduction to Cloud Endpoints https://cloud.google.com/endpoints/docs/openapi/about-cloud-endpoints?hl=zh-cn
Cloud Endpoints architecture overview https://cloud.google.com/endpoints/docs/images/endpoints_arch.png?hl=zh-cn
Cross project management using service account https://stackoverflow.com/questions/35479025/cross-project-management-using-service-account
Authenticating as an end user https://cloud.google.com/docs/authentication/end-user
Authenticating users on App Engine using Firebase https://cloud.google.com/appengine/docs/standard/python/authenticating-users-firebase-appengine#managing-user-data-in-datastore The example is fairly simple and does not use IAM for access control.
IBM OAuth 2.0 workflow https://www.ibm.com/support/knowledgecenter/zh/SSPREK_9.0.2/com.ibm.isam.doc/config/concept/con_oauth20_workflow.html#con_oauth20_workflow
"OAuth 2 in Action": Baidu Netdisk -> My Documents, read in Kami
[Authentication & Authorization] https://www.cnblogs.com/linianhui/category/929878.html
[OIDC in Action] A detailed walkthrough of the OIDC flow https://www.cnblogs.com/linianhui/category/1121078.html
sample for oidc: the code for the document above https://github.com/linianhui/oidc.example
Identity Server 4 - Hybrid Flow - MVC client authentication https://www.cnblogs.com/cgzl/p/9253667.html https://www.cnblogs.com/cgzl/tag/OAuth2/

·End·

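1.14 - All About Rate Limiting

Background

Limiting the number of API requests is part of network security, since large volumes of API requests cause high load. While studying rudr, I also saw that rate limiting can be offered as a trait, which makes it very convenient to use: application developers do not need to care about the rate-limiting logic.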

Glossary

traffic shaping

Packets are delayed until they conform.

traffic policing

non-conforming packets may be discarded(dropped) or may be reduced in priority.

What and The Importance

What Is API Rate Limiting

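Cluster flow control: uneven traffic across a cluster makes overall rate limiting ineffective; limiting per machine alone cannot cap the total traffic precisely.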

Best Practices For API Rate Limiting

How to Throttle API Calls

Three Methods of Implementing API Rate-Limiting

Adaptive System Protection

Load (adaptive)
CPU usage
Average RT
Number of concurrent threads
Inbound QPS

Request Queues

Throttling

Throttling lets API developers control how their API is used by setting up a temporary state, allowing the API to assess each request.

Rate-limiting Algorithms

Leaky Bucket

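As a meter: the mirror image of the token bucket algorithm.
- A token bucket adds tokens at a fixed rate, whereas a leaky bucket leaks water at a fixed rate.
- With a token bucket, a request takes a token from the bucket and is limited when none is available; with a leaky bucket, each request drips water into the bucket and is limited when the bucket is full.
As a queue: requests queue up and are released at a uniform pace, which strictly controls the interval between requests.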
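The "as a queue" variant can be sketched as a pacer that hands each request the time it must wait for its slot. This is an illustrative sketch only; the class name, the `max_wait` cutoff that stands in for a bounded queue, and the injectable `clock` are assumptions.

```python
import time


class PacingQueue:
    def __init__(self, interval, max_wait, clock=time.monotonic):
        self._interval = interval  # minimum spacing between admitted requests
        self._max_wait = max_wait  # longest time a request may sit in the queue
        self._clock = clock
        self._next_slot = 0.0      # earliest time the next request may pass

    def reserve(self):
        """Return how long the caller should wait, or None if rejected."""
        now = self._clock()
        start = max(now, self._next_slot)
        wait = start - now
        if wait > self._max_wait:
            return None  # the queue is too long: reject instead of waiting
        self._next_slot = start + self._interval
        return wait
```

A caller would sleep for the returned duration before proceeding, so admitted requests leave at evenly spaced intervals no matter how bursty the arrivals are.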

Fixed Window

Sliding Log

Sliding Window

Impact of rate limiting

References

Everything You need to Know About API Rate Limiting https://nordicapis.com/everything-you-need-to-know-about-api-rate-limiting/
Sentinel: cluster flow control, https://github.com/alibaba/Sentinel/wiki/集群流控

·End·

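1.15 - Exploring Deployment Best Practices in a Microservice Architecture

Background

Upgrading a service, or rolling out a refactored one, is inseparable from how the service is deployed smoothly and how traffic is migrated.

References

Flagger
Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio, Linkerd, App Mesh, Nginx, Contour or Gloo routing for traffic shifting and Prometheus metrics for canary analysis.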

·End·

1.16 - Cloud Computing Service Modeling

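Introduction

In 2020 it is time to rethink the architecture model: in what form do we offer our capabilities to the outside world? As services. Let's look at what kinds of services there are.

Service Model

The common ones are IaaS (Host), PaaS (Build), SaaS (Consume), and FaaS; the article "Future of Cloud Computing Architecture" mentions more types of service.

Cloud Computing Stack

Each one offers a different degree of flexibility and control, as shown in the figure:

SaaS vs PaaS vs IaaS Service Model

Function-as-a-service TODO

Software-as-a-Service

The SaaS model gives your business access to cloud-based web applications, without the need to install new infrastructure.

The Twelve-Factor App

https://12factor.net/ provides a methodology for building SaaS apps that:

Use declarative formats for setup automation
Have a clean contract with the underlying operating system, offering maximum portability between execution environments
Are suitable for deployment on modern cloud platforms, saving resources on servers and systems administration
Minimize divergence between development and production, and use continuous deployment for agile development
Can scale up without significant changes to tooling, architecture, or development practices

Authentication and authorization TODO

License model for SaaS, also known as "SaaS License"; it feels similar to "AWS Cognito"
vs IAM

The article shows the interaction logic among the AWS services.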

Platform-as-a-service

With this model, a third-party vendor provides your business with a platform upon which your business can develop and run applications.

Infrastructure-as-a-service

Allows your business to have complete, scalable control over the management and customization of your infrastructure.

Patterns in microservice

architecture trends 2020

EDA

Service composition - anti-pattern

Tightly coupled, because the calling service needs to know the URL, payload, and related details of the service it calls
A change in functionality requires a coordinated effort between multiple teams

Event notifications and event-driven architectures

AsyncAPI #TODO

Data Architecture

Data Mesh

the next enterprise data platform architecture is in the convergence of Distributed Domain Driven Architecture, Self-serve Platform Design, and Product Thinking with Data^[5]. — Zhamak Dehghani

Data Gateways

somewhat like API gateways but focus on the data aspect.

Policy as Code #TODO

Designing for ___

Designing for resilience

Designing for observability

Designing for portability

whether that’s for multi-cloud or hybrid-cloud. In most cases, there are no reasons for architects to design for the lowest common denominator to enable true multi-cloud portability or avoiding vendor lock-in.

Designing for sustainability

This is emerging because people are realizing the software industry is responsible for a level of carbon usage comparable to the aviation industry^[6].

Dapr

It is described as a set of "microservice building blocks for cloud and edge" and is also meant to be platform agnostic:

"Dapr is completely platform agnostic, meaning you can run your applications locally, on any Kubernetes cluster, and other hosting environments that Dapr integrates with. This enables developers to build microservice applications that can run on both the cloud and edge with no code changes."

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.

Dapr building blocks

Service Invocation
State management
Publish and subscribe messaging between services
Event driven resource bindings
Virtual actors – A pattern for stateless and stateful objects that make concurrency simple with method and state encapsulation. Dapr provides many capabilities in its virtual actor runtime including concurrency, state, life-cycle management for actor activation/deactivation and timers and reminders to wake up actors^[7].
Distributed tracing between services
Resiliency

Sidecar architecture and supported infrastructures

Dapr exposes its APIs as a sidecar architecture, either as a container or as a process, not requiring the application code to include any Dapr runtime code.

Dapr running as a side-car process

Multi-Cloud, open components (bindings, pub-sub, state) from Azure, AWS, GCP

Dapr is completely platform agnostic, meaning you can run your applications locally, on any Kubernetes cluster, and other hosting environments that Dapr integrates with. This enables developers to build microservice applications that can run on both the cloud and edge with no code changes.

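Differences between Dapr and a service mesh

service-mesh vs dapr

In common:

Secure service-to-service communication with mTLS encryption
Service-to-service metric collection
Service-to-service distributed tracing
Resiliency through retries on failure

Dapr is developer-centric and provides service discovery and invocation by name. Dapr also provides other application-level building blocks such as state management, publish/subscribe, and actors.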

Principles in Microservice

Create an organizational model that provides independence and autonomy to teams
Services are independently deployable
Services are independently scalable
They do not have a single point of failure, only degradation
The design employs asynchronous communication between services
No shared functionality, code, or data exists in the system
Components are easy to understand; they are small services with clear boundaries

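Thoughts

I came across the role of "AWS Solutions Architect": responsible for architecture consulting and design for enterprise customers' applications on AWS, with rich experience in microservice architecture design, databases, and related fields.
Is that a technical product role or a technical architect role? And which position plans AWS's cloud products?
It is a technical architect role that leans toward the business side. This is the path of a cloud service architect, which ends in technical architecture consulting.

TODO

[ ] Enterprise software architecture patterns, see kami app

References

[1] Future of Cloud Computing Architecture.pdf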

[2] IBM: IaaS vs. PaaS vs. SaaS, Understand and compare the three most popular cloud computing service models

[3] InfoQ: Software Architecture and Design InfoQ Trends Report—April 2020

[4] InfoQ Trends Report: the evolution of architecture and design technology explained, 2019

[5] Zhamak Dehghani: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh

[6] Thomas Betts, Holly Cummins: Software Architecture and Design InfoQ Trends Report – April 2021

[7] Microsoft Open Source Blog: Announcing Distributed Application Runtime (Dapr), an open source project to make it easier for every developer to build microservice applications

·End·

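1.17 - Controlling Access to Resources

It is the best of times and the most anxious of times; if you stand still, the times will abandon you without even saying goodbye.

What

Identity & Access Management: authentication and access control, a solution that provides controlled, secure access to resources. It controls access to AWS resources: which operation (Action) on which resource (Resource) is granted to whom (Identity).

IAM: https://docs.aws.amazon.com/zh_cn/IAM/latest/UserGuide/intro-structure.html

The figure shows three kinds of authorization; after reading the link above, there turn out to be more than three types: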

Identity-based policies: provide your users with permission to access the AWS resources in their own account
- Managed policy
  - AWS managed policy
  - Customer managed policy
- Inline policy: embedded in an IAM identity (a user, group, or role)
Resource-based policies: popular for granting cross-account access. Resource-based policies are inline only, not managed
Other policies: should be used carefully

Q: How do these three differ from the three policy types? A: The three below all belong to identity-based policies; see Identity-Based Policies for details.

How

How to define a Resource elegantly
How to define an Action elegantly
Identity
Policy

Resource

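AWS's answer is the ARN (Amazon Resource Name). An ARN is a naming convention that identifies an AWS resource unambiguously.

AWS controls permissions on API calls. The resource format it defines for API permissions is very thorough and worth borrowing.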

Identity

User, Group, Roles

Action

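An Action is an API offered by a service on AWS. Condition Context Keys are a feature of AWS IAM: variables can be used when defining a Policy, which allows complex expressions.

Policy TODO

Policy types: a detailed description of which kinds of Policy exist, covering almost every scenario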

Identity-based policies
Resource-based policies
Permissions boundaries
Organizations SCPs
Access control lists (ACLs)
Session Policies

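A Policy is a description language for authorization rules: it states who may perform xx operation on xx resource under xx condition. It is composed as follows:

"Version"
Statement: the content of the policy, one or more statements
- Effect: Allow or Deny
- Action: the specific operation; see AWS Service Actions and Condition Context Keys for Use in IAM Policies.
- Resource: the specific resource

The multiple Policies attached to one identity can conflict. IAM's strategy can be summed up as: declare everything, one veto decides.

Declare everything: by default, access to a Resource is denied; access is allowed only when an Allow on the resource is declared explicitly.
One veto decides: even if some Policy grants Allow, once any other Policy declares a Deny on the Resource, the result is always Deny.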
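The two rules above (default deny, explicit deny wins) can be sketched in a few lines of Go. This is a simplified illustration, not AWS's evaluation engine: the `Statement` type and `Evaluate` function are hypothetical names, and matching is exact string equality with no wildcards or conditions.

```go
package main

import "fmt"

// Effect is the outcome a statement asks for.
type Effect string

const (
	Allow Effect = "Allow"
	Deny  Effect = "Deny"
)

// Statement is a simplified policy statement (hypothetical type):
// one action and one resource, matched by exact string equality.
type Statement struct {
	Effect   Effect
	Action   string
	Resource string
}

// Evaluate applies "default deny, explicit deny wins" over all statements
// gathered from every policy attached to an identity.
func Evaluate(stmts []Statement, action, resource string) Effect {
	decision := Deny // nothing declared means no access
	for _, s := range stmts {
		if s.Action != action || s.Resource != resource {
			continue
		}
		if s.Effect == Deny {
			return Deny // one explicit Deny vetoes any Allow
		}
		decision = Allow
	}
	return decision
}

func main() {
	const obj = "arn:aws:s3:::demo-bucket/report.csv" // made-up ARN
	stmts := []Statement{
		{Allow, "s3:GetObject", obj},
		{Allow, "s3:PutObject", obj},
		{Deny, "s3:PutObject", obj},
	}
	fmt.Println(Evaluate(stmts, "s3:GetObject", obj))    // Allow
	fmt.Println(Evaluate(stmts, "s3:PutObject", obj))    // Deny
	fmt.Println(Evaluate(stmts, "s3:DeleteObject", obj)) // Deny
}
```

The order of statements does not matter: a matching Deny anywhere in the list ends the evaluation.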

Open Policy Agent

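Getting started: Open Policy Agent: simplifying microservice authorization

OPA defines a DSL called Rego.
The article above describes one way to implement access control with nginx in a microservice architecture.

OPA getting-started series

Chinese-language material

IAM in practice TODO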

AWS IAM supports a role-based access control (RBAC) paradigm by defining permissions within policies and attaching those to applicable principals (IAM users and roles).

Besides supporting both identity-based and resource-based policies, IAM has always supported aspects of attribute-based access control (ABAC) via the optional condition policy element and "expressions in which you use condition operators (equal, less than, etc.) to match the condition in the policy against values in the request", such as IP address or time of day.

Furthermore, it has supported authorization based on tags.

Implement TODO

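~~How to implement this system, and how each system integrates with it~~ As I dug deeper into IAM, I met a brand-new concept, the zero trust security architecture, which originates from Google Enterprise Security's BeyondCorp: a new approach to enterprise security. Peers are practicing this approach too, for example ZTO Security's article on the next step of ZTO's IAM architecture design. The next step is to understand this architecture thoroughly.

Every system depends on this IAM access control system. Implementing it takes two steps:

How to implement this system
- Implement OAuth 2.0 and OIDC based on hydra
- Implement the permission system with keto as a reference
How each system integrates with it

All services are implemented in one language, and the IAM functionality is integrated into each service as an SDK. The SDK relies on an elastically scalable data service, with an upper-layer load balancer routing requests to different partitions by AK, aiming for higher flexibility while preserving performance.

hydra

Pull the hydra code and start the service locally: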

docker-compose -f quickstart.yml -f quickstart-postgres.yml up --build

TODO

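Read closely: BeyondCorp: A new approach to enterprise security
Read closely: "A brief look at cloud IAM design that supports a zero trust security architecture", to understand the architecture of the whole system
Read closely: Sequence diagrams of OAuth 2.0 in Authlete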

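References

"AWS IAM: from getting started to getting started again": mentions some useful tools.

"IAM authentication and access control": mentions a scheme for temporary access credentials.

"Alibaba Cloud RAM policy notes": a detailed explanation of Policy. I do not fully understand the last two steps of the "authorization policy check logic for a RAM role identity": why "check whether the primary account that owns the RAM role is authorized", and why check "whether the resource supports cross-account ACL permission"?

Alibaba Cloud access control product introduction (PDF): defines RAM-User and RAM-Role.

Access control approaches for RESTful APIs

"An analysis of access control mechanisms in current cloud computing platforms at home and abroad": the images are broken.

"The 5W1H of zero trust": explains what zero trust is and answers what the concept means.

Amazon Cognito allows secure authentication in a world where mobile apps are regularly being accessed by individuals using multiple smart devices: how resources are used by an app.

"Exploration and practice of the permission service at Beike Zhaofang": a practical write-up of Beike Zhaofang's permission service.

AWS Identity and Access Management Gains Tags and Attribute-Based Access Control, 2019.2.8: this IAM release supports the ability to embrace attribute-based access control (ABAC) and match AWS resources with IAM principals dynamically to "simplify permissions management at scale"

Takahiko Kawasaki, co-founder and representative director of Authlete, Inc., working as a software engineer since 1997. The diagrams are very clear.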

·End·

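1.18 - APM Notes

Background

Having used NewRelic at work and then integrated Sentry into every application, it is time to organize this material systematically. As I got to know these tools better, APM came into view. What exactly is APM, where does it come from, and what problem does it solve (or what is its purpose)? This article addresses those questions based on the material I found.

Glossary

Management model

APM

Short for Application Performance Management, a management model distilled by Gartner.
The APM model has five layers:

End User Experience

The first concern is the end user's real experience of application performance. The goal is to help managers understand, accurately and in detail, what the real user experience looks like.

Runtime Application Architecture

Application architecture mapping; the goal is to address the fact that enterprise application architecture is a black box or grey box.

The complete architecture of the application
The application architecture of a single request

Business Transactions

Application transaction analysis
GA needs a large number of tracking points. How can we analyze application transactions from the collected data without modifying a single line of code?

Establish the context of the transaction operations: they belong to the same user
Identify every step of each transaction operation: each is a single distinct action

Deep Dive Component Monitoring

Deep application diagnostics:

Obtain runtime performance data without modifying user code
Correlate end-user data, runtime performance data, data metrics, and service runtime metrics effectively
With so many things to watch, how can the collection agent be deployed conveniently?
Do not affect the performance of the original application.

Analytics / Reporting

Data must be processed promptly, in real time when necessary, because problems can occur at any moment.
Analysis reports must be accurate. A large volume of data has no value in itself; only analysis and prediction against a business model gives it value.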

VS Sentry

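"Error log monitoring", also called "business logic monitoring", collects, aggregates, monitors, and alerts on the error logs produced while business systems run.
It is a form of "APM application performance monitoring", yet differs from it: an APM system focuses on behavior analysis at the application layer and mostly collects operations-oriented data,
whereas Sentry collects crash information from the application's underlying code to make code exceptions easier to troubleshoot. In short, it is a troubleshooting tool.
Problems Sentry solves:

Errors are not noticed right away
Obtaining error information is relatively inefficient
Logs are handled inflexibly
Monitoring coverage is limited

Sentry is a modern error logging and aggregation platform that supports all mainstream languages and platforms and provides a modern UI.

References

"What real APM is"
"Ops development in practice: building an error log monitoring system on Sentry"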

·End·

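1.19 - API Trends: Current State

GraphQL

Developed internally at Facebook in 2012 and released publicly in 2015; in 2018 the GraphQL project moved to the newly founded GraphQL Foundation.
It has pros and cons, and is not a good choice for simple APIs.

Avoids returning large amounts of redundant data from the server
Cannot make effective use of web caching for query results
The flexibility and richness it brings also add complexity

The GitHub GraphQL API: GitHub adopted GraphQL to solve two major problems:

Scalability: deliver everything the client needs in one request, instead of issuing several requests as with a RESTful API
Collect some meta-information about our endpoints: give different endpoints different OAuth scopes, paginate more flexibly, and fetch only the data the user needs.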

OPEN API(Swagger API)

OpenAPI evolved from the Swagger 2.0 specification
What's the difference between Swagger and REST

The net result is that the OAS (OpenAPI Specification) is considered a standard specification for describing REST APIs. It is intended not just for developers to consume but also for use by software.

What it is, and one key characteristic

Versus older architectural styles, the specifics of the REST architectural style — their simplicity, their elegance, and their ability to rely on existing standard networking protocols like the one that makes the World Wide Web work (aka the “Hypertext Transfer Protocol” or “HTTP”) — have made it one of the more enduring and popular architectural styles for networkable APIs. This is one reason that REST APIs are sometimes also called “Web APIs.” Although it is not a requirement, most REST APIs rely on HTTP (the Web’s official protocol) to perform their magic.

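This explains why REST-style APIs are called "Web APIs"

RESTful API

RESTful API Design Guide
Front-end devices communicating with back ends made API architecture popular, and even led to the idea of "API First".
RESTful API is currently a fairly mature body of API design theory for internet applications.

Protocol
Domain name
Version
Path
HTTP verbs
Filtering
Status codes
Error handling
Response results
Hypermedia API
Other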

REST API backward compatibility

Tests
Always add parameters
Do not make optional parameters mandatory
Always add additional HTTP response code returned by the API
Never delete or modify existing HTTP Response code behavior
Change URLs wisely

WSDL (Web Services Description Language)

Web Services Description Language

are complementary to other older but still deeply entrenched networkable API architectures like "remote procedure call" or "RPC"

VS REST API

OAS is complementary to the REST architectural style

参考

How to maintain Rest API backward compatibility? https://zubialevich.blogspot.com/2018/09/backward-compatibility.html

·End·

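1.20 - API Compatibility Design

Introduction

The general goal of backward compatibility: clients should not break when a service upgrades to a new minor version or patch

Glossary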

Source Compatibility

Code compiled against version X of an API will also compile against version Y.

Binary Compatibility

Code compiled against version X of an API will run correctly in an environment that has version Y of the same API.

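Backward-compatible changes

Adding an API interface to an API service
Adding a method to an API interface
Adding an HTTP binding to a method
Adding a field to a request message
Adding a field to a response message
Adding a value to an enum
Adding an output-only resource field

Backward-incompatible changes

Removing or renaming a service, field, or enum value
Changing an HTTP binding
Changing the type of a field
Changing a resource name format
Changing the visibility of existing requests
Changing the URL format in the HTTP definition
Adding a read/write field to a resource message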
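As a small illustration of two entries in the lists above, the sketch below decodes JSON with Go's `encoding/json`. It assumes a JSON-over-HTTP API; the `UserV1` type and the payloads are made up for the example. An old client keeps working when the server adds a response field, but fails when a field's type changes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UserV1 is the response shape an old client was built against
// (hypothetical type for illustration).
type UserV1 struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// decodeV1 parses a response body the way the old client would.
func decodeV1(payload []byte) (UserV1, error) {
	var u UserV1
	err := json.Unmarshal(payload, &u)
	return u, err
}

func main() {
	// Compatible: a newer server added the "email" response field.
	// encoding/json ignores unknown fields by default.
	newer := []byte(`{"id": 7, "name": "alice", "email": "alice@example.com"}`)
	u, err := decodeV1(newer)
	fmt.Println(u, err) // {7 alice} <nil>

	// Incompatible: the server changed the type of "id" from number to string.
	changed := []byte(`{"id": "7", "name": "alice"}`)
	_, err = decodeV1(changed)
	fmt.Println(err != nil) // true
}
```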

References

Google API Design Guide - Compatibility
Backward Compatibility Guidelines
API Design Guide: Google's general design guide for networked APIs
API Compatibility v2

·End·

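1.21 - Web API Spec Management Platform

Background

The most important thing a back-end service offers externally is its API. With hundreds or thousands of endpoints, what conventions or constraints keep the create, update, and deprecate stages of an endpoint's lifecycle principled?

By what convention are endpoints created?
Do endpoint updates have no impact, or very little, on consumers?
How are endpoints deprecated?

Research

Industry solutions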

apiary

Supports API Blueprint / Swagger API

DNA for your API: powerful, open sourced and developer-friendly. The ease of Markdown combined with the power of automated mock server, tests, validations, proxies, and code samples in your language bindings

Server Mock
Documentation
Traffic Inspector

MuleSoft

the most widely used integration platform

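Summary: offers API management, real-time usage metrics, and alerts on sudden traffic changes.

APIMATIC

Instant SDK, Code Samples, Test Cases
Continuous Code Generation
API Transformer

Summary: code generation, continuous code generation, and API transformation are all much-needed features.

Apigee

Google's API management across cloud environments

Web API Design
Mastering Full Lifecycle API Management with Analytics
Securing APIs in the Age of Connected Experiences

Summary: it largely answers the three questions at the start of this article.

Kong

Combined solutions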

drupal kong api publisher

https://git.drupalcode.org/project/kong_api_publisher/-/tree/1.0.x

kong slack api Governance

Luca Maraschi: API Butler for the Enterprise at Your Commands

How to set up Ops to seamlessly integrate with Slack
How to create an API based on a pre-defined template within Slack
How easy an API registration in Kong can be done via Slack
How to bring command, pipeline and services together seamlessly
Allow for transparent and analytic feedback loops within Slack
How to discover APIs in an easy and immediate way

Kong-Slack-API-Governance

Insomnia

Similar to Postman; it can be combined with the Kong Dev Portal through a plugin.

Kong Dev Portal & Workspace

Online editing has no version management; it manages Applications and Services.

provides a single source of truth for all developers to locate, access, and consume services

https://konghq.com/blog/api-gateway-governance

Other

apigility
Falcon Python web framework
Amazon API Gateway traffic management, authorization and access control, monitoring and API version management
SPECCY a handy toolkit for OpenAPI, with a linter to enforce quality rules, documentation rendering and resolution

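Company solution

Conclusion

Phase 1

API lifecycle management: creating and displaying APIs (For Humans, For Machines), API usage, API deprecation, version management, authorization and access control

TODO LIST

Phase 2

API health management: API traffic, API error rate, and so on. Connect the data with APM.

TODO LIST

Phase 3

Support more features: online API editing, warnings for non-conforming APIs, API debugging, API mocking, automated API testing, code generation (SDK and samples)

Traffic Inspector
Provides a proxy mode: developers send data to a debugging proxy and locate problems by comparing the data with the protocol content. The idea comes from apiary.io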

TODO LIST

References

The Web API Checklist – 43 Things To Think About When Designing, Testing, and Releasing your API
Understanding HTTP idempotence
API EVANGELIST
API Blueprint
13 Free & Open Source Tools For API Creation, Management & Testing

·End·

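1.22 - Plan for Migrating Online Services

Background

Server migrations and service upgrades should be invisible to consumers.
So how do we transition smoothly between old and new code, or old and new services? The concrete plan differs from case to case.

Research

Study how major companies handle service upgrades and migrations, and adopt what is useful.

Simulate client requests; do a canary release on the front end

Deploy old and new services, complete A/B testing, then do a canary release
How is the database data handled?

Feasible plans

Plans we can execute given our current infrastructure.

Best practices

Finally, what our best practices are, summarized after putting them into practice

References

Notes on migrating an API from Rails to Golang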

·End·

2 - Kubernetes

Notes on k8s

2.1 - K8S Request vs Limit

Pods keep getting restarted, and the cause is unknown

Glossary

BestEffort: for a Pod to be given a QoS class of BestEffort, the containers in the Pod must not have any memory or CPU limits or requests.

Burstable: the Pod does not meet the criteria for QoS class Guaranteed, and at least one Container in the Pod has a memory or CPU request.

Guaranteed: every Container in the Pod must have a CPU limit and a CPU request, and they must be the same; every Container in the Pod must have a memory limit and a memory request, and they must be the same.
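The three definitions above can be written as a small classification function. This is a sketch of the rules as stated here, not the kubelet's code: `Resources` and `QoSClass` are hypothetical names, a value of 0 stands for "not set", and the only extra behavior modeled is that Kubernetes fills in a missing request from the limit when only the limit is given.

```go
package main

import "fmt"

// Resources holds the requests and limits of one container.
// Hypothetical type; 0 means "not set".
type Resources struct {
	CPURequest, CPULimit int64 // millicores
	MemRequest, MemLimit int64 // bytes
}

// QoSClass classifies a Pod from the resources of its containers.
func QoSClass(containers []Resources) string {
	anySet, guaranteed := false, true
	for _, c := range containers {
		if c.CPURequest != 0 || c.CPULimit != 0 || c.MemRequest != 0 || c.MemLimit != 0 {
			anySet = true
		}
		// A missing request defaults to the limit when only the limit is set.
		cpuReq, memReq := c.CPURequest, c.MemRequest
		if cpuReq == 0 {
			cpuReq = c.CPULimit
		}
		if memReq == 0 {
			memReq = c.MemLimit
		}
		// Guaranteed needs both limits set and equal to the requests.
		if c.CPULimit == 0 || c.MemLimit == 0 || cpuReq != c.CPULimit || memReq != c.MemLimit {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	fmt.Println(QoSClass([]Resources{{}}))                                      // BestEffort
	fmt.Println(QoSClass([]Resources{{CPURequest: 100, MemRequest: 64 << 20}})) // Burstable
	fmt.Println(QoSClass([]Resources{{500, 500, 128 << 20, 128 << 20}}))        // Guaranteed
}
```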

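Key points

Request: the minimum resources a container requires; it does not cap the resources the container may use.

Limit: the maximum resources a container may use; setting it to 0 means no upper bound.

Memory currently supports only Request, and Limit must be forced equal to Request, which ensures a container is not killed unexpectedly when its memory usage exceeds Request but stays below Limit

A container is not killed merely because its memory usage exceeds Request while staying below Limit, so the "killed unexpectedly" in the sentence above has some other cause, for example contention for memory.

Resource contention in Kubernetes

Contention policy for compressible resources

Allocated in proportion to Request

Contention policy for incompressible resources

Pods are evicted according to their priority.

Which priority? See the official documentation: Evicting end-user Pods

Request=Limit=0
0 < Request < Limit < Infinity
0 < Request==Limit

Because contention for incompressible resources can get a Pod killed unexpectedly, it is recommended to set 0 < Request == Limit for incompressible resources (Memory, Disk)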

References

Kubernetes resource allocation: Request and Limit explained https://cloud.tencent.com/developer/article/1004976
Configure Quality of Service for Pods https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
Kubernetes configuration for handling resource shortage (somewhat outdated) https://www.kubernetes.org.cn/1150.html

·End·

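2.2 - OAM Explained

Glossary

Actor Pattern (the Actor model) is a concept from the Dapr project. ~~The Dapr project also implements the OAM specification~~; Dapr is a future way of developing apps.

Dapr is a programming model and not an application model, so it operates at a lower abstraction. OAM is intended to solve the problem of modeling an application and all its dependencies.

Dapr will not model the application's architecture.

The Actor model is a mathematical model of concurrent computation, originally developed for highly parallel computers made of many independent microprocessors. Its idea is very simple: everything is an Actor. Actors communicate by sending messages, and message delivery is asynchronous. Each Actor is fully independent and performs its operations concurrently.

When an actor receives a message, it can do one of the following three things:

Create more actors
Send messages to other actors
Designate what to do with the next message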
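A minimal way to picture an actor in Go is a goroutine that owns its state and reads from a mailbox channel. The sketch below is illustrative only and is not Dapr's virtual actor API: `counter`, `message`, and `newCounter` are hypothetical names. Messages are handled one at a time, so the state needs no lock, and the updated state decides how the next message is answered.

```go
package main

import "fmt"

// message is what other goroutines send to the actor.
type message struct {
	delta int
	reply chan int // nil for fire-and-forget messages
}

// counter is an actor: its state is touched only by its own goroutine.
type counter struct {
	mailbox chan message
}

// newCounter creates the actor and starts its message loop.
func newCounter() *counter {
	c := &counter{mailbox: make(chan message, 16)}
	go c.run()
	return c
}

// run processes one message at a time (turn-based access to the state).
func (c *counter) run() {
	total := 0
	for m := range c.mailbox {
		total += m.delta
		if m.reply != nil {
			m.reply <- total // send a message back to the asker
		}
	}
}

// Add sends an asynchronous message; the caller does not wait for processing.
func (c *counter) Add(n int) { c.mailbox <- message{delta: n} }

// Total asks the actor for its current state and waits for the answer.
func (c *counter) Total() int {
	reply := make(chan int)
	c.mailbox <- message{reply: reply}
	return <-reply
}

func main() {
	c := newCounter()
	for i := 1; i <= 10; i++ {
		c.Add(i)
	}
	fmt.Println(c.Total()) // 55
}
```

Because all messages here come from one goroutine and the mailbox is a FIFO channel, the `Total` request is handled after the ten `Add` messages.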

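For more, follow the link: "Understand the Actor model in 10 minutes". That article also mentions the book "Seven Concurrency Models in Seven Weeks"; I downloaded the e-book into MarginNote 3 to read closely.

Dapr implements the Virtual Actor pattern^[2]:

The Dapr actor runtime provides a simple turn-based access model for accessing actor methods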

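Background

Application Developers – write programs that deliver business value; most are unfamiliar with k8s and other infrastructure.

Application Operators – know the cluster's capabilities, stability, and performance well, and serve Developers by helping them configure, deploy, and operate applications (e.g., updating, scaling, recovery). They should be called PaaS engineers.

Infra Operators – serve application operators; they should be called IaaS engineers.

OAM emerged to solve the cooperation problems among the three (Problems of Cooperation). What are the common problems?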

Interactions between Infra Operators and Application Operators

Application operators discover and manage capabilities that could potentially be in conflict with each other.

Discovering the spec of a new capability is difficult
Confirming the existence of a specific capability in a particular cluster is difficult
Conflicts between capabilities could be troublesome
- Orthogonal
- Composable
- Conflicting

OAM’s Traits

In OAM, "Traits" are how we create capabilities with discoverability and manageability. Most traits are defined and implemented by infra operators.

Note that Traits are not the same as K8s plugins. One cluster can have several network-related Traits, such as a "dynamic QoS trait", a "bandwidth control trait", and a "traffic mirror trait", all of which are provided by a CNI plugin.

Interactions between Application Operators and Application Developers

Developers' Voices Should be Heard

several parameters for an application,
Cannot be scaled
Is a batch job, not a long running service
Requires highest level security, etc

OAM’s Component

In OAM, we try to logically decouple K8s API objects, so developers can fill in their own intentions and still be able to convey information to operators in a structured manner.

The ApplicationConfiguration

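An ApplicationConfiguration instantiates an application; it involves the Component's name and the Traits applied to it

The cooperation workflow:

The platform provides different workload types
The developer picks a workload type and defines component.yaml
The application operator runs kubectl apply -f component.yaml to install the component
The application operator defines an ApplicationConfiguration to instantiate the application.
Finally, the application operator runs kubectl apply -f app-config.yaml to trigger deployment of the whole application.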

OAM

A specification for describing applications as well as its operational capabilities,

In OAM, an Application is made from three core concepts,

The first is Components, which might be a collection of microservices, a database, and a cloud load balancer
The second concept is a collection of Traits, which describe the operational characteristics of the application, such as auto-scaling and ingress
Finally, to transform these descriptions into a concrete application, operators use a configuration file to assemble components with corresponding traits to form a specific instance of an application.

Traits

Represents a piece of add-on functionality that attaches to a component instance, such as traffic routing rules (including load balancing policy, network ingress routing, circuit breaking, rate limiting), auto-scaling policies, upgrade strategies, and more.

traits schematic comic

Component

Components

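It consists of three parts:

Workload description: how to run this component
Component description: what is being run
A set of overridable parameters.

Built-in Workload Types

Which built-in workload type to choose is decided by three aspects (below); the types can of course be extended.

Does this component expose a service endpoint or not?
Is this component replicable or not?
Is this component long-running or one-time (daemonized or not)?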

workload type

Overwritable parameter

to operators, which part of my app definition is overridable

ApplicationConfiguration

A simple example of an ApplicationConfiguration:

apiVersion: core.oam.dev/v1alpha1
kind: ApplicationConfiguration
metadata:
  name: my-awesome-app
spec:
  components:
    - componentName: nginx
      instanceName: web-front-end
      parameterValues:
        - name: connections
          value: 4096
      traits:
        - name: auto-scaler
          properties:
            minimum: 3
            maximum: 10
        - name: security-policy
          properties:
            allowPrivilegeEscalation: false

Where:

parameterValues – the operator uses this field to override the value of connections to 4096
Trait auto-scaler – the operator applies the autoscaler trait (e.g. HPA) to the component.
Trait security-policy – the operator applies security policy rules to the component

Note: the operator can add more traits to the list, for example a 'Canary Deployment Trait' so that application updates follow a canary rollout strategy.

Scopes

Application scopes are used to logically group components together by providing application boundaries that represent common group behaviors and/or dependencies.

why scopes

rudr scopes

Rudr in practice

I lack a k8s environment on an overseas cloud to practice on. Practicing according to the rudr docs:

Using helm
The current state of rudr
developer: how to write a Trait https://github.com/oam-dev/rudr/blob/master/docs/developer/writing_a_trait.md
developer: how to implement your own workload type: support openfaas && prometheums as first extended workload. https://github.com/oam-dev/rudr/pull/481

Installing minikube (deprecated)

minikube start downloads the minikube.iso file far too slowly, 5kb/s

minikube-v1.6.0.iso: 19.92 MiB / 150.93 MiB 13.20% 9.44 KiB p/s ETA 3h56

Gave up. Using a 2c4m machine on vultr instead

Installing microk8s on vultr

I had not used microk8s before. Remotely using the company computer is another option.

Create an Ubuntu 18.04 system, use microk8s, and install rudr with helm3

See Installing rudr.

helm3 Github Homepage: https://github.com/helm/helm Helm is a tool for managing Charts. Charts are packages of pre-configured kubernetes resources. Think of it like apt/yum/homebrew for Kubernetes.

Charts can be stored on disk, or fetched from remote chart repositories(like Debian or RedHat packages)

microk8s Homepage: https://github.com/ubuntu/microk8s The smallest, fastest Kubernetes. Single-package fully conformant lightweight kubernetes Perfect for:

Developer workstations
IoT
Edge
CI/CD

snap install microk8s --classic

microk8s.enable helm3

git clone https://github.com/oam-dev/rudr.git

microk8s.helm3 install rudr ./rudr/charts/rudr --wait

Installing Implementations for Traits

Ingress, Manual Scaler, autoscaler: for Ingress, for example, pick an Ingress implementation from the Helm 3 hub. Likewise, pick one autoscaler implementation and deploy it.

Deploy a sample Rudr application using the tutorial

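Tutorial: Deploy, inspect, and update a Rudr application and its components

Completed fairly easily. Takeaway: if Traits are plentiful and their documentation is easy to find, it is very convenient for an Application Operator to add enhancements or restrictions to an Application.

The future of OAM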

working with the community on OAM spec as well as K8s implementation

References

[1] Phil Bernstein Sergey Bykov Alan Geller Gabriel Kliot Jorgen Thelin: Orleans: Distributed Virtual Actors for Programmability and Scalability

[2] Dapr: Actors overview: Overview of the actors building block

[3] The Open Application Model from Alibaba’s Perspective https://www.infoq.com/articles/oam-alibaba/

[4] Open Application Model (OAM) https://www.jianshu.com/p/da9bf3357247

[5] A Kubernetes implementation of the Open Application Model specification https://github.com/oam-dev/rudr

[6] Automating Event-Based Continuous Delivery on Kubernetes with keptn. The keptn project provides a clear separation of concerns, allowing developers, DevOps, and site reliability engineers to update delivery pipelines. This one is more fine-grained and covers SLO metrics and blue/green deploys

·End·

2.3 - Kubernetes Cluster Architecture

Analyzing the architecture of Kubernetes

Cloud Controller Manager

Decouples Kubernetes from its interaction with the underlying cloud infrastructure, so that the two can evolve at different speeds.

The cloud-controller-manager uses a plugin mechanism that lets different cloud providers integrate Kubernetes with their platforms.

The cloud controller manager runs in the control plane as a replicated set of processes (usually, these are containers in Pods). Each cloud-controller-manager implements multiple controllers in a single process.

The controllers inside the cloud controller manager include:

Node Controller
Route Controller
Service Controller

cloud-provider-alibaba-cloud uses Alibaba Cloud load balancing to reach services, with a shorter path:

Old path: SLB –> Node –> Pod

New path: SLB –> Pod

References

IBM micro-lectures: Kubernetes series

3 - Golang

Notes on golang

3.1 - About Golang String

Problem

package main

import (
	"fmt"
)
func main() {
	var bys  [20]byte

	bys[0] =  'h'
	bys[1] =  'e'
	bys[2] =  'i'

	if string(bys[:]) == "hei" {
		fmt.Println("[20]byte('h','e','i') === \"hei\"")
	}
}

Output: nothing is printed. string(bys[:]) is 20 bytes long ("hei" followed by 17 zero bytes), so it does not equal "hei".

Key points

the internal structure of any string type is declared like:

type _string struct {
   elements *byte // underlying bytes
   len      int   // number of bytes
}

String types are all comparable. When comparing two strings, their underlying bytes are compared one byte at a time.
If one string is a prefix of the other one and the other one is longer, then the other one will be viewed as the larger one.
When a byte slice is converted to a string, the underlying byte sequence of the result string is a deep copy of the byte slice.
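The comparison rules above can be checked with a small program. It is a sketch: `cString` is a hypothetical helper, not a standard library function, that cuts a byte slice at its first zero byte in the way C strings are read.

```go
package main

import (
	"bytes"
	"fmt"
)

// cString returns the bytes of b before the first zero byte as a string.
// Hypothetical helper for fixed-size buffers that are zero-padded.
func cString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		return string(b[:i])
	}
	return string(b)
}

func main() {
	var bys [20]byte
	copy(bys[:], "hei")

	s := string(bys[:]) // converts all 20 bytes, zero bytes included
	fmt.Println(len(s))                   // 20
	fmt.Println(s == "hei")               // false: "hei" is only a prefix of s
	fmt.Println(s > "hei")                // true: the longer string is the larger one
	fmt.Println(cString(bys[:]) == "hei") // true
}
```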

Going further

package main

import (
	"fmt"
)

const s = "Go101.org"

// len(s) is a constant expression, 
// whereas len(s[:]) is not.
var a byte = 1 << len(s) / 128 
var b byte = 1 << len(s[:]) / 128

func main() {
	fmt.Println(a, b) // 4 0	
}

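Why do the results differ? One key point is the special type deduction rule in bitwise shift operations:

When the left operand of a shift is an untyped value and the right operand is a constant, the result keeps the same type as the left operand.
When the left operand of a shift is an untyped value and the right operand is non-constant, the left operand is first converted to its assumed type (the type it would assume if the bitwise shift operation were replaced by its left operand alone)

By these two rules, the statements above become: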

var a = byte(int(1) << len(s) / 128)
var b = byte(1) << len(s[:]) / 128

Why is there such a rule?

To avoid cases where some bitwise shift operations return different results on different architectures but the differences are not detected in time.

For example

var m = uint(32)
// The following three lines are equivalent to each other.
var x int64 = 1 << m 
var y = int64(1 << m)
var z = int64(1) << m

·End·

3.2 - Monorepo for Golang With Bazel

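Background

ProtocolBuffer + gRPC + Bazel + Monorepo + Microservice is a common polyglot pattern, also used at Google and other tech companies

The problem microservices face is the sheer number of repos (hundreds or thousands); managing dependencies and enforcing conventions becomes very cumbersome.

Requirements

Avoid depending on the host system
- Use the same Go distribution
Try to stay reproducible and deterministic
- Use one shared go.mod to manage dependencies
Common code can be reused across different applications
Tests can be run for a single application or for all applications
A single application or all applications can be built
Build times are short, and already-built artifacts can be reused

New concepts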

Bazel

Bazel is an open-source build and test tool similar to Make, Maven, and Gradle. It uses a human-readable, high-level build language. Bazel supports projects in multiple languages and builds outputs for multiple platforms. Bazel supports large codebases across multiple repositories, and large numbers of users.

Rule

A rule defines a series of actions that Bazel performs on inputs to produce a set of outputs.

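rules_go is the set of rules written for golang.

To learn more about Rules and how to write one, start with the documentation series by the rules_go author: A simple set of Go rules for Bazel, used for learning and experimentation.

repository rule

Solves the dependence on a toolchain installed on the local machine. Without it, every developer has to install the toolchain locally, and once toolchains differ, the build results differ. So a repository rule is used to download the go toolchain and generate a custom build file.

A repository rule is a special function that can be used in a WORKSPACE file to define an external workspace.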

Gazelle

Gazelle is a build file generator for Bazel projects. It can create new BUILD.bazel files for a project that follows language conventions, and it can update existing build files to include new sources, dependencies, and options. Gazelle natively supports Go and protobuf,

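Practice

"Create Go Monorepo with Go-modules and Bazel"

Example: https://github.com/PxyUp/go_monorepo

Add a Bazel build to the go-present repository, managed as a monorepo

Using Gazelle

bazel run //:gazelle

Open questions:

Every module needs its own BUILD.bazel file, which is extra work.
- The Gazelle build file generator natively supports Go / protobuf
How to combine it with reviewdog (golangci-lint)
~~No IDE support, which people used to an IDE will be reluctant to accept~~
How to handle code generated by grpc
- Share a repository of generated code

Conclusion

Helps code sharing and ends the state of isolated repos
Helps unify dependencies and upgrade them together, which keeps things secure.
A new technology that energizes the team's technical culture and interest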

FAQ

Q: how do you set up a CI/CD pipeline for a mono repo? When a code change to the repository triggers CI

A: http://blog.shippable.com/ci/cd-of-microservices-using-mono-repos; for more, see https://github.com/korfuri/awesome-monorepo/blob/master/README.md

References

Combining Bazel with golangci-lint

Guide: Create monorepo with Go Modules and Bazel

Monorepo at Uber

cmake + Conan: decentralized and multi-platform package manager to create and share all your native binaries.

Monorepo with bazel and go module: a practitioner

·End·

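3.3 - Golang Pprof Usage Notes

Background

Problems we are currently facing:

CPU load is high on some services
Service memory usage is large
Services have high latency (while neither memory nor CPU load is high)

Golang development has established routines and tools for locating such problems. This article collects them and records, and keeps improving, the approach to solving them.

Two aspects to consider:

Statistics-level metrics for system monitoring
- Number of goroutines
- Heap memory usage
- Stack memory usage
- Others... (to be added)
Detailed data needed to locate problems
- Real-time details of heap allocations: exactly where the memory was allocated.
- Real-time call stacks of all goroutines: exactly where each goroutine was started and what it is doing now
- Real-time statistics that help tune heap memory: how much was allocated where, which are the TOP N, and even a graph of where each allocation came from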

Diagnostics

Getting real-time heap allocation details

// import pprof
import "net/http/pprof"
// register on the http router
this.debugMux.HandleFunc("/debug/pprof/", http.HandlerFunc(pprof.Index))

curl -XGET "http://192.168.149.150:8080/debug/pprof/heap?debug=2" fetches detailed heap information, where 8080 is the port of the http server you started and debug=2 requests detailed output

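Getting real-time call stacks of all goroutines

curl -XGET "http://192.168.149.150:8080/debug/pprof/goroutine?debug=2" returns the details of every goroutine

Getting real-time statistics that help tune heap memory

Run go tool pprof -inuse_space http://192.168.149.150:8080/debug/pprof/heap. In pprof's interactive mode, use top, tree, and similar commands to inspect the statistics further; the png command writes the memory information to an image showing how memory is allocated and held

Getting trace data

curl -XGET "http://127.0.0.1:8080/debug/pprof/trace?seconds=30" -o 002_trace_2017_09_08.out fetches 30 seconds of trace data (002_trace_2017_09_08.out), which can then be opened with go tool trace 002_trace_2017_09_08.out

There are pitfalls here too, such as the page opening blank; in that case try: gotip tool trace xxx.out

Profile

Focuses on statistics about how each goroutine itself runs; better suited to analyzing high latency caused by CPU-intensive logic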

cpu

heap

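pprof's top lists five statistics:

flat: memory held by this function itself
flat%: this function's memory as a percentage of total in-use memory
sum%: the running sum of the flat% values of this row and all rows above it
cum: the cumulative amount; if main calls function f, the memory held by f is counted in as well
cum%: the cumulative amount as a percentage of the total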

Memory profiling records the stack trace when a heap allocation is made.

Stack allocations are assumed to be free and are not tracked in the memory profile.

Memory profiling, like CPU profiling, is sample based. By default the Go runtime samples on average one allocation per 512 KiB allocated (runtime.MemProfileRate); this rate can be changed.

Because memory profiling is sample based and because it tracks allocations, not use, using memory profiling to determine your application's overall memory usage is difficult.

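Heap "cannot" locate memory leaks

The goroutine is called only a few times but consumes a lot of memory
The goroutine is called very many times. Each call consumes little memory, but the number of goroutines on that call path is huge, which adds up to a lot of memory, and for some reason these goroutines cannot exit, so the memory they hold is never released.

The second case is a goroutine leak, which cannot be found through heap, so heap is of little use for locating memory leaks.

How a goroutine leak causes a memory leak

Each goroutine takes 2kb of memory
While a goroutine runs it holds variables; if they point to heap memory, the GC considers that memory still in use and does not reclaim it, so the memory becomes unusable and leaks. a. the space taken by the goroutine's own stack b. the heap memory held by the goroutine's variables; this part does show up in the heap profile.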

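How to locate a goroutine memory leak

pprof shows who (which piece of code allocated it) holds a lot of heap memory right now, so the right approach is to export heap profiles at two points in time and compare them with the -base flag

From "Hi, is the go pprof method for checking memory leaks that we used for years actually wrong?!"

"Go memory leaks in practice" locates a memory leak with monitoring tools and the diff mode of go pprof, in great detail. Ways to locate a goroutine leak: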

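See how many goroutines are currently blocked on a given call path
- go tool pprof http://ip:port/debug/pprof/goroutine?debug=1
See the running stacks of all goroutines, including how long each has been blocked
- go tool pprof http://ip:port/debug/pprof/goroutine?debug=2

"How much memory does a goroutine actually take?" Conclusions first:

Memory taken by a goroutine is managed on the stack
The size of a goroutine's stack is allocated on demand by the runtime
By comparison, a JVM in a 64-bit environment allocates a fixed 1M stack per thread by default; if the size is misjudged, stack overflow occurs

"How I investigated memory leaks in Go using pprof on a large codebase"

pprof works with profiles. A profile is a collection of stack traces showing the call sequences that led to instances of a particular event, such as a memory allocation.
If memory consumption is a relevant consideration, map[int]T is fine when the data is not sparse or can be converted to sequential indices, but usually a slice should be used instead.
- Growing a slice may slow an operation down; in a map this slowdown is negligible.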

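How Go memory works

First, a few memory concepts

Segmented stacks

In early versions, Go gave each goroutine a fixed 8kb memory region. What happens when 8kb is not enough?

Go inserts a small prologue at the entry of every function that checks whether the stack space is about to run out; if it is, it calls morestack() to grow the space.

The resulting problem: the well-known hot split problem

Contiguous stacks

Since Go 1.3, contiguous stacks are used: a new stack twice the size is allocated and the old one is copied over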

mem.Sys: Sys measures the virtual address space reserved by Go runtime for the heap, stacks, and other internal data structures.

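mem.Alloc: bytes allocated and still in use, the same as mem.HeapAlloc

mem.TotalAlloc: total bytes allocated since the program started

mem.HeapAlloc: current heap usage, made up of the following two

all reachable objects
unreachable objects that the garbage collector has not yet freed

mem.HeapSys: what the heap currently uses, what has been freed but not yet returned to the operating system, and reserved space.

mem.HeapIdle:

mem.HeapReleased:

For detailed explanations of the above, see runtime#MemStats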
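The fields above can be read directly with `runtime.ReadMemStats`. The sketch below takes a snapshot before and after an 8 MiB allocation; `snapshot` is a hypothetical helper, and the exact numbers depend on the Go version and platform, so no output is shown.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot returns the current runtime memory statistics.
// Hypothetical helper around runtime.ReadMemStats.
func snapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	before := snapshot()
	buf := make([]byte, 8<<20) // large enough to be allocated on the heap
	after := snapshot()

	fmt.Printf("Sys       = %d KiB\n", after.Sys>>10)
	fmt.Printf("HeapSys   = %d KiB\n", after.HeapSys>>10)
	fmt.Printf("HeapAlloc = %d KiB (Alloc = %d KiB)\n", after.HeapAlloc>>10, after.Alloc>>10)
	fmt.Printf("HeapIdle  = %d KiB, HeapReleased = %d KiB\n", after.HeapIdle>>10, after.HeapReleased>>10)
	fmt.Printf("TotalAlloc grew by %d KiB\n", (after.TotalAlloc-before.TotalAlloc)>>10)

	runtime.KeepAlive(buf) // keep buf reachable until after the second snapshot
}
```

TotalAlloc only ever grows, while HeapAlloc falls again after the garbage collector frees objects, which is why the two are read together when looking for a leak.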

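pprofplus plots memory as a curve to show how it changes over time

Manual memory management in golang #TODO why did I add this link?

Memory usage analysis methods

Three ways to understand memory usage in Go

Through the ReadMemStats function of the runtime package
- Memory Usage when reading large file: the asker reads a file of a given size into memory, yet far more memory than the file size is actually allocated.
Through the pprof package. Is it very important that pprof only collects samples, not the true values?
- tools/techniques for tracking down "too many open files": the memory profile shows where the things were created, not where they 'live'.
Through the gctrace debugging environment variable (GODEBUG=gctrace=1)
What about cgo or syscall memory leaks?
- Also CGO / syscall (eg: malloc / mmap) memory is not tracked by go. How to analyze golang memory
- Notes on memory management when using cgo in Go: how to locate cgo memory leaks #TODO
- Golang cgo memory

"Go Language Design and Implementation" analyzes the memory allocation mechanism in detail from the source code #TODO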

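Linux memory structure

VIRT: virtual memory, i.e. the size of the virtual address space; the amount of memory the program has mapped and can access. See the figure below for an explanation of virtual memory.

RES: resident memory, the part of the process's virtual address space that is currently mapped to physical memory.

SHR: shared memory, the amount of shared memory the process uses, for example the external dynamic libraries (.so) a program depends on.

理解 virt res shr 之间的关系 - linux

mem: physical memory

swap: virtual memory (swap space), i.e. data that can be stored on disk

shared: shared memory, held in physical memory

buffers: holds data to be written out to disk

cached: holds data read from disk

Figure (click to view) from 内存与 I/O 的交换, which explains file-backed pages vs anonymous pages in detail.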

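Name	Description
total_mem	total physical memory
used_mem	physical memory in use
free_mem	free physical memory
shared_mem	shared memory
buffer	memory used by buffers (the buffer cache)
cache	memory used by the cache (the page cache)
real_used	memory actually in use
real_free	memory actually free
total_swap	total swap
used_swap	swap in use
free_swap	free swap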

real_used = used_mem - buffer - cache
real_free = free_mem + buffer + cache
total_mem = used_mem + free_mem
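As a worked example of the three formulas, the sketch below plugs in made-up numbers (in MiB). The memInfo type and its values are assumptions for illustration; they follow the accounting used in the table above, where used_mem still includes buffers and cache.

```go
package main

import "fmt"

// memInfo (hypothetical type) mirrors the fields of the table above, in MiB.
type memInfo struct {
	Total, Used, Free, Buffer, Cache uint64
}

// realUsed is the memory applications really occupy: used minus buffers and cache.
func (m memInfo) realUsed() uint64 { return m.Used - m.Buffer - m.Cache }

// realFree is the memory that can be handed to applications: free plus buffers and cache.
func (m memInfo) realFree() uint64 { return m.Free + m.Buffer + m.Cache }

func main() {
	// Made-up sample values.
	m := memInfo{Total: 16000, Used: 12000, Free: 4000, Buffer: 500, Cache: 5500}

	fmt.Println("real_used:", m.realUsed())
	fmt.Println("real_free:", m.realFree())
	fmt.Println("adds up to total:", m.realUsed()+m.realFree() == m.Total)
}
```

With these numbers real_used is 6000 and real_free is 10000, which together equal total_mem, as the third formula requires.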

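Docker container memory monitoring

Linux cgroup - memory 子系统讲解 gives a very thorough introduction to the memory concepts in cgroups.

Several memory-related concepts are involved:

tmpfs
- tmpfs 详解
- A temporary file system that resides in memory
- tmpfs size: memory is consumed only once data is actually stored in tmpfs.
page cache
- page: The virtual memory is divided in pages.
- The page cache mainly caches file data from the file system, especially when a process reads or writes files. 什么是 page cache
rss; see 内存耗用：VSS/RSS/PSS/USS 的介绍
- anonymous and swap cache, not including tmpfs (shmem), in bytes
anonymous cache
- First, anonymous mappings: memory a process obtains with malloc or with mmap (MAP_ANONYMOUS).
- Then, file mappings: a process uses mmap to map files from a file system, including regular files and files on a temporary file system (tmpfs); System V IPC and POSIX IPC also count.
swap cache
- Swap mechanism: when memory runs short, a disk, a partition, or a file can be used as swap space; data in memory that is temporarily unneeded is moved there to free memory for processes that need it urgently.
- Inactive (anon, anonymous mappings): this memory can be swapped out. Note that the kernel also counts shared memory under Inactive (anon) (yes, shared memory can be swapped too).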

active_file + inactive_file = cache - size of tmpfs

active_anon + inactive_anon = anonymous memory + file cache for tmpfs + swap cache

Memory - Part 1: Memory Types

Memory can be classified along two dimensions:

whether memory is private (specific to that process) or shared
- private
- shared
whether the memory is file-backed or not (in which case it is said to be anonymous)
- anonymous: purely in RAM
- file-backed: When a memory map is file-backed, the data is loaded from disk

See the table below:

	PRIVATE	SHARED
ANONYMOUS	stack, malloc(), mmap(ANON, PRIVATE), brk()/sbrk()	mmap(ANON, SHARED)
FILE-BACKED	mmap(fd, PRIVATE), binary/shared libraries	mmap(fd, SHARED)

采坑记 - go 服务内存暴涨 analyzes MADV_FREE together with page tables and goes into more detail.

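Memory allocation

On Linux, when the memory malloc manages runs out, it calls the brk or mmap system call (syscall) to ask the kernel to extend its usable address space.
The OS manages a process's address space with page tables, which record each page's state, the physical page address it maps to, and so on; a page is usually 4 KB.
When a process reads or writes a page that has not been allocated yet, a page fault is triggered; only then does the kernel allocate the page, mark it as allocated in the page table, and resume the process.

Memory reclamation

When free sees fit, it calls sbrk or munmap to shrink the address space; this applies when a whole range of the address space has become empty.
More often only part of the range is released (for example only C and D out of five contiguous pages ABCDE), and the address space need not (and cannot) be shrunk.
free can use madvise to tell the kernel "I no longer need this range".

madvise

The madvise(addr, length, advice) system call tells the kernel how it may treat the length bytes starting at addr.
Before Linux kernel 4.5, only MADV_DONTNEED was supported: the kernel marks the pages as "unallocated" in the process's page table, so the process's RSS shrinks.

Go 1.12 improvements

Starting with kernel 4.5, Linux supports MADV_FREE.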

threadcreate

goroutine

block

mutex

Practice: mutex profile in the web UI

~~The PPT mentioned there produced no data when analyzed locally~~, because it did not use goroutines.

for _, f := range factors(n) {
  mu.Lock()
  m[f]++
  mu.Unlock()
}

mu.Lock()
for _, f := range factors(n) {
  m[f]++
}
mu.Unlock()
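The two fragments above differ only in lock granularity: the first takes the mutex once per factor, the second once per number. The sketch below makes both variants runnable so they can be compared under a mutex profile. The factors function (prime factorization) and countFactors are assumptions written for this example, not the code from the original slides.

```go
package main

import (
	"fmt"
	"sync"
)

// factors returns the prime factors of n, with repetition.
// Hypothetical stand-in for the factors(n) used in the snippets above.
func factors(n int) []int {
	var fs []int
	for p := 2; p*p <= n; p++ {
		for n%p == 0 {
			fs = append(fs, p)
			n /= p
		}
	}
	if n > 1 {
		fs = append(fs, n)
	}
	return fs
}

// countFactors counts how often each prime factor occurs in 2..limit, using
// one goroutine per number. With lockPerFactor the mutex is taken for every
// single factor; otherwise it is taken once per number.
func countFactors(limit int, lockPerFactor bool) map[int]int {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
		m  = make(map[int]int)
	)
	for n := 2; n <= limit; n++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			fs := factors(n) // computed outside the lock
			if lockPerFactor {
				for _, f := range fs {
					mu.Lock()
					m[f]++
					mu.Unlock()
				}
				return
			}
			mu.Lock()
			for _, f := range fs {
				m[f]++
			}
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return m
}

func main() {
	fine := countFactors(10000, true)
	coarse := countFactors(10000, false)
	fmt.Println("same number of distinct factors:", len(fine) == len(coarse))
	fmt.Println("same count for factor 2:", fine[2] == coarse[2])
}
```

Both variants produce the same map; what changes is how often goroutines queue on the mutex, which is what the mutex profile measures.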

Trace

Synchronization blocking profile

From Rhys Hiltner's analysis.

The thing that we are spending here is seconds that were spent waiting. We have the goroutine name at the top of the stack.

The 0 before "of" in a box means "zero time was spent inside of the box of 4.43". From Profiling and Optimizing Go, the walkthrough of the Type: CPU graph, at 11:00.

goroutines

Goroutines that were running in that program during those few seconds that I was recording are listed.

After reviewing Execution time, Network wait time, Sync block time, Blocking syscall time, and Scheduler wait time, the graph view gives the details of each goroutine. See [10].

Flame

pprof -http "localhost:12345" 'http://127.0.0.1:53668/block?id=19105152&raw=1'

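View execution time within goroutines.

The video in reference [7] demonstrates, from 11:43 onward, how to use the flame graph to find out why a program is slow.

Debugging

Runtime statistics and events

Practice

Locating a high-latency service.

Logs are written with logrus, and Logrus takes a global lock, so goroutines contend for the write lock.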


func (entry *Entry) write() {
	entry.Logger.mu.Lock()
	defer entry.Logger.mu.Unlock()
	serialized, err := entry.Logger.Formatter.Format(entry)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to obtain reader, %v\n", err)
	} else {
		_, err = entry.Logger.Out.Write(serialized)
		if err != nil {
			fmt.Fprintf(os.Stderr, "Failed to write to log, %v\n", err)
		}
	}
}

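Is there a golang logging library around that doesn’t lock the calling goroutine for logging? also discusses how to deal with the logrus write lock:

Write logs asynchronously from a goroutine, at the cost of extra memory
Switch to the zap library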
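This does not actually explain why the latency was so high.

TODO

docker cgroup 技术之 memory: looks like a fairly detailed analysis, to be read closely.

Go 内存泄漏？不是那么简单

图解 Go 语言内存分配

Go语言使用cgo时的内存管理笔记: a short guide to locating memory leaks caused by cgo

cgo内存分析进阶版: a three-part series, in English.

Rakyll's series on performance tuning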

References

go tool pprof, Hao Lin's Chinese-language guide #TODO https://github.com/hyper0x/go_command_tutorial/blob/master/0.12.md
Profiling Go Programs, official blog #TODO https://blog.golang.org/pprof
一次 Golang 程序内存泄漏分析之旅 http://lday.me/2017/09/02/0012_a_memory_leak_detection_procedure/ (the link is unreliable; the text can be read from the Google cache)
一次 Golang 程序延迟过大问题的定位过程 http://lday.me/2017/09/13/0013_a_latency_identification_procedure/
go tool trace https://making.pusher.com/go-tool-trace/ #TODO
Go 程序的性能监控与分析 pprof https://www.cnblogs.com/sunsky303/p/11058808.html
Rhys Hiltner - An Introduction to “go tool trace” https://www.youtube.com/watch?v=V74JnrGTwKA
关于 Go 程序调式、分析和优化, from a talk by Brad Fitzpatrick #TODO https://studygolang.com/articles/4716
Golang remote profiling and flamegraphs, explains the various graphs fairly clearly https://matoski.com/article/golang-profiling-flamegraphs/
Using Go 1.10 new trace features to debug an integration test, a good walkthrough of using trace https://medium.com/@cep21/using-go-1-10-new-trace-features-to-debug-an-integration-test-1dc39e4e812d
Profiling Go programs with pprof #TODO https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
rakyll blog #TODO https://rakyll.org/archive/
Diagnostics, official documentation #TODO https://golang.org/doc/diagnostics.html
实战 Go 内存泄漏 https://segmentfault.com/a/1190000019222661

·End·

3.4 - Gorm

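The number of connections in TIME_WAIT reached 300+, while only a few connections were actually in use.

Settings

Set the maximum number of open connections (100) and of idle connections (50), plus ~~the connection lifetime (1 hour)~~

According to the benchmarks in [2], an unlimited ConnMaxLifetime costs less in both time and memory allocations. Use cases for ConnMaxLifetime include:

Your SQL database itself enforces a maximum connection lifetime, or you want to swap databases gracefully behind a load balancer (I do not fully understand this sentence)

SetMaxIdleConns setting

Keeping a large idle connection pool has a side effect: it takes memory. Another possibility is that a connection that stays idle too long becomes unusable. For example, MySQL's wait_timeout setting automatically closes any connection that has not been used for 8 hours (by default). When that happens, sql.DB handles it gracefully: a bad connection is retried twice automatically before giving up, after which Go removes it from the pool and creates a new one.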

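Exceeding the connection limit

If the database connection limit is 5, then as soon as the hard limit of 5 connections is reached, the pg database driver immediately returns the error sorry, too many clients already instead of completing the insert.

References

分析 golang sql 连接池大量的 time wait 问题 http://xiaorui.cc/archives/5771
配置 sql.DB 获得更好的性能 https://colobu.com/2019/05/27/configuring-sql-DB-for-better-performance/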

·End·

3.5 - The Zen of Go

Notes

Each package fulfils a single purpose

A well designed Go package provides a single idea, a set of related behaviours. A good Go package starts by choosing a good name. Think of your package’s name as an elevator pitch to describe what it provides, using just one word.

Handle errors explicitly

Robust programs are composed from pieces that handle the failure cases before they pat themselves on the back. The verbosity of if err != nil { return err } is outweighed by the value of deliberately handling each failure condition at the point at which they occur. Panic and recover are not exceptions, they aren’t intended to be used that way.

Return early rather than nesting deeply

Every time you indent you add another precondition to the programmer’s stack consuming one of the 7 ±2 slots in their short term memory. Avoid control flow that requires deep indentation. Rather than nesting deeply, keep the success path to the left using guard clauses.

Leave concurrency to the caller

Let the caller choose if they want to run your library or function asynchronously, don’t force it on them. If your library uses concurrency it should do so transparently.

Before you launch a goroutine, know when it will stop

Goroutines own resources; locks, variables, memory, etc. The sure fire way to free those resources is to stop the owning goroutine.

Avoid package level state

Seek to be explicit, reduce coupling, and spooky action at a distance by providing the dependencies a type needs as fields on that type rather than using package variables.

Simplicity matters

Simplicity is not a synonym for unsophisticated. Simple doesn’t mean crude, it means readable and maintainable. When it is possible to choose, defer to the simpler solution.

Write tests to lock in the behaviour of your package’s API

Test first or test later, if you shoot for 100% test coverage or are happy with less, regardless your package’s API is your contract with its users. Tests are the guarantees that those contracts are written in. Make sure you test for the behaviour that users can observe and rely on.

If you think it’s slow, first prove it with a benchmark

So many crimes against maintainability are committed in the name of performance. Optimisation tears down abstractions, exposes internals, and couples tightly. If you’re choosing to shoulder that cost, ensure it is done for good reason.

Moderation is a virtue

Use goroutines, channels, locks, interfaces, embedding, in moderation.

Three attributes of channels

Guarantee Of Delivery
- Why Ticker is implemented with a Delayed Guarantee channel
- Why Ticker ticks can be lost
State
With or Without Data

Maintainability counts

Clarity, readability, simplicity, are all aspects of maintainability. Can the thing you worked hard to build be maintained after you’re gone? What can you do today to make it easier for those that come after you?

Original text: https://github.com/davecheney/the-zen-of-go

References

The Behavior Of Channels #TODO https://www.ardanlabs.com/blog/2017/10/the-behavior-of-channels.html
Go proverbs https://www.kancloud.cn/cserli/golang/524388

·End·

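3.6 - Getting Started with Code Review

Introduction

The whole golang team has 20+ people and no code review, which badly affects project quality, output, the growth of newcomers, and the team's culture of communication. Reading the Google 代码评审规范 resolved some of my earlier doubts and strengthened my resolve to do code review.
Without code review, refactoring ends up being demanded later, and the value of that refactoring is merely shedding historical baggage; it produces no other value.

Our commits look like this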

3b8e45c - Slove Confilct -  2 weeks ago -
0a39ecd - FIXS: vendor -  2 weeks ago -
7817d14 - debug -  2 weeks ago -
67539e2 - debug -  2 weeks ago -
9044356 - Slove Confilct -  2 weeks ago -
d47db91 - FIXS: ss -  2 weeks ago -
8913c30 - Slove Confilct -  2 weeks ago -
2d407d2 - FIXS: logger -  2 weeks ago -

b9af055 - 打印日志 -  7 weeks ago - 
1124e92 - 打印日志 -  7 weeks ago - 
88d0eac - 修改log -  7 weeks ago - 
ad0b3dd - 修改日志 -  7 weeks ago - 
4aa0740 - 答应日志 -  7 weeks ago - 
824658a - 修改日志 -  7 weeks ago - 
178c30c - 打印日志 -  7 weeks ago -

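When opening a pull request, review all commits carefully: squash what should be squashed and amend what should be amended.

Our naming looks like this
No screenshot for posterity here.
Our branching and releases look like this
Packages are built locally; even worse, some are built locally from code that was never committed.
Our unit tests look like this
There are almost none.

We are about to start code review. Where do we begin?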

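Approach

Who does what to whom, when, and how?

The first "who"

Code reviewers

If two or more people develop the project:

If a developer submits the code, the reviewer is the application owner.
If the application owner also takes part in development, any other developer does the first review, and the owner's superior does the second review.

If the application owner and the developer are the same person, the reviewer is the group leader.

Automatic lint tools

Automate the most basic checks, for example with reviewdog & golangci-lint; for more, see Github Action-golangci-lint.

The second "who"

Pull requests that business developers submit to an application

When

The review must be completed on the day it is submitted or the next day.

How

Follow the code review guidelines. We do not have our own yet; for similar guidelines see:

Code review guidelines
- Google 代码评审规范
- 谷歌工程实践 by jimmysong
Coding guidelines [golang]
- How Thanos Would Program in Go
  - Use the runutil package to check errors in defer, which is more elegant than writing an anonymous function
  - The pkg/errors package is more readable than the standard fmt.Errorf + %w
  - To be extended

What

Read the submitted code and give suggestions to complete the review.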

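Rollout

Configuring reviewdog & golangci-lint on GitLab in practice

Get familiar with the GitHub Action approach and borrow its strengths; try it in one project, then roll it out to the others.

How can all projects need no configuration, or only minimal configuration (such as adding one ready-made config file), and still share one set of code-checking standards?
- ~~Build an image that contains the reviewdog.yml config file; to upgrade the linters, just move the latest tag to the newest image.~~ Done.
- ~~Add reviewdog to half of the projects~~
- The linters are currently golint and errcheck; the next step is to add golangci-lint.
There is no really mature solution at present; it takes some time to work through the problems in the existing open-source options.
- Use reviewdog together with golangci-lint and adapt its output format (more link). presto-pay uses golangci-lint, but the reviewdog site has no golangci-lint example.

Failure

golangci-lint tries to do everything, which makes it less stable; it is not as focused as golint or errcheck.

Configuring reviewdog & golint/errcheck/govet/… on GitLab in practice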


reviewdog:
  stage: review
  # Custom image containing the shared reviewdog config file and the pinned reviewdog/golangci-lint versions
  image: golang:custom-latest
  before_script:
    - curl -sfL https://raw.githubusercontent.com/reviewdog/reviewdog/master/install.sh| sh -s -- -b $(go env GOPATH)/bin v0.10.0
    - curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.27.0
    - export GITLAB_API="https://examplegitlab.com/api/v4"  
  script:
    - reviewdog -conf=/etc/reviewdog/reviewdog.yml  -reporter=gitlab-mr-discussion  -guess -fail-on-error=true
  only:
    - merge_requests

reviewdog.yml is configured as follows:

runner:
  golangci:
    cmd: golangci-lint run --config=/etc/reviewdog/golangci/golangci.yml ./...
    errorformat:
      - '%E%f:%l:%c: %m'
      - '%E%f:%l: %m'
      - '%C%.%#'
    level: warning

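reviewdog can be combined with many kinds of checks; for details see reviewdog.yml
Use a predefined errorformat, for example via -f=golangci-lint; list the other predefined errorformats with reviewdog -list, or see go.go
For GitLab configuration, see this project on GitLab: reviewdog test
Exit code handling
- errcheck exits with a non-zero code (1) when it finds unchecked errors (check with echo $?; see Chapter 6. Exit and Exit Status)
- reviewdog exits with 0 by default; with -fail-on-error=true it returns 1 when it finds violations
- With errcheck | reviewdog, the job was observed to fail when errcheck's exit code is 1. The workaround is ( errcheck 2>&1 || true ) | reviewdog

Along the way, keep adding checks, and state the reason and purpose of each

The Thanos coding guidelines recommend go vet as a linter and also recommend golangci-lint. Since golangci-lint could not be configured here, the plan is to configure its default linters one by one; the linters configured in Thanos are a good reference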

govet
errcheck
staticcheck
unused
gosimple
structcheck
varcheck
ineffassign
deadcode
typecheck

golint

errcheck

go vet

TODO

Reread the code review guidelines repeatedly. Keep adding or adjusting linters.

References

·End·

3.7 - Gomonkey Test

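Overview

The project takes unit testing seriously, so choosing a simple third-party library for mocking is crucial: there are too many methods, dependencies, and global variables to mock.

Candidates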

gomonkey

gomonkey is a library to make monkey patching in unit tests easy

gomonkey should work on any amd64 system

…

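Picking one

Tsung is an open-source distributed load testing tool that makes it easy to stress test websockets (as well as many other protocols), for example load testing long-lived WebSocket connections^[3]

References

[1] X86、X86_64 和 AMD64 的由来

Intel, late to the game, started supporting the AMD64 instruction set but under a different name, x86_64, meaning the 64-bit extension of the x86 instruction set
x86_64, x64, and AMD64 are essentially the same thing

[2] gomonkey author's Jianshu blog (https://www.jianshu.com/u/1381dc29fed9), golang development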

[3] Gary Rennie: The Road to 2 Million Websocket Connections in Phoenix.

[4] eranyanay: Going Infinite, handling 1M websockets connections in Go

·End·

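3.8 - Daily 0131 Golang Miscellany

golang: better keep learning

No updates for two months already

guru for vim-go #TODO

Using Go Guru describes guru's features in detail

vim go tutorial#guru: a simple, easy-to-follow tutorial; this is where I read about scope.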

Pointer analysis scope: some queries involve pointer analysis, a technique for answering questions of the form “what might this pointer point to?”.

vim-go automatically tries to be smart and sets the current package's import path as the scope for you.

go stack

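Learn about golang runtime stack information and how to read a golang stack trace

Go 堆栈的理解

Stack traces and the arguments shown in them
Whether a variable lives on the heap or on the stack

go list

How to list a package's external dependencies: List external dependencies of package

go list -f '{{join .Deps "\n"}}' |  xargs go list -f '{{if not .Standard}}{{.ImportPath}}{{end}}'

Or use deplist

go modules 410 Gone

Cause: your Go library declares the go1.12 format, so it has no SUMDB checksum information; referencing such an old-format library from a go1.13 project produces a verification error, reported as 410 Gone.

Solution:

For users of the old-format library, the following lets you download and reference it without trouble.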

export GONOSUMDB="github.com/hedzr/errors,$GONOSUMDB"
## OR

export GOSUMDB=off

For the library's owner, the proper fix is to change the version declared in go.mod to 1.13. For example:

module github.com/hedzr/errors

go 1.13  // go 1.12

package main
import (
  "encoding/json"
  "net/http"
)
type Profile struct {
  Name    string
  Hobbies []string
}
func main() {
  http.HandleFunc("/", foo)
  http.ListenAndServe(":3000", nil)
}
func foo(w http.ResponseWriter, r *http.Request) {
  profile := Profile{"Alex", []string{"snowboarding", "programming"}}
  js, err := json.Marshal(profile)
  if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
  }
  w.Header().Set("Content-Type", "application/json")
  w.Write(js)
}

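Some Go code to start with. The code above returns JSON data. Frustrating, isn't it: why define the struct Profile at all? It is not as flexible as python/js and the like.

Now see how Go reads JSON data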

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type test_struct struct {
	Test string
}

func parseGhPost(rw http.ResponseWriter, request *http.Request) {
	decoder := json.NewDecoder(request.Body)

	var t test_struct
	err := decoder.Decode(&t)

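	if err != nil {
		// Report malformed JSON to the client instead of panicking in the handler.
		http.Error(rw, err.Error(), http.StatusBadRequest)
		return
	}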

	fmt.Println(t.Test)
}

func main() {
	http.HandleFunc("/", parseGhPost)
	http.ListenAndServe(":8080", nil)
}

Test with: curl -X POST -d '{"test": "that"}' http://localhost:8080

Bolt is a key/value database written in Go.

The API will be small and only focus on getting values and setting values. That’s it.

Bolt is currently used in high-load production environments serving databases as large as 1TB. Many companies such as Shopify and Heroku use Bolt-backed services every day.

db.Update(func(tx *bolt.Tx) error {
	b := tx.Bucket([]byte("MyBucket"))
	err := b.Put([]byte("answer"), []byte("42"))
	return err
})

Updating the database takes quite a bit of code, doesn't it!

db.View(func(tx *bolt.Tx) error {
	b := tx.Bucket([]byte("MyBucket"))
	v := b.Get([]byte("answer"))
	fmt.Printf("The answer is: %s\n", v)
	return nil
})

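Reading is the same. Can't we simply SET/GET?

Following the article How to use sessions in Go, I wrote a session mechanism on top of Go's built-in http package

Technology selection: how do you pick a suitable Go HTTP router? This article has the answer: go http routing benchmark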

This benchmark suite aims to compare the performance of HTTP request routers for Go by implementing the routing structure of some real world APIs. Some of the APIs are slightly adapted, since they can not be implemented 1:1 in some of the routers.

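How tornado runs time-consuming tasks asynchronously without blocking still needs study: the run_on_executor decorator wraps the function passed in and uses io_loop. TODO

Go resources:

Go perfbook Writing and Optimizing Go code

Chinese-language Go material is scattered and needs to be organized into my own knowledge map

fasthttp: its performance differs greatly from Go's built-in http

Ruby odds and ends

Ruby 中的 @ % # $ 等各种千奇百怪的符号的含义等 (http://www.cnblogs.com/likeyu/archive/2012/02/22/2363912.html): names starting with @ are instance variables, names starting with @@ are class variables, and names starting with $ are global variables that can be referenced anywhere in the program.

If the last expression on the left-hand side is prefixed with *, the surplus elements on the right are assigned to it as an array. If there are no surplus elements, it is assigned an empty array.

The ? and ! suffixes: one indicates a boolean result, the other flags something that needs caution

References

go modules 410 Gone https://juejin.im/post/5e0ec5d75188253aa20e858e
Go Modules with Private Git Repositories https://medium.com/cloud-native-the-gathering/go-modules-with-private-git-repositories-dfe795068db4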

4 - Python

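Notes on Python

4.1 - Python Interview Problems

Python面试必须要看的15个问题

What exactly is Python? // tests knowledge of language features
Fill in the missing code

def print_directory_contents(sPath):
    """
    This function takes the name of a directory as input
    and returns the paths of the files in it,
    as well as the paths of the files in its subdirectories.

    """
    # fill in the code

How do you manage different versions of your code?
What does the following code output:

def f(x,l=[]):
    for i in range(x):
        l.append(i*i)
    print(l)

f(2)
f(3,[3,2,1])
f(3)

What do these two parameters mean: *args, **kwargs? Why would we use them?
What do these mean: @classmethod, @staticmethod, @property?
Using recursion and generators
Briefly describe Python's garbage collection mechanism.
Rank the following functions by execution efficiency. Each takes a list of numbers between 0 and 1 as input. The list can be very long. An example input list: [random.random() for i in range(100000)]. How would you prove that your answer is correct?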


def f1(lIn):
    l1 = sorted(lIn)
    l2 = [i for i in l1 if i<0.5]
    return [i*i for i in l2]

def f2(lIn):
    l1 = [i for i in lIn if i<0.5]
    l2 = sorted(l1)
    return [i*i for i in l2]

def f3(lIn):
    l1 = [i*i for i in lIn]
    l2 = sorted(l1)
    return [i for i in l1 if i<(0.5*0.5)]

5 - Javascript

Notes on Javascript

5.1 - javascript Number --- getting reacquainted

2013-01-27 javascript fundamentals

javascript types fall into two broad categories: primitive types and object types. The primitive types are numbers, strings, and booleans, plus two special types: null and undefined. Straight to the topic: a few points about javascript numbers.

How javascript numbers differ from other languages

JavaScript does not make a distinction between integer values and floating-point values. All numbers in JavaScript are represented as floating-point values.

The values a javascript number can represent are limited, hence overflow and underflow.

Arithmetic in JavaScript does not raise errors in cases of overflow, underflow, or division by zero. (-)Infinity when overflow, (-)0 when underflow. Division by zero is not an error in JavaScript: it simply returns infinity or negative infinity.

The special value NaN in javascript numbers

There is one exception, however: zero divided by zero does not have a well-defined value, and the result of this operation is the special not-a-number value, printed as NaN. NaN also arises if you attempt to

divide infinity by infinity

take the square root of a negative number

use arithmetic operators with non-numeric operands that cannot be converted to numbers


	Infinity // A read/write variable initialized to Infinity.
	Number.POSITIVE_INFINITY // Same value, read-only.
	1/0 // This is also the same value.
	Number.MAX_VALUE + 1 // This also evaluates to Infinity.
	Number.NEGATIVE_INFINITY // These expressions are negative infinity.
	-Infinity
	-1/0 
	-Number.MAX_VALUE - 1
	NaN // A read/write variable initialized to NaN.
	Number.NaN // A read-only property holding the same value.
	0/0 // Evaluates to NaN.
	Number.MIN_VALUE/2 // Underflow: evaluates to 0
	-Number.MIN_VALUE/2 // Negative zero
	-1/Infinity // Also negative 0
	-0

javascript NaN != NaN

The not-a-number value has one unusual feature in JavaScript: it does not compare equal to any other value, including itself. This means that you can’t write x == NaN to determine whether the value of a variable x is NaN. Instead, you should write x != x. That expression will be true if, and only if, x is NaN. The function isNaN() is similar. It returns true if its argument is NaN, or if that argument is a non-numeric value such as a string or an object. The related function isFinite() returns true if its argument is a number other than NaN, Infinity, or -Infinity.

.3-.2 == .1 & .2-.1 == .1
-0 === 0


    var zero = 0; // Regular zero
    var negz = -0; // Negative zero
    zero === negz // => true: zero and negative zero are equal 
    1/zero === 1/negz // => false: infinity and -infinity are not equal

This supplements my earlier article http://hyvi.sinaapp.com/2012/10/09/javascript-nan/. Code seen on twitter:

  
    [0,7,5,10,4,15,2,13,4,16,4,10,1].map(function(a){return this[a];},typeof("")+typeof(0)+NaN+"d.").join("")

5.2 - nodejs

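nodejs resource collection

DONE

Wrote a native node addon in C++ (hello-world), built with node-gyp. I am not familiar with the C++ code and it looks hard to follow; to study later. TODO
Performance testing: tools mentioned in the group chat include Siege, benchmark.js, and ab (apache benchmark)
API + Static Clients

Technical solutions for CORS
Technical solutions for implementing sessions

TODO

CMD vs AMD: understanding CommonJS and RequireJS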

6 - Team

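About teams and working people (professionals)

6.1 - How to Build a Foundation Library

Introduction

Provide a library that consolidates common functionality.

It is a library, not a framework.

What does it contain?

Reference 1: Gitlab Labkit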

LabKit is minimalist library to provide functionality for Go services at GitLab.

Correlation
Loggging
Masking
Metrics
Monitoring
FIPS
Tracing
ErrorTracking

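Reference 2: go-zero

Authentication
Encryption and decryption
Logging
Exception capture
Monitoring and alerting
Statistics
Concurrency control
Distributed tracing
Timeout control
Automatic circuit breaking
Automatic load shedding
Cache control

Reference 3: Micro

Micro Architecture

Wrappers are a form of middleware that can be used with go-micro services. They can wrap both the Client and Server handlers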

Breaker
endpoint
Monitoring
ratelimiter
service
trace
validator

Reference 4: Dapr

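6.2 - Team Building

Belbin's nine team roles

roles in team

Type	Characteristics	Team contribution	Weaknesses
Implementer	Conservative, dutiful, conventional, sensible		Limited flexibility; not receptive to new ideas
ME, Monitor Evaluator	Cautious, intelligent, and broad in outlook; able to see the big picture and make the most advantageous judgment		Sometimes overly nitpicky or rule-bound

A team member can play several roles at once, and as a project moves through its stages the priority of particular roles changes.

These roles fall into three groups: "action-oriented" roles that carry out the team's tasks, "people-oriented" roles that coordinate relationships inside and outside the team, and "thought-oriented" roles that contribute ideas and expert knowledge.

roles cat

Understanding and steering the five stages of team development

Vertical teams and horizontal teams

Characteristics of effective teams

Shared goals
Clear responsibilities
Shared values
Strong individual ability
High collective ability
Self-management and self-motivation
Complementary personalities
Complementary skills
Effective decision making
Mutual support
Flexibility
Systems thinking: understanding the risks and opportunities of "ten plus one"
Continuous learning
No blaming of others

To be continued…

The Yang Triangle of organizational capability

Framework diagram of the "Yang Triangle" theory

The "Yang Triangle" consists of three aspects: employee competence, employee mindset, and employee governance

References

[1] R. Meredith Belbin: Team Roles

[2] 莱恩：【團隊管理】管理者想建立高績效團隊，先找到這9種成員－貝爾賓團隊角色理論(Belbin Team Roles)，你的團隊需要那些角色？

[3] MBA 智库百科：杨三角理论

[4] 杨国安：《组织能力的杨三角 - 企业持续成功的秘诀》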

·End·

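6.3 - Branching and Release Issues under Git Flow

Background

Whether one person owns a small microservice or several people develop one microservice, a process is still necessary

Approach

Combine Git Flow with pull requests

Standard-Version

References

git-flow 备忘清单: probably the clearest description of the flow I have found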

·End·

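6.4 - Scrum & Jira Usage Notes

Scrum with Jira

Story

A Story must not take more than 3 days. If it would, split it first.
Technical Stories and Stories must be broken into subtasks; estimates and logged work are entered on the subtasks.
Every Story must be linked to a requirement.
Tickets in "Testing": handle according to the type of testing
- "Developer testing": create a test subtask; the developer deploys to the test environment, self-tests the interfaces, and clicks "Test passed" when done
- "No testing needed": the developer clicks "Test passed" directly
- "Tester testing": the developer agrees on a completion time with the tester and follows up until the tester finishes and clicks "Test passed"; if the tester has no time, the developer reassigns the subtask to themselves and tests it.
Tickets in "Acceptance": ask the BA or Leader via GT to accept them.
Once a jira ticket is "Done", nothing more is required; if it needs to go live, ask the leader to create an ops ticket.
What if a Story overruns its time?
- Log the extra time on the Story's original subtask, provided the subtask really needed that much time.
- Add a subtask and log time on it, provided the Story contains other sub-work that was not considered at the start.
- Add a Story, provided the estimate was far off and completing this Story depends on other Stories.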

Subtask

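Do not create unrelated subtasks
Whether a Story's status changes depends on the progress of its subtasks
- After the BA or Leader has presented the Story, its status changes to "To develop"
- After DEV understands the Story and its subtasks and has no open questions, the Story's status changes to "To develop"
- Once work on a subtask has started, the Story's status changes to "In development"
- Once the development work in the subtasks is finished, the Story's status changes to "To test"
- Once all subtasks are complete, the Story's status changes to "To accept"

General

Use precise wording and avoid vague qualifiers such as:
- “related”
- “partial”
- “
Log work daily. (If you cannot update daily, update at least every two days.)
After a Sprint starts, only the scrum master adds, changes, or removes jira tickets (technical Stories + Stories); do not do so on your own.
Update the meeting notes before the mid-week meeting
Whether it is a Story, technical Story, ops ticket, or BUG ticket, ask the reporter to accept it.
- Why does the Story owner ask the reporter for acceptance? The natural flow would be that I finish, change the status, and am done; that would fit a pipeline.
Reject tickets that will not be worked on
When Jira tickets are split, the requirements are not yet understood.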

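Sprint metrics

A Sprint has more than a completion-rate metric; further metrics show its state:

Iteration change rate = (tasks inserted + tasks removed) / planned tasks * 100% (tasks counted: Story + Tech Story only)
Iteration effort ratio: iteration work ratio + ops ratio (total time logged in Jira / team total 108)
Requirement backlog:
Average development cycle: the average number of days from start of development to completion for Stories and Tech Stories on the board
Iterations closed late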
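As a small worked example of the iteration change rate, the sketch below assumes the formula reads (inserted + removed) / planned × 100%. The function name and the sample numbers are made up for illustration.

```go
package main

import "fmt"

// changeRate returns the iteration change rate in percent, assuming it is
// defined as (inserted + removed) / planned * 100.
func changeRate(inserted, removed, planned int) float64 {
	if planned == 0 {
		return 0 // nothing was planned, so no rate can be computed
	}
	return float64(inserted+removed) / float64(planned) * 100
}

func main() {
	// Sample sprint: 20 planned Stories/Tech Stories, 3 added and 2 removed after the start.
	fmt.Printf("change rate: %.1f%%\n", changeRate(3, 2, 20))
}
```

For this sample sprint the rate is 25%, meaning a quarter of the planned scope changed after the Sprint started.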

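Operations

Similar to "production issues and incidents"

Every day a "production issue daily report" is posted to the product and R&D groups, showing in table form each business team's open issues and how long they have been open (from Youzan)

Ops tickets can be handled with the same idea. Ops tickets left unhandled pile up and eventually blow up at some point.

TODO: how to produce such a report?

Bug management

Managed centrally on the "Bug board"

Bug workflow

Test bugs must be linked to a Jira component, affected version, and fix version, so that the remaining test bugs in a version can be counted by "component" and "assignee".

Standard bug description template

Steps to reproduce, actual result, expected result, captured traffic

Retrospective

Run retrospectives with the "starfish" diagram, "KISS", or "what went well / what should be better". Improvement actions go onto the "retrospective Action follow-up board"; each Action must be a concrete, executable step with one main owner (JIRA assignee) and a completion date (JIRA due date).

Scrum star

Motivation, with incentives aligned to a single goal

Four principles of immediate recognition
- Make clear what is encouraged and what is opposed
- Emphasize non-material recognition while also providing material rewards, with the former taking priority
- Give employees timely feedback
- Be open and transparent, well-founded, and fact-based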

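Positioning

PM (Program Manager)

Responsibilities:

Talk to customers, organize user research, and discover user needs
Understand and compare competitors' products
Work out how to make the software usable and useful
Work out how to improve the team's process

More complete list of tasks:

Lead the team in forming its goal / vision, and turn abstract goals into executable, concrete, elegant designs
Manage the life cycle of the software's features (requirement / idea / design / implementation / test / change / release / upgrade / migration / retirement)
Create and maintain the software specification so that it is timely, accurate guidance for developers / testers rather than an obstacle
Represent the interests of customers and users, actively collect user feedback, and anticipate new needs. Coordinate and decide the priority of requirements
Analyze defects / change requests, lead the other members to a consensus on them, and make sure it is carried out.
Lead the other members in keeping features / time / resources in reasonable balance, track progress, and make sure the team ships software that satisfies customers.
Collect project management and software engineering data, analyze objectively what went well and badly during the project, and drive the team's continuous improvement, which in turn raises morale.

The PM does everything other than development and testing

Program Manager vs Project Manager

Project Manager	Program Manager
The team's administrative leader, who leads everyone's work on the project	Works as a peer and pushes the team to complete the software's features
Usually the team's only representative to the outside	A team can have many PMs
Has the final say on the project's features	Reaches decisions together with the other team members
Manages both work and people	Manages work, not people
Does not necessarily do hands-on work	Always does hands-on work

Why not let the PM lead the developers and testers? Wouldn't that make the PM's job smoother?

What happens if the PM has the team's support?

You become the owner of the project's process: you drive the process, organize meetings, practice Scrum, and keep the schedule; you report progress on the team's behalf to management / partner teams / customers / marketing; team members are glad to talk to you and you earn their respect; without writing a line of code yourself, you still influence the project and product positively.

And if the team's support is missing?

You waste everyone's time in meetings and process, send Status Mails nobody reads, fail to unite the team, and reach no consensus; you do not know the team's state well and cannot report it effectively and accurately to stakeholders or win their support; you misjudge the direction of the industry and the product, and harm the project and product.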

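Software design and implementation

From spec to implementation

After receiving the design document (spec), a developer does the following.

Estimates the time the development task needs.
Tries writing some quick prototype code to see how it works out. Several problems turn up along the way; after talking with the PM, they reach agreement.
Having seen the initial result and understood the implementation details, starts writing the design document (Technical Spec, Design Document). Once written, colleagues can be asked to review it (the review is optional, since tasks are usually small).
With the design document done, writes code according to it. During implementation, more unexpected problems appear; after talking with the PM, a solution is found.
After writing the code, self-reviews it against the design document and the coding guidelines, and refactors.
Creates or updates unit tests.
Runs the unit tests (not only the ones just created or updated; the unit tests of the whole module or system must pass as well).
Produces a testable build and hands it to the testers, or runs a public test online, such as an A/B test.
Fixes the problems found by testers or users; when most are resolved, asks colleagues for a code review.
Revises the code according to the review, completes the unit tests and related documentation, and checks the code into the repository.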

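Testing Process in Scrum

Two views of agile testing appear in practice:

Separate testing from the Sprint and treat it as a "next phase" distinct from development
Treat testing as part of the Sprint; when the Sprint ends, all testing ends too

Problems with the former

Teams run into all sorts of problems when practicing agile development: either code quality is neglected, so each of the frequent iterations is riddled with problems;
or system testing is scheduled the old way, so the test team is run off its feet yet never keeps up with the pace development demands.

什么是敏捷软件测试 #TODO

Agile development does not describe testing separately precisely because testing is no longer a separate process independent of development. It becomes the main means of driving development and measuring output, something every engineer must keep in mind and practice at all times.

究竟什么是敏捷测试

Agile testing is built on automated testing

Agile testing in Scrum, in detail.

Product Backlog: besides customer value (priority) and rough effort estimates, testing needs to study the user behavior patterns related to the product and its quality requirements: which quality attributes do we need to consider?
Sprint Backlog: the concrete features and tasks to implement must be made clear; at this point testers pay special attention to the "Definition of Done", the acceptance criteria for a task.
During each Sprint the main work is completing the tasks defined in the Sprint backlog. Besides TDD or unit tests, there should be continuous integration, or what is usually called BVT (Build Verification Test). If there are dedicated testers, they can write test cases, integrate the test framework, and help developers with unit tests on the one hand, and on the other do more exploratory testing of newly implemented features while developing acceptance test scripts. If there are no dedicated testers, this work still has to be done, by the whole team.
Acceptance tests can be carried out with automated testing tools, but in general 100% automation is not achievable.

From Best Practices for Testing Process in Scrum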

In Scrum, a cross functional Development Team has all the resources needed to complete a ‘Done’ Increment by the end of a Sprint.

Solution One

Breaking each selected product backlog item into many small subtasks which can be delivered and tested.

Taking the help of the testing team during integration testing itself

The testing team starts writing test cases based on each task.

The testing team helps the dev team during integration testing, and they test each sub-story deployed in test environments.

Solution Two

Testing should be factored in during the sprint cycle

As per the definition, Development Teams are cross-functional teams and they should be able to test the solution during the sprint cycle.

Solution Three

The QA Team writes test cases according to the acceptance criteria (DoD) for each Story

so the Dev Team will take responsibility for testing each case in the story.

Test cases will be part of the DoR for the story

From Zhihu: 敏捷流程中测试如何开展

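From Kaverjody: 我的测试之旅

Problems:

Holding a test plan review meeting in every iteration, and producing and approving a test plan document, is a very large management overhead, and the test plans repeat each other heavily.
In defect tracking, recording the version takes a long time, while fixing the problem may take a few minutes.
Testers are idle at the start of an iteration, with nothing to do, and swamped at the end.

Solutions:

The development item is a newly introduced concept. It packages the writing of the software specification, design, implementation, and testing into the smallest atomic product component. Atomic means keeping dependencies between development items as low as possible; removing or reordering any development item has no (or minimal) effect on the others.
Before an iteration starts there is a technical report or requirements document, from which development items are derived. Then comes the entry phase, as in earlier projects: the project schedule is fixed and the high-level documents are created, including the integration plan, the project plan, the module test strategy, and the development test plan.
All testing related to development items is completed within the Sprint. These tests are called DIT (development item tests); the test cases themselves remain at the level of the earlier functional tests. The whole sequence of test planning, execution, and reporting for a development item must be completed within one Sprint; no fixed automation ratio is required for the test cases.
Project members are mainly development and test engineers, but the role definitions are not red lines that cannot be crossed. When necessary, a developer can take on some testing or move to testing full-time, and a test engineer can pair with developers to write code.
Developers' work is affected by testing: in the daily stand-up or in everyday work it may turn out that the software is hard to test, and developers then need to help inspect and change the code to improve testability.

References

基于 JIRA 的产品需求全生命周期管理实践

The Role of Automated Build Verification Test (BVT) in Agile-DevOps Methodologies

<现代软件工程构建之法> by 邹欣

BVT & BAT（版本验证测试和版本验收测试）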

·End·

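6.5 - "The Effective Executive": Decisions

Background

Among the many tasks, decision making is one.
Facing a decision is especially taxing: I do not know how to decide, or rather do not know how to go about deciding. Lacking a line of thought, I get surrounded by a tangle of questions, cannot step out of it, and keep putting off the final decision.
This article gave me a method that helps sort out my thinking and decide quickly.

Methodology

To be added

Follow-up

Time management is also a problem: work does not consist of planned, single items but of all kinds of unrelated matters that need handling.
Spending every day circling among such scattered matters will, over time, yield no notable output.
The ideal state: keep one main thread of important work, and let other matters, which can be slow or skipped, proceed around it.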

·End·

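6.6 - "The Effective Executive": What Can I Contribute?

Background

One of the questions I have been thinking about lately: how to make the team's output clearly visible. Although the question is there, everyday chores leave little time to think about it, and over time it gets forgotten, so I am recording it here.

The title comes from "The Effective Executive" by Peter Drucker

Methodology

WHAT

What can I contribute?

WHY

Focus on contribution is the key to effectiveness
Executives who focus on contribution act differently from others
Focus on contribution brings out potential in the work that has not yet been used

WHAT

Contribution

An executive who wants to contribute has to work on these three areas.

Direct results
Building new values and reaffirming them
Building and developing the people needed for tomorrow

WHO

As an executive, what can I do for (or contribute to) this team?

As a team member, what can I do for (or contribute to) this team?

HOW

How to make specialists' work effective

Knowledge workers have a responsibility to make themselves understood.
"What contribution do you need from me so that you can contribute to the organization? When do you need it, in what form, and in what way?"

The right human relations

Communication:

"What contribution do our organization and I expect of you? What should I expect of you? How can your knowledge and ability be put to the best use?"

Teamwork: emphasis on contribution helps sideways communication and so makes teamwork possible

"Who needs my output, and makes it productive?"

Self-development: whether a person develops depends largely on whether they focus on contribution

"What is the greatest contribution I can make to the organization?"

Developing others: executives who focus on contribution inspire others to seek self-development

The executive's standards are based on what the task requires: demanding standards, high expectations, ambitious goals, and work of great impact

Effective meetings

What is gained from the meeting, what the purpose of the meeting is, and what it should be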

·End·

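6.7 - Team Topologies

Conway’s law, Dunbar's number, Team First Mindset, Minimal Cognitive Load

I first became curious about the term "Cognitive Load", then learned that Matthew Skelton's 《高效能团队模式》 (Team Topologies) ties it closely to how teams are built. While looking for the book I found write-ups that summarize it, such as Team Topologies - 團隊優先思考模式, which I found excellent and worth a close read. It sat in a browser tab that I could not bring myself to close; having got up early this morning, I am digesting it now and recording it here for later reference.

Terminology

The article involves several technical terms.

Term	Explanation
Conway’s Law	The structure of the teams inside an organization directly shapes what its software architecture looks like, because team structure determines communication patterns and communication patterns shape the software architecture
Dunbar's number	About 5 people in a team can deeply trust each other and share working memory, with 15 as the limit. Mutual trust tops out at around 50 people, and beyond 150 people the limit of social cognition is exceeded; even remembering names becomes hard.
Team First Mindset	Thinking with the team as the first priority
Minimal Cognitive Load	Each person's working memory is quite limited, so choose carefully what is allowed to occupy it
Silo effect	Each department behaves like a "small company", going its own way and responsible for its own profit and loss, focused only on its own operating interests rather than those of the whole company, which ends in organizational dysfunction and corporate decline.
Stream-aligned Team	organized around the flow of work and has the ability to deliver value directly to the customer or end user.
Digital platform	A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product.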

Team Dynamics

Team dynamics describe the behavior relationship between the member of a group .

Team dynamics are the unconscious, psychological forces that influence the direction of a team’s behaviour and performance. They are like undercurrents in the sea, which can carry boats in a different direction to the one they intend to sail .

Google 内部研究，影响 Team Dynamics 的因素有哪些？

Team Size
Team Lifespan
Team Relationship
Team Cognition 团队的认知

好的团队基础就是小巧精干且长存的团队。5-9 人，固定的团队成员一起为了一个目标工作得时间至少一年以上。

Tunkman’s stages of group development

这个说明了为什么团队至少一年以上。从团队组成到真的可以产出绩效，至少经历以下四个阶段

Forming
Storming
Norming
Performing

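From Principles of Management

Team First Mindset

What is the team-first mindset?

The team owns a piece of software and keeps watching over and improving it
Commit to attending the daily meeting and do not be late
The team keeps discussing its internal matters
Focus on the team goal
Help others remove whatever is blocking them
Mentor newcomers and help each other grow
Avoid arguments about winning and losing; make room for talk that explores different possibilities

Minimal Cognitive Load

For teams to run efficiently, start from small, capable, long-lived teams as the unit, and limit or reduce unnecessary communication. As the system grows more complex, teams need to establish conventions for how they communicate with each other.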

Intrinsic Cognitive Load
Extraneous Cognitive Load
Germane Cognitive Load

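How to minimize cognitive load?

Use good IDEs and tools, or training, to lower the Intrinsic Cognitive Load
Use SOPs, or a dedicated Infra Team / DevOps Team that builds and optimizes the development and deployment process, to cut engineers' Extraneous Cognitive Load
Let engineers focus on the value-producing Germane Cognitive Load: designing better systems that solve customers' problems.

In addition, three ways to reduce team cognitive load and improve flow^[2]:

Create well-defined team interaction patterns
Use independent, stream-aligned teams^[1]
Build the thinnest viable platform (TVP)

Splitting teams

According to Conway's Law, how teams are split and how the organization communicates determine your system architecture

Good APIs

A good API is a good communication pattern between teams; without one, silos may form. How to define a good API^[5]:

Define the API with OpenAPI
Usage documentation / Wiki
A good user experience
Versioning and testing approach
Best practices and principles
Work info (future roadmap and bug-fix turnaround)
Communication preferences (when/how)

Treat other teams as customers.

Managing dependencies

Track dependencies using simple tools and remove blocking dependencies

Managing inter-team communication

Consciously design inter-team communications using team interaction modes

Good documentation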

Overcommunicate using just enough written documentation

Digital platforms

Digital platforms^[6] are portfolios of technical products.
Developing a digital platform is a strategic decision and not to be taken lightly. Besides the direct financial considerations, digital platforms also exert pressure on the relationships within your organisation.
Digital platforms are force multipliers, so there is a fine line between developing a competitive advantage and introducing a significant productivity blocker.

References

[1] Organizing Agile Teams and ARTs: Team Topologies at Scale Overview

[2] Matthew Skelton and Manuel Pais: Forget monoliths vs. microservices. Cognitive load is what matters

[3] Justin Kitagawa: Platforms at Twilio: Unlocking Developer Effectiveness

[4] 阿贝好威: Team Topologies - 團隊優先思考模式

[5] Matthew Skelton and Manuel Pais: Are poor team interaction killing your devops transformation

[6] Cristóbal García García and Chris Ford: Mind the platform execution gap

7 - Tools

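To do good work, one must first sharpen one's tools

7.1 - Customize Icon for Plantuml

Overview

Notes on how to customize icons in PlantUML

Steps

Performed on macOS

Prepare the image
Encode it into a sprite
Define a macro

Prepare the image

If the image is a PNG, resize it, usually to 48 * 48 px. For a non-PNG image, the command below also generates a 48-pixel square PNG.

qlmanage -t -s 48 -o . logo.svg

The qlmanage command ships with macOS

Encode the PNG image

java -jar ~/Downloads/plantuml.jar -encodesprite 16z logo.svg.png

The generated code looks like this: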

sprite $logo [48x48/16z] {
pPLNWWCX34F7zTt_n5imtbHcbhyedGYGXR6FJuRw6-YIdhn5hkdXBuZL12VbtRWa_cuO5aeLv9sQE1O8ycAHwwrRv2gqyv6hrGJiZ6yWfn6Tko6BO1UCbPSh
1GR79S1Ulvw7_E2bTVQwktHo3s94Q7jwEpqA9b1_L2Oh0txBW21gbjiF1Y5gV-9tE5LpK9EuEQMrI_5oxkVJ5WLhfXEJfYBZDr3JiBAYZpz-PMMjwr34W40E
SD0Pc_P7JlwWPMOCbOuPQT0nUclErdCxOd354tLeqtDyb9uPB-tkj3H7g9MNmhqpGPQT-eEIIht5O4hPMHBwl9H2hHZatD4eo3olpWUD0JyiueSLUaXbWNg4
PHdZ_yqt2QdGz_9vyxxitiVD-xvVJ_RhrNuztA-t-_Lylr_izwFzVhVkfxBPhpyPtm
}

Define a macro

Define a macro, put it in a resource file, and pull that file in with !include. This makes the icon easier to reference from other diagrams.

!define EMQX(_alias, e_label) rectangle "<color:black><$emqx></color>\r e_label" as _alias <<EMQX>>

Finally

…just writing it down for the record

·End·

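7.2 - A Complete Guide to Packet Capture Tools

Background

Packet capture analysis is the killer tool for debugging the protocol between frontend and backend. Using the tools well frees up a lot of time for writing and optimizing code.

Terminology

HTTP Strict Transport Security (HSTS)
HSTS tells the browser to reach the site over HTTPS only and forbids it from accepting an invalid certificate.

Certificate Transparency
Addresses problems with CAs (certificates issued by mistake or maliciously). The goal is an open auditing and monitoring system that lets any domain owner or CA determine whether a certificate was mis-issued or is being used maliciously, which improves the security of HTTPS sites.
How CT works

HTTP Public Key Pinning
Guards against man-in-the-middle attacks made possible by forged or improperly obtained site certificates.
How it works: a response header or tag tells the browser the fingerprint of the site's certificate, plus an expiry time and other information.
Google has already warned apps that do not validate server certificates; such apps risk being rejected by the Play Store in the future (see the reference).

Chrome started removing support for HPKP in version 69

OCSP Stapling
OCSP (Online Certificate Status Protocol) is an online query service used to check whether a certificate is still valid.
During the TLS handshake the client queries the OCSP endpoint in real time and blocks the rest of the handshake until it gets the result, which makes establishing a TLS connection slower. With OCSP Stapling, the server fetches the OCSP result itself and sends it to the client together with the certificate, so the client can skip doing its own check and the TLS handshake gets faster.

Tools

These include Fiddler, Charles, and Wireshark.

fiddler

Works as a man-in-the-middle.

A local tool: an HTTP proxy listening on 127.0.0.1:8888. ~Any browser or application that can set its HTTP proxy to 127.0.0.1:8888 can use Fiddler~

Why it cannot proxy every HTTP request

Because at the operating-system level there is no such thing as an "HTTP request", only TCP connections.
Contacting a HTTP proxy means changing the HTTP request slightly as well as contacting the proxy server instead of the host named in the URL.
So this logic lives in the code of the software that sends the HTTP requests.
curl and wget have their own code for making HTTP requests and their own proxy configuration (such as the -x option). Neither reads the system proxy settings, and neither uses the HTTP libraries that macOS provides (which do honor the proxy settings).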
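The "slight change" to the request can be sketched as follows. This is a minimal illustration for plain HTTP only, not how Fiddler or curl is implemented; the function name build_request_head is made up for this note, and the proxy address is the 127.0.0.1:8888 mentioned above.

```python
from typing import Tuple
from urllib.parse import urlsplit

PROXY_ADDR = "127.0.0.1:8888"  # the local Fiddler address mentioned above


def build_request_head(url: str, via_proxy: bool) -> Tuple[str, str]:
    """Return (address to open the TCP connection to, request head) for a GET.

    Hypothetical helper for illustration; plain HTTP only.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if via_proxy:
        # Through a proxy: connect to the proxy and put the absolute URL
        # in the request line so the proxy knows where to forward it.
        connect_to = PROXY_ADDR
        target = url
    else:
        # Direct: connect to the host named in the URL and send only the path.
        connect_to = "%s:%d" % (parts.hostname, parts.port or 80)
        target = path
    head = "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (target, parts.netloc)
    return connect_to, head


direct = build_request_head("http://example.com/a?b=1", via_proxy=False)
proxied = build_request_head("http://example.com/a?b=1", via_proxy=True)
print(direct)
print(proxied)
```

Both decisions (where to connect, what to put in the request line) are made by the program that builds the request, which is why a program that ignores the proxy settings never reaches the proxy.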

charles

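Works on the same principle as Fiddler, but is an easy-to-use tool on the Mac.

mitmproxy

Also works as a man-in-the-middle, with a proxy added: a man-in-the-middle proxy that can intercept, modify, and save HTTP/HTTPS requests.

An interactive console program that allows traffic flows to be intercepted, inspected, modified and replayed. Its advantages: it can be scripted and customized and runs on the command line, which suits code geeks and keyboard lovers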

Modes of Operation

Regular
Transparent
Reverse Proxy
Upstream Proxy
SOCKS Proxy

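Transparent proxy

A redirection mechanism transparently reroutes TCP connections destined for servers on the Internet to a listening proxy server. This usually takes the form of a firewall on the same host as the proxy server, for example iptables on Linux or pf on OSX. For the concrete steps, see "Mac 上使用mitmproxy对ios app进行抓包" in the references.

Installation and usage

MitmProxy 使用教程 for MAC
I care more about using Transparent Proxying

Transparent Proxying in practice on the Mac

Following the official documentation, I tried capturing traffic globally on the Mac, as follows: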

Enable IP forwarding.

sudo sysctl -w net.inet.ip.forwarding=1

Place the following line in /etc/pf.conf.

rdr pass on en0 inet proto tcp to any port {80, 443} -> 127.0.0.1 port 8080

This rule tells pf to redirect all traffic destined for port 80 or 443 to the local mitmproxy instance running on port 8080. You should replace en0 with the interface on which your test device will appear.

rdr rules in pf.conf above apply only to inbound traffic. They will NOT redirect traffic coming from the box running pf itself.

Configure pf with the rules.

sudo pfctl -f pf.conf

macOS uses /etc/pf.conf by default, so it needs to be reset once debugging is done

And now enable it.

sudo pfctl -e

Fire up mitmproxy.

You probably want a command like this:

mitmproxy --mode transparent  --showhost

The --mode transparent option turns on transparent mode, and the --showhost argument tells mitmproxy to use the value of the Host header for URL display.

Finally, configure your test device.

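Set the test device up to use the host on which mitmproxy is running as the default gateway and install the mitmproxy certificate authority on the test device.

At this point traffic arriving on the en0 network interface can be captured, but traffic originating on the Mac itself cannot.
This also does not solve capturing HTTPS traffic from apps, because the following error appears:
"warn: 192.168.2.3:56243: Client Handshake failed. The client may not trust the proxy's certificate for e.crashlytics.com."
For the fix, see "Breaking SSL Pinning"

The method above also cannot capture traffic from the local machine itself; further setup is needed: Work-around to redirect traffic originating from the machine itself

Using pf to capture the Mac's own traffic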

##The ports to redirect to proxy
redir_ports = "{http, https}"

##The address the transparent proxy is listening on
tproxy = "127.0.0.1 port 8080"
##The user the transparent proxy is running as
tproxy_user = "nobody"

##The users whose connection must be redirected.
##
##This cannot involve the user which runs the
##transparent proxy as that would cause an infinite loop.
##

rdr pass proto tcp from any to any port $redir_ports -> $tproxy
pass out route-to (lo0 127.0.0.1) proto tcp from any to any port $redir_ports user { != $tproxy_user }

This forwards the traffic of every user except nobody to mitmproxy. To avoid a loop, mitmproxy is started as the nobody user.

sudo -u nobody mitmproxy --mode transparent --showhost

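** Some traffic went missing ** Investigation showed that a SOCKS proxy was enabled on the Wi-Fi connection, so some traffic was being forwarded to the shadowsocks SOCKS5 proxy instead.

Capturing all traffic via SOCKS5

Tracing All Network Machine Traffic Using MITMProxy for Mac OSX
Like a regular proxy, this needs support or changes in the client/application, for example switching Chrome's network settings to proxy mode. Requests from curl, for instance, cannot be captured this way.
Likewise, SOCKS5 has a transparent-proxy variant, implemented differently, for example tsocks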

tsocks provides transparent network access through a SOCKS version 4 or 5 proxy (usually on a firewall). tsocks intercepts the calls applications make to establish TCP connections and transparently proxies them as necessary.

Breaking HTTPS SSL Pinning (TODO)

Breaking HTTPS SSL Pinning in apps

wireshark

Captures all TCP and UDP data on the network interface

Decrypting HTTPS

Decrypt with the server's private key. I asked the operations team, and this private key cannot be handed out. See this document: How to Decrypt SSL and TLS Traffic Using Wireshark
For browsers: set an environment variable to capture the browser's pre_master_secret and use it to decrypt HTTPS. See wireshark两种解密https方式
This too only works for browsers; requests sent by other clients cannot be decrypted. Alternatively, obtain the SSLKEYLOGFILE through mitmproxy, see Wireshark and SSL/TLS Master Secrets

This mechanism (SSLKEYLOGFILE) currently (2019) does not work for Safari, Microsoft Edge, and others since their TLS libraries (Microsoft SChannel / Apple SecureTransport) do not support this mechanism.
This mechanism works for applications other than web browsers as well, but it depends on the TLS libraries used by the application. Examples of applications:

Applications using OpenSSL could use GDB or an LD_PRELOAD trick to extract the secrets.
For Java programs
Python scripts can be edited to dump keys as well

References

All to-do items are tracked on Trello.
常用的HTTP抓包工具Fiddler之使用技巧
 三种解密HTTPS流量的方法
 杀手锏：如果让不支持代理的软件，通过代理进行联网
 如何使用透明代理抓HTTPS
Mac 上使用mitmproxy对ios app进行抓包比较详细的操作
怎么让charles能代理所有的http(s)的请求呢？
HTTP Public Key Pinning 介绍
 app 抓包利器.pdf

·End·

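7.3 - SS Wiped Out: v2ray for macOS

Introduction

Over the National Day holiday almost every ss server went down. Looking things up became really inconvenient, and for a while I used the "cached page" feature of bing.com to read blocked material.
During the holiday, once I was done spending time with family, I got back to work.
After ss stopped working, I found that v2ray on my phone still worked normally over Wi-Fi (I had set up the server earlier).
This time I set up the v2ray client on my computer. I had installed v2rayx before, but it failed to connect, and I had no patience for configuring the UI version, so here is the command-line way

Install v2ray

brew tap qiwihui/v2ray
brew install v2ray-core

Not blocked

Configure v2ray

See config.json
Change inbound.port, the address and port in outbound, and the id and security under users

vim /usr/local/etc/v2ray.config.json

Start v2ray + SwitchyOmega

ss also used SwitchyOmega, so this time only v2ray needs to be started

v2ray -config=/usr/local/etc/v2ray.config.json

Starting at boot (tested, did not work)

brew services start v2ray-core

Troubleshooting

Network is Unavailable

A: A restart really did fix it. The service really is unstable and goes down many times a year, but the speed now is really fast.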

·End·

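7.4 - fencview.vim + xshell + vim + Assorted Chinese Encoding Problems

fencview.vim

The old C++ code is not UTF-8 encoded, probably gb2312 or gbk, while the JavaScript is UTF-8
What to do?
In vim I had only set
set encoding=gb2312 termencoding=utf-8 "fileencoding=gbk

How can one setting handle files in different encodings? Puzzling!
I had seen different file extensions get different highlighting and different formatting.
By that logic...
different file encodings could get different settings too.

Google is like God!

Found http://edyfox.codecarver.org/html/vim_fileencodings_detection.html
which ends by mentioning a unified solution, fencview.vim

No more talk without action: let's do it

One hour…
Two hours…
Done

Download the plugin: fencview
Configure .vimrc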

set encoding=utf-8
set termencoding=utf-8 "fileencoding=utf-8
set fileencodings=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936
let g:fencview_autodetect=1
"let g:fencview_auto_patterns='*'

The last line is commented out, because otherwise JavaScript files lose their syntax highlighting.
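The idea behind fileencodings (try each candidate in order and keep the first one that decodes cleanly) can be sketched in a few lines. This is only an approximation of the behavior, not vim's or fencview's actual detection code; detect_and_decode and the latin-1 fallback are assumptions made for this note.

```python
from typing import Sequence, Tuple

# Candidate order mirrors the idea of the .vimrc above: strict UTF-8 first,
# then the GB family (gb18030 covers gbk and gb2312).
CANDIDATES = ("utf-8", "gb18030")


def detect_and_decode(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Tuple[str, str]:
    """Return (encoding name, decoded text) for the first candidate that decodes."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Assumed fallback: latin-1 maps every byte, so decoding always succeeds.
    return "latin-1", raw.decode("latin-1")


legacy = detect_and_decode("中文".encode("gbk"))    # bytes from an old C++ file
modern = detect_and_decode("中文".encode("utf-8"))  # bytes from a JavaScript file
print(legacy, modern)
```

The order matters: UTF-8 is strict, so it is tried first, while a permissive single-byte encoding placed early would accept almost any file and hide the real encoding.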

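7.5 - From VIM to VS Code

VIM suits tinkering; VS Code suits productive day-to-day development. Time to start learning the VS Code shortcuts.

## Update log 2019.07.08 Still using VIM... addicted?

2021-10-24 Still using it and discovering all sorts of nice features, such as deleting everything from the cursor up to the closing parenthesis ")" on a line. These tricks improve efficiency. To avoid forgetting them, I record them here in one place for easy reference.

Common editing shortcuts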

Editing

To delete forward up to character ‘X’ type dtX

To delete forward through character ‘X’ type dfX

To delete backward up to character ‘X’ type dTX

To delete backward through character ‘X’ type dFX

Aligning text with Tabular.vim :tab \/

History

Why are vim-go's errors so bewildering? For example:

gorename: can't find package containing

gometalinter: unkown linters: govet, typecheck, unsed, gosimple

--enable-all/--disable-all can not be combined

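The quickfix window does not show up, and the only message is GoMetaLinter Failed

There are many possible fixes, and the problems got solved through repeated trial:

Understand the vim-go configuration in detail, for example g:go_metalinter_enabled and g:go_metalinter_autosave_enabled
Switch to the latest versions, for example the latest stable vim-go and golangci-lint; note that the latest gorename does not support Go modules projects

Without a deeper understanding of how vim-go works, it is hard to use. My hope is that one day it will be as easy to use as the VS Code Go plugin. Still, using vim-go taught me more about static code analysis tools, such as gosec, which checks Go code for security problems.

Updated At Thu Jul 18 13:34:37 2019

References

[1] vimcasts.org Learn Essential Vim Skills

[2] Casperfeng's Github: mastering-vim, Trying to become a proficient vim user.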

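7.6 - From Jekyll to Hugo: Migrating GitHub Pages

Background

Reasons:

I noticed that some tech blogs sync their comments with GitHub issues (gitalk), which is great.
Later I realized there was another reason: the old theme was really ugly (not that this has much to do with writing a blog), and my perfectionism kicked in.

So I wanted to add this kind of comments to my Jekyll blog and followed a tutorial. It should have taken an hour or two, but thanks to the GFW and the Ruby environment on the Mac it ate a whole evening.
After a night's sleep I decided time should not be wasted on network problems; a blog is mainly about thinking, how to write, and how to express things.
So I chose Hugo for migrating the blog.

Practice

Followed 使用 Hugo + GitHub Pages 搭建个人博客 to the letter
Even the style is exactly the same (out of laziness)...
Set up in one day

Follow-up

Hugo has many interesting features, such as shortcodes. For a while I wanted to use shortcodes to add flowcharts and mind maps to the blog
Pros

Diagrams made with existing tools (xmind, 百度脑图, and so on) are exported as images, which makes updating the diagram later a hassle.
The simple approach I have for now is to save the xmind source file and the image under the same name: to update the diagram, update the source file, export an image with the same name, and overwrite the old one.

Cons

If the blog later moves to another platform that does not support shortcodes, pages will show the diagram source code instead.

Conclusion: where a shortcode can be avoided, express it another way.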

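8 - Daily Miscellaneous Notes

For everyday notes

8.1 - Subnet Mask

Terminology

Subnet mask

It is used to divide a network into subnets. More precisely, besides dividing subnets, it also tells you the specific address of a host within its subnet.

Network number (subnetwork)

Says which residential compound I live in.

Host number (host)

Says what my door number is.

CIDR (Classless Inter-Domain Routing) / VLSM (Variable Length Subnet Mask)

For example: 192.168.0.0/24

More CIDR / VLSM examples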
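To make the /24 example concrete, here is a small sketch using Python's standard ipaddress module. The host address 192.168.0.42 is an arbitrary value picked for illustration.

```python
import ipaddress

# The CIDR block from the example above.
net = ipaddress.ip_network("192.168.0.0/24")
print(net.netmask)        # the subnet mask written out in dotted form
print(net.num_addresses)  # number of addresses covered by the block

# An arbitrary host chosen for illustration.
host = ipaddress.ip_address("192.168.0.42")
in_subnet = host in net

# Network number = address AND mask; host number = address AND inverted mask.
network_part = ipaddress.ip_address(int(host) & int(net.netmask))
host_part = int(host) & int(net.hostmask)
print(in_subnet, network_part, host_part)

# VLSM idea: the same block can be cut into smaller subnets with a longer prefix.
quarters = list(net.subnets(new_prefix=26))
print(quarters)
```

In the analogy above, network_part is the residential compound and host_part is the door number.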

References

[1] 知乎 noopsphere: 什么是子网掩码

·End·

8.2 - Reading the OAuth 2.0 Token Exchange Protocol

S3 and OSS use STS, but I did not understand how it works. Going through this document gave me some understanding of how an STS is implemented.

Terminology

security token: a set of information that facilitates the sharing of identity and security information in heterogeneous environments or across security domains. Examples of security tokens include JSON Web Tokens (JWTs) and Security Assertion Markup Language (SAML) 2.0 assertions. Security tokens are typically signed to achieve integrity and sometimes also encrypted to achieve confidentiality.

Practice

Need to find an open-source implementation to read.
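Until then, the shape of a token exchange request can be sketched from RFC 8693 itself. The parameter names and the two URN values below come from the RFC; the helper name build_token_exchange_body, the sample token, and the audience value are placeholders made up for this note, and no real STS is contacted.

```python
from urllib.parse import parse_qs, urlencode


def build_token_exchange_body(subject_token: str, audience: str) -> str:
    """Build the form-encoded body a client would POST to the token endpoint.

    Hypothetical helper: only the parameter names and URNs are from RFC 8693.
    """
    params = {
        # Fixed grant type that identifies a token exchange request.
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        # The token representing the party on whose behalf the request is made.
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # Optional: the target service where the new token will be used.
        "audience": audience,
    }
    return urlencode(params)


body = build_token_exchange_body("example-access-token", "https://backend.example.com")
fields = parse_qs(body)
print(body)
```

The body is sent as application/x-www-form-urlencoded, like other OAuth 2.0 token endpoint requests; the authorization server, acting as the STS, answers with a new security token.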

References

OAuth 2.0 Token Exchange: a protocol extending OAuth 2.0 that enables clients to request and obtain security tokens from authorization servers acting in the role of an STS. https://tools.ietf.org/html/rfc8693

·End·

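8.3 - OIDC (OpenID Connect) Authentication and Authorization

Just one diagram!

OIDC = (identity, authentication) + OAuth 2.0. It builds an identity layer on top of OAuth 2.0 and is a standard authentication protocol based on the OAuth 2.0 protocol.
OIDC use cases

One more diagram! The components of an OAuth-based authentication and identity protocol

Terminology

RP: in the new protocol's terminology, the client is called the relying party, or RP

IdP: conceptually, the authorization server and the protected resource are combined into the identity provider

ID token: carries information about the authentication event itself. It solves the problem of every identity provider using a different protocol.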
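In OpenID Connect the ID token is a JWT, so the information it carries can be inspected by decoding its middle segment. The sketch below only decodes; it does not verify the signature, which a real relying party must do before trusting any claim. The sample token and its claim values are fabricated locally for illustration, and decode_jwt_payload is a name made up for this note.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a JWT WITHOUT verifying it (inspection only)."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    # base64url segments in a JWT omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _segment(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# A locally built, unsigned sample token with made-up claim values.
sample_token = ".".join([
    _segment({"alg": "none"}),
    _segment({"iss": "https://idp.example", "sub": "user-123", "aud": "my-client"}),
    "",
])
claims = decode_jwt_payload(sample_token)
print(claims)
```

The iss, sub, and aud claims say who issued the token, which user was authenticated, and which RP the token is meant for.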

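Authentication

The resource owner authenticates at the authorization endpoint of the authorization server. Why is this needed? Think about it: "one step was overlooked: authenticating the resource owner".
The client authenticates at the token endpoint of the authorization server
Finally, authentication built on OAuth: OpenID Connect

Discovery protocol

OpenID Connect Discovery: dynamic server discovery. The client needs to know the issuer URL of the IdP.

It can be configured directly, for example NASAR
It can also be discovered with the WebFinger protocol.

Client registration protocol

Lets a client register with a new identity provider. It parallels the OAuth dynamic client registration extension, and the two are compatible; see the OAuth dynamic client registration extension.

When the API the client needs is served by several different servers.
When the client software has multiple instances, each of which must interact with the same authorization server.

Different OpenID Connect clients

Authorization Code Flow

Implicit Flow

The Successful Authentication Response returns id_token and token rather than code

Hybrid Flow

The difference is that response_type can be code id_token, code token, or code id_token token
Which scenario does each combination suit?

Other

Access Token

JSON Web Token(JWT) Profile for OAuth2.0 Access Tokens
Defines the data structure of the token, and the specifics of issuing and consuming access tokens

Practice

Kong OpenID Connect supports several authorization flows. The common ones are: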

Session Authentication
JWT Access Token Authentication
User Info Authentication
Introspection Authentication
Authorization Code Flow

References

OIDC 身份认证授权
http://www.csharpkit.com/2017-09-23_58568.html
OpenID Connect Core 1.0 incorporating errata set 1
https://openid.net/specs/openid-connect-core-1_0.html
User Authentication with OAuth 2.0 https://oauth.net/articles/authentication/
OAuth 2.0 实战, available on web.kamiapp.com
OpenID Connect Plugin https://docs.konghq.com/hub/kong-inc/openid-connect/

·End·

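8.4 - The iowait Alert Problem

%iowait is the percentage of time within a sampling period during which the CPU was idle and there was at least one outstanding I/O request.

Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

Two misconceptions:

%iowait is the time during which the CPU cannot work
%iowait means that I/O is a bottleneck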
Reference:

理解%IOWAIT(%WIO)

·End·

8.5 - Daily 0201

docker-compose

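With docker-compose, start two sets of containers from the same directory without one overwriting the other

docker-compose -p dev up
docker-compose -p test up

Very convenient

How do I run the same docker-compose.yml several times on same docker daemon with different names?

iTerm2

Moving forward or back one word used to be a headache; finally solved

In vim, ESC + b/f moves back or forward one word, but ESC is a long reach
Set it under Profiles - Keys, and Option + b/f does it all

Reference: item2常用快捷键总结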

8.6 - Daily 0321

mysql character_set

show variables like 'character%';
Start from my.ini (add the entries that are missing under each section, modify the ones that exist)

      [client]
       default-character-set=utf8
      [mysql]
       default-character-set=utf8
      [mysqld]
      default-character-set=utf8

SET character_set_database = utf8 is lost after a restart

unicodeencodeerror-latin-1-codec-cant-encode-character

gerrit admin and h2 database

Problem: the gerrit admin has no actual permissions

Since UUID from groups file and db table account_groups were different I've updated table with correct hashes from the group file and added my account to Administrators group.

e.g. update account_groups set group_uuid='9d3ab2b20b498a1793e2e6112d7bdcb01b852588', owner_group_uuid='9d3ab2b20b498a1793e2e6112d7bdcb01b852588' where group_id=1; INSERT INTO account_group_members (account_id, group_id) VALUES ('1', '1');

Admin group member cannot create

Problem: how to get into the h2 database

java -cp h2-1.3.175.jar org.h2.tools.Script -url jdbc:h2:/home/gerrit/db/ReviewDB

Find the h2 jar, then locate the ReviewDB database file

Docker server time synchronization

Install the tzdata package (apt-get install -y tzdata)
ENV TZ America/Los_Angeles

8.7 - Daily 0503

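What a vCPU is

The CPU of a virtual machine (VM) is called a vCPU. When a VM needs CPU resources, the VMkernel maps the resources it needs onto the HEC (Hardware Execution Context) capacity of the physical server's CPU cores so that the VM can compute. Put simply, the HECs are the physical server's CPU cores.

So, as shown in Figure 5, a VM configured with 1 vCPU can run as soon as the VMkernel maps it to one HEC on the physical host. A VM with 2 vCPUs must be mapped to 2 HECs before it can run, and one with 4 vCPUs needs 4 HECs.

Mapping between VM vCPUs and physical server HECs: more details

google python style guide

Python is the main scripting language at Google. This style guide mainly contains programming guidelines for Python.
Chinese version on Git

elk

Update the license: Update License API
After registering, xpack can be used, and the backend log no longer prints the same messages over and over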

logstash_1       | [2018-05-03T11:31:00,614][INFO ][logstash.licensechecker.licensereader] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://logstash_system:xxxxxx@elasticsearch:9200/, :path=>"/"}
logstash_1       | [2018-05-03T11:31:00,615][WARN ][logstash.licensechecker.licensereader] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://logstash_system:xxxxxx@elasticsearch:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '401' contacting Elasticsearch at URL 'http://elasticsearch:9200/'"}
logstash_1       | [2018-05-03T11:31:03,796][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://logstash_system:xxxxxx@elasticsearch:9200/, :path=>"/"}
logstash_1       | [2018-05-03T11:31:03,797][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://logstash_system:xxxxxx@elasticsearch:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '401' contacting Elasticsearch at URL 'http://elasticsearch:9200/'"}

elastalert

docker elastalert

FROM python:2.7

WORKDIR /
RUN git clone https://github.com/Yelp/elastalert.git
RUN cd  elastalert && git checkout v0.1.30 && pip install -r requirements.txt


WORKDIR /elastalert/
RUN ls -l 

RUN apt-get update && apt-get install -y tzdata
ENV TZ Asia/Shanghai

CMD ["python", "elastalert/elastalert.py"]

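References:
Using ElastAlert
ElastAlert: Alerting At Scale With Elasticsearch (various anomaly mechanisms)

index_not_found_exception: traced to the Python line that raises it. The cause: elastalert concludes that the index exists in Elasticsearch when it actually does not
GitHub elastalert issue 1
Fix:
comment out the delete call at line 244 of create_index.py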

elasticsearch

some Elasticsearch terminology: an Elasticsearch cluster is made up of one or more nodes. Each of these nodes contains indexes which are split into multiple shards. Elasticsearch makes copies of these shards called replicas. These (primary) shards and replicas are then placed on various nodes throughout the cluster.
More

Elasticsearch 中写一致性原理以及quorum机制 (how write consistency and the quorum mechanism work in Elasticsearch)

Elasticsearch 5 docker 集群部署–单虚拟机多容器实例

Practice

Notes based on Elasticsearch: The Definitive Guide



GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "names": "Lincoln Smith"
    }
  }
}

GET /_analyze
{
  "analyzer": "standard",
  "text": [
    "John Abraham",
    "Lincoln Smith"
  ]
}

GET /my_index/_doc/9 

PUT /my_index/_doc/9  
{
  "names": ["John Abraham", "Lincoln Smith"]
}


GET /my_blog/_search
{
  "query": {
    "match_phrase": {
      "body": "quick brown fox"
    }
  }
}

GET /my_blog/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick pets"
          }
        },
        {
          "match": {
            "body": "Quick pets"
          }
        }
      ],
      "tie_breaker": 0.7
    }
  }
}



GET /my_blog/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Brown fox"
          }
        },
        {
          "match": {
            "body": "Brown fox"
          }
        }
      ]
    }
  }
}




# Returns results that are not what the user wanted
GET /my_blog/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Brown fox"
          }
        },
        {
          "match": {
            "body": "Brown fox"
          }
        }
      ]
    }
  }
}



PUT /my_blog/_doc/2
{
  "title": "Keeping pets healthy",    
  "body":  "My quick brown fox eats rabbits on a regular basis."
}

PUT /my_blog/_doc/1
{
  "title": "Quick brown rabbits",    
  "body":  "Brown rabbits are commonly seen."
}


PUT /my_blog



GET /my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "BROWN DOG",
        "operator": "and"
      }
    }
  }
}

GET /my_index/_search 
{
  "query": {
    "match": {
      "title": "QUICK!"
    }
  }
}


POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }} 
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" } 
{ "index": { "_id": 4 }} 
{ "title": "Brown fox brown dog" }


# Why is the number of shards 1?
PUT /my_index
{
  "settings": {
    "number_of_shards": 1
  }
  
}


DELETE /my_index 


GET /my_store/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "productID": "KDKE-B-9947-#kL5"
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "price": "30"
                }
              },
              {
                "term": {
                  "productID": {
                    "value": "JODL-X-1937-#pV7"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}


DELETE /my_store

GET /my_store/_search
{
  "query": {
    "bool": {
      "must": [
        {
        "match_all": {}
        }
      ], 
      "filter": [
        {
          "term": {
            "productID": "XHDK-A-1293-#fJ3"
          }
        }
      ]
    }
  }
}


PUT /my_store 
{     
  "mappings" : {         
      "properties" : {                 
        "productID" : {                     
          "type" : "keyword"
        }            
        }        
      }     
    }


POST /my_store/_doc/_bulk 
{ "index": { "_id": 1 }} 
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" } 
{ "index": { "_id": 2 }} 
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" } 
{ "index": { "_id": 3 }} 
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" } 
{ "index": { "_id": 4 }} 
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }



GET /us-tweet/_search
{ 
  "query": {
    "match_all": {}
  }
  , "_source": ["tweet", "date"]
}

GET /us-tweet

GET /us-tweet/_search?search_type=dfs_query_then_fetch

# explain API to understand why one particular document matched or, more important, why it didn't match .
GET /us-tweet/_doc/12/_explain 
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "user_id": "3"
        }
      },
      "must": 
        {
          "match": {
            "tweet": "honeymoon"
          }
        }
      
    }
  }
}

GET /_search?explain=true
{
  "query": {
    "match": {
      "tweet": "honeymoon"
    }
  }
}

GET /_search
{
  "query": {
    "bool": {
      "must": { "match": {
        "tweet": "managee text search"
        } 
      },
      "filter": [
        { "term": {
          "user_id": "2"
          }
        }
      ]
    }
  }
}



GET /us-tweet/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "user_id": 1
        }
      }
    }
  },
  "sort": {
    "date": {
      "order": "desc"
    }
  }
}


GET /_search

PUT /us-user
PUT /gb-user
PUT /gb-tweet
PUT /us-tweet

POST /_bulk
{"create":{"_index":"us-user","_id":"1"}}
{"email":"john@smith.com","name":"John Smith","username":"@john"}
{"create":{"_index":"gb-user","_id":"2"}}
{"email":"mary@jones.com","name":"Mary Jones","username":"@mary"}
{"create":{"_index":"gb-tweet","_id":"3"}}
{"date":"2014-09-13","name":"Mary Jones","tweet":"Elasticsearch means full text search has never been so easy","user_id":2}
{"create":{"_index":"us-tweet","_id":"4"}}
{"date":"2014-09-14","name":"John Smith","tweet":"@mary it is not just text, it does everything","user_id":1}
{"create":{"_index":"gb-tweet","_id":"5"}}
{"date":"2014-09-15","name":"Mary Jones","tweet":"However did I manage before Elasticsearch?","user_id":2}
{"create":{"_index":"us-tweet","_id":"6"}}
{"date":"2014-09-16","name":"John Smith","tweet":"The Elasticsearch API is really easy to use","user_id":1}
{"create":{"_index":"gb-tweet","_id":"7"}}
{"date":"2014-09-17","name":"Mary Jones","tweet":"The Query DSL is really powerful and flexible","user_id":2}
{"create":{"_index":"us-tweet","_id":"8"}}
{"date":"2014-09-18","name":"John Smith","user_id":1}
{"create":{"_index":"gb-tweet","_id":"9"}}
{"date":"2014-09-19","name":"Mary Jones","tweet":"Geo-location aggregations are really cool","user_id":2}
{"create":{"_index":"us-tweet","_id":"10"}}
{"date":"2014-09-20","name":"John Smith","tweet":"Elasticsearch surely is one of the hottest new NoSQL products","user_id":1}
{"create":{"_index":"gb-tweet","_id":"11"}}
{"date":"2014-09-21","name":"Mary Jones","tweet":"Elasticsearch is built for the cloud, easy to scale","user_id":2}
{"create":{"_index":"us-tweet","_id":"12"}}
{"date":"2014-09-22","name":"John Smith","tweet":"Elasticsearch and I have left the honeymoon stage, and I still love her.","user_id":1}
{"create":{"_index":"gb-tweet","_id":"13"}}
{"date":"2014-09-23","name":"Mary Jones","tweet":"So yes, I am an Elasticsearch fanboy","user_id":2}
{"create":{"_index":"us-tweet","_id":"14"}}
{"date":"2014-09-24","name":"John Smith","tweet":"How many more cheesy tweets do I have to write?","user_id":1}
# Don't Repeat Yourself
# Fails: since 7.0 an index can have only one type
POST /website/_bulk
{ "index": {"_type": "blog" } }
{ "title": "User logged in" }

GET /website/
# Source only, without the other metadata
GET /website/blog/123/_source

# Retrieve part of a document
GET /website/blog/123?_source=title,text

GET /website/blog/123?pretty

POST /website/blog/
{
  "title": "My second Blog entry"  , 
  "text": "still tying this out .... ", 
  "date": "2014/01/02"
  
}

PUT /website/blog/123
{
  "title": "My First Blog entry"  , 
  "text": "just tying this out .... ", 
  "date": "2014/01/01"
  
}

PUT /website

GET _search
{
  "query": {
    "match_all": {}
  }
}

PUT /megacorp

PUT /megacorp/employee/1 
{     "first_name" : "John",     "last_name" :  "Smith",     "age" :        25,     "about" :      "I love to go rock climbing",     "interests": [ "sports", "music" ] 
  
}

PUT /megacorp/employee/2 
{     "first_name" :  "Jane",     "last_name" :   "Smith",     "age" :         32,     "about" :       "I like to collect rock albums",     "interests":  [ "music" ] }

PUT /megacorp/employee/3 
{     "first_name" :  "Douglas",     "last_name" :   "Fir",     "age" :         35,     "about":        "I like to build cabinets",     "interests":  [ "forestry" ] }

GET /megacorp/employee/1

GET /megacorp/employee/_search 

GET /megacorp/employee/_search?q=last_name:Smith 

# Query DSL
GET /megacorp/employee/_search 
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}


# Employees older than 30.

GET /megacorp/employee/_search 
{
  "query": {
    "bool":{
      "filter": {
        "range": {
          "age": { "gt": 30 }
        }
      },
      "must": {
        "match": {
          "last_name": "smitch"
        }
      }
    }
  }
}

GET /megacorp/employee/_search 
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}


GET /megacorp/employee/_search 
{
  "query": {
    "match_phrase": {
      "about": "rock climbing"
    }
  }
}

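# The field I want to aggregate on, "interests", has not been optimized for it (roughly like a field with no index)
# By default es forbids aggregation/sorting on fields that are not optimized for it.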
PUT /megacorp/_mapping?pretty
{
  "properties": {
    "interests": {
      "type": "text", 
      "fielddata": true
    }
  }
}

# What interests do the employees have most in common?
GET /megacorp/employee/_search
{
  "aggs":{
    "all_interests": {
      "terms": {"field": "interests" }
    }
  }
}


GET /megacorp/employee/_search 
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  }, 
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }, 
      "aggs": {
        "avg_age": {
          "avg": {"field": "age"} 
        }
      }
    }
  }
}


PUT /blogs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}


GET /_cluster/health

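nginx configuration

(location =) > (location full path) > (location ^~ path) > (location ~,~* regex, in order) > (location partial prefix path) > (/)
More details

tornado

Combining sqlalchemy with tornado: session management is best handled per request, instantiating a session when each request comes in and closing it when the request ends
scoped_session: a session registry from which sessions are checked out and returned, guaranteeing that repeated checkouts yield the same session
The sqlalchemy database queries here are not asynchronous. When using tornado's async features, a slow database query still blocks, and that is what needs more thought

我心中的 tornado 最佳实践

handle blocking tasks in Tornado
In it, Ben Darnell replied: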

A ThreadPoolExecutor is the recommended way to use blocking functions that cannot be easily rewritten as non-blocking. When you call yield self._exe(n), that handler will be suspended and the main thread will return to the HTTPServer so it is free to handle other requests. The suspended handler will wake up when the task is completed and the IOLoop is not busy. The new thread pool is created by the main thread but is not “in” the thread. Thread pools should generally be either global or class variables; not instance variables. It is a good practice to have one thread pool for each kind of resource: e.g. one thread pool for database queries and second pool for image processing. This lets you set and monitor separate limits for each one.

import tornado.gen
import tornado.ioloop
import tornado.web
from tornado.concurrent import run_on_executor
from concurrent.futures import ThreadPoolExecutor
import time

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world %s" % time.time())


class SleepHandler(tornado.web.RequestHandler):
    @property
    def executor(self):
        return self.application.executor

    @tornado.gen.coroutine
    def get(self, n):
        n = yield self._exe(n)
        self.write("Awake! %s" % time.time())
        self.finish()

    @run_on_executor
    def _exe(self, n):
        """
        This is a long time job and may block the server,such as a complex DB query or Http request.
        """
        time.sleep(float(n))
        return n


class App(tornado.web.Application):
    def __init__(self):
        handlers = [
                        (r"/", MainHandler),
                        (r"/sleep/(\d+)", SleepHandler),
                    ]
        tornado.web.Application.__init__(self, handlers)
        self.executor = ThreadPoolExecutor(max_workers=60)


if __name__ == "__main__":
    application = App()
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

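python

Honestly, what I like best is still posting links to good articles
PYTHON中YIELD的解释: this one explains the code in detail, including what "duck typing" is

One more link: Iterables vs. Iterators vs. Generators. It turns out generators come in two types: generator functions and generator expressions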
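The two kinds can be put side by side in a minimal sketch; the name squares is made up for this note.

```python
def squares(n):
    """Generator function: calling it returns a generator object."""
    for i in range(n):
        # Execution pauses here and resumes on the next request for a value.
        yield i * i


gen_from_function = squares(4)
# Generator expression: the same lazy sequence written inline.
gen_from_expression = (i * i for i in range(4))

# Both are iterators that produce values on demand and can be consumed once.
from_function = list(gen_from_function)
from_expression = list(gen_from_expression)
leftover = list(gen_from_function)  # already exhausted
print(from_function, from_expression, leftover)
```

A generator function is the better fit when the logic needs several statements or state; a generator expression is handy for a one-line transformation passed straight to something like sum() or list().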

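This really is a long story

Using futures
The main use case for the Future pattern: the current thread depends on data returned by another thread, and the thread that processes the data is slow. With the Future pattern, the main thread submits the request to another thread that handles the business logic and fetches the result only when it is needed, which makes good use of the waiting time
Python concurrent.future 使用教程及源码初剖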
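The pattern described above looks roughly like this with the standard concurrent.futures module. slow_square and the 0.1-second sleep are stand-ins for a slow piece of business logic, made up for this note.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a slow task such as a database query or an HTTP call."""
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit the slow work; submit() returns a Future immediately.
    future = pool.submit(slow_square, 7)

    # The main thread is free to do other work in the meantime.
    other_work = sum(range(10))

    # Fetch the result only when it is needed; this blocks until it is ready.
    result = future.result()

print(other_work, result)
```

future.result() also accepts a timeout argument, which is useful when the main thread should not wait indefinitely.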

yield and coroutines
yield does not specify who the current process hands execution to. It only gives up its turn to run, and who runs next is entirely up to the scheduler, schedule(). It is mostly used while waiting on I/O, when the process waits briefly without leaving the run queue.
进程管理之yield
Yield 和 Coroutine: http://wsfdl.com/python/2016/11/13/yield_and_croutine.html

Python (at least in the CPython implementation) has a Global Interpreter Lock which prevents multiple threads from executing Python code at the same time. In particular, anything which runs in a single Python opcode is uninterruptible unless it calls a C function which explicitly releases the GIL. A large exponentiation with ** holds the GIL the whole time and thus blocks all other python threads, while a call to bcrypt() will release the GIL so other threads can continue to work.

深入理解 GIL：如何写出高性能及线程安全的 Python 代码: an introduction to the GIL

References

[1] Clinton Gormley and Zachary Tong: Elasticsearch: The Definitive Guide

8.8 - Daily 0607

python import

The article Python导入模块的几种姿势 covers the following kinds of import:

Regular imports
Importing with the from statement
Relative imports
Optional imports
Local imports
Import pitfalls: circular imports and shadowed imports
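Two of the less familiar kinds, optional and local imports, can be sketched as follows. simplejson is used only as an example of a third-party package that may or may not be installed; to_text and circle_area are names made up for this note.

```python
# Optional import: prefer a third-party package when it is installed,
# otherwise fall back to the standard library module with the same interface.
try:
    import simplejson as json
except ImportError:
    import json


def to_text(obj):
    """Serialize with whichever json module was imported above."""
    return json.dumps(obj, sort_keys=True)


def circle_area(radius):
    # Local import: the module is loaded on first call and the name
    # is visible only inside this function.
    import math
    return math.pi * radius * radius


text = to_text({"b": 1, "a": 2})
area = circle_area(1)
print(text, area)
```

A local import can also break a circular import, because the import runs when the function is called rather than when the module is first loaded.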

8.9 - Daily 0612

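Splitting a project with pip packages

Installing a git repository with pip

pip install from git repo branch: covers several ways to install

Why not split the project with git submodules

The way subprojects are updated makes operations work cumbersome.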

8.10 - Daily 0813

Blockchain

References

Blockchain and HyperLedger course series (ten lectures)

8.11 - Daily 0821 chrome app/extension

Getting started

Going deeper

Extensions

References

app: create your first app
disable cookies

8.12 - Daily 10/09

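Learning

reset.css: resets CSS for mobile devices

8.13 - Daily 10/15 Configuring HTTPS

Get a free CA-signed certificate from Alibaba Cloud, download it to the server, and configure nginx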

  ssl_certificate   cert/213970811020013.pem;
  ssl_certificate_key  cert/213970811020013.key;

Redirect requests for the domain on port 80

server {
    listen 80;
    server_name server.com www.server.com;
    return 301 https://www.server.com$request_uri;
}

8.14 - Daily 10/16

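Learning

UUID

How to create a GUID / UUID in Javascript? A very comprehensive UUID resource

Vue: a unified data request interface

shop.js: API example and project structure

Web

HTML5: how to keep an input field on Android from bringing up the soft keyboard? Two ways to stop a front-end input from opening the keyboard: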

1）onfocus='this.blur();'

2）readOnly='readOnly'

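Dynamic component click event in Vue: to add a click event to a dynamic component, use @click.native

Lesser-known uses of the CSS ::before and ::after pseudo-elements. TODO: reread a few times

N ways to remove the gap between inline-block elements: fixes for the space that appears between two elements with nothing in between, a problem inherent to inline-block

postcss postcss-px-to-viewport: converts pixel values from the design mockup into viewport units in code. TODO: no instructions for adding it to webpack

Mobile page adaptation revisited: the latest approach to adapting mobile pages

How and when to use CSS !important: explains why to use !important

light7: an excellent mobile framework

cubic: very cool. TODO

purifycss: removes unused CSS

vue select: a dropdown select integrated with Vue

vue demo: clear structure, very loose coupling

Team coding conventions: for reference and study

Vtiger

vtiger Webservice tutorials: modifying CRM data through the API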

8.15 - Daily 10/18

Learning

Eslint

enforce the consistent use of either backticks, double, or single quotes: to disable the quotes rule, put this at the top of the file

/* eslint-disable quotes */

8.16 - Daily 10/19 Three months

Learning

In the webpack dev environment, use the http-proxy-middleware middleware.
Configure it through the option.onProxyReq parameter

function onProxyReq(proxyReq, req, res) {
    // add custom header to request
    proxyReq.setHeader('x-added', 'foobar');
    // or log the req
}

node-http-proxy: many proxy examples, for reference

httpProxy.createProxyServer({
  target: 'https://google.com',
  agent  : https.globalAgent,
  headers: {
    host: 'google.com'
  }
}).listen(8011);

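Example configuration for https->http or https->https

Qs

qs: a library for handling HTTP query parameters; very convenient once you use it

Vue

Programmatic navigation

router.push({ path: 'home' })

Custom directives

// Register a global custom directive v-focus
Vue.directive('focus', {
  // Called when the bound element is inserted into the DOM.
  inserted: function (el) {
    // Focus the element
    el.focus()
  }
})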

ESLint

Rules

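CSS slicing

Page production (slicing), chapter 1: slicing from Sketch

From visual design to app: slicing and adaptation practice in the NetEase Youqian (网易有钱) iOS project. The process makes it very clear who does what

Why are Sketch's preset artboard sizes smaller than the real resolution? Read again later

How do you draw a true 1px line on Retina mobile devices? TODO

Administrative divisions of the People's Republic of China: provinces, cities, districts/counties, townships (subdistricts). Very complete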

8.17 - Daily 10/24

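Learning

Vue

[form validation](https://www.zhihu.com/question/37099220): va.js makes validation convenient

Custom directives: explanation of the various parameters

CSS+HTML

Trigger content loading when an HTML5 page is scrolled to the bottom (JavaScript implementation)

Document height: the height of the whole page
Viewport height: the height of the browser area you can actually see
Scroll offset: how far the scrollbar has been scrolled down

CSS: after a div is floated the parent div's height no longer adapts; fix it with clear:both

js

Arrow functions have no this value of their own; this inside an arrow function is inherited from the enclosing scope.
ES6 In Depth (7): Arrow Functions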

8.18 - Daily 10/30

Learning

Vue

Binding the Enter key on an input in Vue

@keyup.13="search"

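JavaScript: checking that an identity is valid

CSS

:first-child: when using the :first-child pseudo-class, make sure the element has no preceding siblings; removing the h1 fixes it. Alternatively wrap the article in a div and use the selector div article:first-child

High-definition, multi-screen adaptation for mobile: 1) image sharpness on Retina screens, 2) the border: 1px problem on Retina, 3) multi-screen layout adaptation

python

Passing arguments on the Python command line, to avoid hard-coding sensitive information in the code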

import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
args = parser.parse_args()

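split: the first argument (the separator) defaults to any whitespace (spaces, newlines \n, tabs \t); the second argument is the maximum number of splits

text = "Line1-abcdef \nLine2-abc \nLine4-abcd"
print(text.split())        # split on any run of whitespace
print(text.split(' ', 1))  # split on the first space only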

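Multi-line matching
This problem typically shows up when you use a dot (.) to match any character and forget that a dot does not match a newline.
re.compile() accepts a flag argument, re.DOTALL, which makes the dot match newlines as well

import re

comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)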
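
A quick sketch of the difference the flag makes, using the same comment pattern. The `source` string is made up for illustration.

```python
import re

source = "int x; /* first line\n   second line */ int y;"

# Without DOTALL, `.` stops at the newline, so the comment is not matched.
plain = re.compile(r'/\*(.*?)\*/')
# With DOTALL, `.` also matches '\n', so the whole comment is captured.
dotall = re.compile(r'/\*(.*?)\*/', re.DOTALL)

no_match = plain.findall(source)
match = dotall.findall(source)
```

Here `no_match` is empty while `match` holds the full two-line comment body.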

Gerrit

Gerrit code review server: a detailed introductory article. Gerrit code review: an introduction

Importing a git repository into Gerrit

I imported many GIT projects to gerrit, the easiest way I found was to copy the xy.git Directory of the git repository to the directory where gerrit deposits the git repos. After restart of gerrit process the new project is in the list of new projects and you can edit description and access rights.

Changing access permissions for Gerrit's built-in H2 database: edit the project.config file

[capability]
       accessDatabase = group Administrators

Gerrit workflow and user manual: covers installing Gerrit with HTTP

Gerrit learning resources in English

GIT

Change the remote repository URL:

git remote set-url origin [url]

8.19 - Daily 1018 vim go + Ycm

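Notes on the struggle to set up a vim + Go development environment.

After typing a "." nothing was suggested. I was at a loss and even wondered whether this was as far as it would go.

With no logs, I had nowhere to start.
I updated all the Go libraries and reinstalled gocode/vim-go, but the problem remained and no error was shown.
I started to think "life is short, I use Python",
but I don't give up easily and I like to tinker.

After reinstalling vim-go/gocode, I began to suspect YCM.

It turned out YCM really did need an upgrade, and Go support has to be enabled at install time

""" /.vim/bundle/YouCompleteMe$ ../install.sh --clang-completer --go-completer

""" add the go-completer flag when installing

Finally I restarted the YCM server
Done!!!

References

Autocomplete stopped working: after reading this I was sure gocode was fine and the problem was probably in YCM;

YouCompleteMe support for golang autocompletion in vim

Pitfalls of writing Golang in Vim: explains why Go completion is so slow; the cause lies in Go rather than YCM, and the final fix is to set autobuild to false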

8.20 - Daily 11.08 Installing Docker

Environment

Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty

Install

Get Docker CE for Ubuntu (official tutorial)

Before installing

sudo apt-get install \
  linux-image-extra-$(uname -r) \
  linux-image-extra-virtual

docker

Docker tutorial in Chinese: a must-read for beginners, very useful

8.21 - Daily 11/03

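Learning

command-t

On macOS, the Ruby version command-t depends on differs from the one vim was built with:

vim was installed through brew and depends on brew's Ruby
command-t uses the system Ruby when generating its Makefile
the system Ruby is found by searching $PATH: /usr/local/bin, /usr/bin, and so on
symlinking brew's Ruby version into /usr/local/bin changes the Ruby version the system picks up
recompile, then enjoy convenient file search in vim

Gerrit

Gerrit access control: a detailed introduction; the parts on forge author and forge committer are useful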

8.22 - Daily 1104

Learning

Gerrit

5. Gerrit access control: detailed access control
Gerrit code review - Tutorial: keep for reading later

Python

python xml pretty print not working: configuring the XML parser so pretty printing works


import io
import lxml.etree as etree

def prettify(xml_text):
    """Pretty-print an XML string (Python 3)."""
    # Dropping ignorable whitespace lets pretty_print re-indent the tree.
    parser = etree.XMLParser(remove_blank_text=True)
    file_obj = io.BytesIO(xml_text.encode("utf-8"))
    tree = etree.parse(file_obj, parser)
    return etree.tostring(tree, pretty_print=True).decode("utf-8")

Web

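Page anchors that follow the content: very useful when the content is long

8.23 - Daily 1110 Docker deployment practice summary

Learning

Finally got Docker sorted out

Dockerfile

The image build file.

A Dockerfile is a text file containing a series of instructions. Each instruction builds one layer, so the content of each instruction describes how that layer should be built.

docker_practice on GitHub: a Chinese introduction to Dockerfile, explained in great detail

To see how others write a Dockerfile: search Docker Hub for the official image, follow the linked GitHub repository, and read the build file.
For example, the Redis build file: Dockerfile

After reading it you know where Redis keeps its database files and its configuration file, so you can migrate the data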

redis

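Redis data migration notes

docker-compose

Why does a connection fail with "connection refused" even though the container service has started and docker-compose ps shows the port is open?

docker rm the container

Then add --no-cache when building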

Docker

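To check whether the image was built correctly I need to get into the container and inspect the files. The question is how to get into a Docker container:

Several ways to access a Docker container

attach hangs, and I don't know why.

Try another way:

# list containers to find the ID of the one that is running
docker ps -a
# run bash in the given container with exec
docker exec -it 31ced27e1684 /bin/bash

A deeper look at Docker: in depth, TODO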

Docker nginx

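You need to know where nginx's default configuration file is, then copy your own configuration into it

Docker mysql

Running a SQL file in Docker

Another problem: how to set the MySQL password; passwordless login stopped working

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

The fix is to specify the protocol
mysql -h localhost -P 3306 --protocol=tcp -u root

Study the script that imports MySQL data directly. TODO

Tornado redis

How to use Redis in Tornado: tornado-redis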

8.24 - Daily 1116

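Learning

Docker

Build, Ship, and Run Any App, Anywhere

Use a restart policy. Restart rules:

1) no / on-failure / unless-stopped / always. TODO: what is the difference between on-failure and always?

docker with supervisor: configure supervisor to run in the foreground

[supervisord]
nodaemon=true

The first section, supervisord, configures the program itself; the nodaemon parameter keeps it in the foreground

Run multiple services in a container: done with the help of supervisor

Howto: ssh automatically add new hosts to the list of known hosts. Configure ssh so the first login does not prompt to add the host to known hosts

Host 10.*
   StrictHostKeyChecking no

Compose file documentation

How do I tell git which private key to use? In the end I just used the default private key, .ssh/id_rsa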

8.25 - Daily 1127 MYSQL

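Selectively copying table data: this works, and is essential for batch processing in a database

Designing for hundreds of millions of rows

For details see: optimizing summary queries over tens of millions of rows

Designing MySQL databases for hundreds of millions of rows

A single MySQL table can hold on the order of a billion rows, but performance is very poor at that size. Extensive experiments in projects show that a single table performs best at around 5 million rows.

When one table is not enough:

Solutions

Partitioning

Partition the single table on the indexed column used by queries. The application layer is unaware of these changes.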

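Partition type	Description	How often used
Range partitioning	Splits by value range, for example by time interval or ID interval	Often
List partitioning	A set of discrete values	Rarely
Hash partitioning	Modulo of a numeric value	Often
Key partitioning	Accepts non-integer columns (any column type except TEXT and BLOB); does not allow a user-defined partitioning expression	Rarely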

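Hash partitioning distributes rows across a given number of partitions, which suits tables such as a member table. HASH partitioning can only hash integers; a non-integer column must first be converted to an integer by an expression.

A binding number (ID) has no business meaning and cannot be used in queries, so range or list partitioning on it does not help. That leaves HASH and KEY partitioning; HASH partitioning supports only a single int column. Partitioning on the binding time column still leaves queries slow. Partitioning on the column that is searched keeps queries fast.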
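
A rough sketch of the idea behind hash partitioning, written in plain Python rather than SQL. It is not MySQL's implementation: `hash_partition` and `partition_for` are hypothetical helpers, and `zlib.crc32` merely stands in for "an expression that converts a non-integer column into an integer".

```python
import zlib


def hash_partition(key: int, num_partitions: int) -> int:
    """Return the partition index for an integer key (plain modulo)."""
    return key % num_partitions


def partition_for(column_value: str, num_partitions: int) -> int:
    """Map a non-integer column to an integer first, then take the modulo."""
    as_int = zlib.crc32(column_value.encode("utf-8"))
    return hash_partition(as_int, num_partitions)


member_ids = [101, 102, 103, 104, 105, 106]
buckets = {}
for member_id in member_ids:
    buckets.setdefault(hash_partition(member_id, 4), []).append(member_id)
```

A query that filters on the partitioning column only has to touch one bucket, which is why partitioning on the searched column keeps lookups fast.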

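Sharding databases

Vertical database sharding: based on how tightly the business areas are coupled, store loosely related tables in different databases so that resources are fully used. Each microservice then uses its own database.

Splitting tables

Table splitting is either horizontal or vertical (vertical splitting also avoids the page-overflow problem)^[7].

MySQL stores data in pages; a record that takes too much space spills across pages and adds overhead. The database also loads data into memory row by row, so when a table's columns are short and frequently accessed, memory holds more rows, the hit rate is higher, disk I/O drops, and performance improves.

Add a mapping table at the business layer that relates business entities to the tables storing their data. For example, this design^[4] adds a device-to-dynamic-data mapping table (t_device_table_map) that records which dynamic data table each device uses.

Finally, table splitting vs. partitioning:

Partitioning is the database's built-in version of horizontal table splitting. The advantage of horizontal table splitting is that one table's data can be spread across several servers, each with its own database and sub-tables.
Partitioning only changes where a table's data and indexes are stored; table splitting really creates multiple sets of table definition files.
Partitioning cannot reach beyond a single database, whereas split sub-tables can live in the same database or in different ones.

NoSql/NewSql #TODO

Index scan vs Bitmap scan vs Sequential scan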

PostgreSQL will first scan the index and compile those rows / blocks, which are needed at the end of the scan. Then PostgreSQL will take this list and go to the table to really fetch those rows. The beauty is that this mechanism even works if you are using more than just one index.^[1]

PostgreSQL Bitmap-scan

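Composite indexes

Going back to the table: in an execution plan, table access by index rowid indicates a lookup back into the table.

Understanding composite indexes^[6]: a composite index is also a B+Tree. It is indexed on the first key, and the leaf nodes are sorted by the first key, then the second, then the third, and so on.

The leftmost-prefix rule.

Design rule 1: for equality queries, if condition a alone returns many rows and condition b alone returns many rows, but a and b together return few, a composite index is a good fit.

Design rule 2: with both equality and range conditions, put the equality columns first and the range columns last.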
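
A small sketch of the leftmost-prefix rule. It uses SQLite from the Python standard library as a stand-in, not MySQL, so plan wording and optimizer details differ; the table `orders` and index `idx_a_b` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_a_b ON orders (a, b)")
conn.executemany(
    "INSERT INTO orders (a, b, note) VALUES (?, ?, ?)",
    [(i % 10, i % 7, "row %d" % i) for i in range(100)],
)


def plan(sql):
    """Return the plan detail text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Leading column present: the composite index can be searched.
plan_a = plan("SELECT * FROM orders WHERE a = 3 AND b = 5")
# Leading column missing: the index order does not help, so the table is scanned.
plan_b = plan("SELECT * FROM orders WHERE b = 5")
```

Printing `plan_a` and `plan_b` shows the index being used only when the filter starts from the first indexed column.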

Other

Exporting a database

mysqldump -u root -p news > news.sql

sqlalchemy session: a detailed introduction to the states of a SQLAlchemy session and best practices.

Docker

Using docker: docker command usage

Docker: Are you trying to connect to a TLS-enabled daemon without TLS?

sudo docker images

Difference between export and save

docker export: Export a container’s filesystem as a tar archive
Docker images: export and import in practice

BASH

tar: compress a folder, using exclude to skip files

tar --exclude='./folder1' --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .

Vue render functions

References

[1] Hans-Jürgen Schönig: POSTGRESQL INDEXING: INDEX SCAN VS. BITMAP SCAN VS. SEQUENTIAL SCAN (BASICS)

[2] zhanlijun 的博客园: 位图索引:原理（BitMap index）

[3] Markus Winand: Pagination Done the Right Way(PPT)

[4] Chaexsy 掘金: MySql 数据库分表分区实践

[5] 茶谪仙掘金：数据库分区一篇就透了

[6] houbb: 数据库索引-07-联合索引

[7] PHP 架构师布乐： Mysql 的分区/分库/分表总结

8.26 - Daily 1208

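I have been too busy lately, though busy is no excuse; my Chrome tabs are packed tight

Learning

Docker

Docker networking needs careful study. With a user-defined network, --link has no effect, and containers with dependencies report at startup that they cannot find the service

Setting mac address for container: Docker can also pin the MAC address; with the MAC address fixed, security matters

How To Get Docker Container Ip and Mac Address: looking up the IP and MAC address

Deploying Gerrit with Docker: very convenient, one step and done

Notes

Dozens of tabs, all about Docker. Docker has worn me out lately: pinning the IP, pinning the MAC address, and in the end I dropped it because the network was unreachable.

Back to fixing bugs

Feelings

Under enormous pressure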

8.27 - Daily 1215

学习

Understanding REST

Principles of REST:

Resources expose easily understood directory structure URIs.
Representations transfer JSON or XML to represent data objects and attributes.
Messages use HTTP methods explicitly (for example, GET, POST, PUT, and DELETE).
Stateless interactions store no client context on the server between requests. State dependencies limit and restrict scalability. The client holds session state.

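Idempotency: a new term for me. This is the biggest difference between PUT and POST

Idempotent means that running the same operation a second or third time gives the same result as the first (in other words, however many times it runs, the result is the same as running it once).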
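
A toy sketch of why PUT is idempotent and POST usually is not. `UserStore` is a hypothetical in-memory stand-in for a REST resource collection, not a real HTTP server.

```python
class UserStore:
    """Hypothetical in-memory stand-in for a REST resource collection."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def put(self, user_id, data):
        # PUT /users/<id>: replace the resource at a known URI.
        self.users[user_id] = dict(data)
        return user_id

    def post(self, data):
        # POST /users: the server creates a new resource on every call.
        user_id = self.next_id
        self.next_id += 1
        self.users[user_id] = dict(data)
        return user_id


store = UserStore()
for _ in range(3):
    store.put(100, {"name": "alice"})  # repeated PUT: still one resource
count_after_put = len(store.users)

for _ in range(3):
    store.post({"name": "bob"})        # repeated POST: three new resources
count_after_post = len(store.users)
```

Retrying the PUT leaves the store in the same state, while retrying the POST keeps adding records, which is why clients can safely retry the former but not the latter.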

Vue

Vue Async Components: loading components asynchronously

Vue.component('async-webpack-example', function (resolve) {
  // This special require syntax will instruct Webpack to
  // automatically split your built code into bundles which
  // are loaded over Ajax requests.
  require(['./my-async-component'], resolve)
})

However, the following webpack 2 + ES2015 code does not work

Vue.component(
  'async-webpack-example',
  // The `import` function returns a `Promise`.
  () => import('./my-async-component')
)

TODO

vim

YCM does not support Vue: opening a .vue file crashes the YCM server, and I have not found a proper fix
For now I restart the YCM server manually: :YcmRestartServer

8.28 - Daily 1229

Django zip files (create dynamic in-memory archives with Python’s zipfile): package files on the fly and send them to the browser


from io import BytesIO
from zipfile import ZipFile
from django.http import HttpResponse

def download(request, company_id):
    # Build the archive entirely in memory (Python 3: zip data is bytes).
    in_memory = BytesIO()
    zf = ZipFile(in_memory, "a")

    zf.writestr("file1.txt", "some text contents")
    zf.writestr("file2.csv", "csv,data,here")

    # fix for Linux zip files read in Windows
    for file in zf.filelist:
        file.create_system = 0

    zf.close()

    # content_type replaces the mimetype argument removed in newer Django.
    response = HttpResponse(content_type="application/zip")
    response["Content-Disposition"] = "attachment; filename=two_files.zip"

    in_memory.seek(0)
    response.write(in_memory.read())

    return response

Using the Forwarded header

proxy_set_header Forwarded $proxy_add_forwarded;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

Getting the real client address: How do I get the client IP of a Tornado request?

x_real_ip = self.request.headers.get("X-Real-IP")
remote_ip = x_real_ip or self.request.remote_ip
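
A framework-independent sketch of the same lookup, extended to fall back on X-Forwarded-For. `client_ip` is a hypothetical helper and `headers` is a plain dict here; real frameworks usually match header names case-insensitively.

```python
def client_ip(headers, remote_ip):
    """Pick the client address from proxy headers, else the socket peer."""
    real_ip = headers.get("X-Real-IP")
    if real_ip:
        return real_ip.strip()
    forwarded_for = headers.get("X-Forwarded-For")
    if forwarded_for:
        # Format is "client, proxy1, proxy2"; the first entry is the client.
        return forwarded_for.split(",")[0].strip()
    return remote_ip


direct = client_ip({}, "10.0.0.5")
via_real_ip = client_ip({"X-Real-IP": "203.0.113.7"}, "10.0.0.5")
via_chain = client_ip({"X-Forwarded-For": "198.51.100.2, 10.0.0.1"}, "10.0.0.5")
```

These headers can be set by any client, so they should only be trusted when the request is known to come through your own reverse proxy.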

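8.29 - Daily 2021/11/01 Positive Discipline

Principles of positive discipline

Building blocks of positive discipline

Mutual respect
Understanding the belief behind the behavior
Understanding child development and age-appropriate behavior
Effective communication
Discipline that teaches children skills
Focusing on solutions instead of punishment
Encouragement
Children do better when they feel better.

Discipline methods to avoid

If you are yelling at or lecturing your child, stop. If you are spanking your child or slapping their hands, stop. If you are trying to make your child comply through threats, warnings, bribes, or lectures, stop. All of these methods are disrespectful and lead to doubt, shame, and guilt in the child, not only in the moment but also in the future.

What do children really need?

A sense of belonging (emotional connection)
Personal power and autonomy (capability)
Social and life skills (contribution)
Kind and firm guidance, and discipline that teaches children skills (delivered with dignity and respect)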

8.30 - Daily 2022/01/13

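PPT practice diary

Writing slides takes a lot of time; one slide has already gone through more than 100 revisions.

So how do you write each slide well? Much like making a good-looking web page.

Color scheme

Content structure

Composition of each slide

There are many templates for reference. SlideTeam, for example, is paid, but its diagramming ideas are well worth learning.

References

[1] PPT diagramming ideas https://www.slideteam.net/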

8.31 - Daily 5/31

此时

今天是2017年前半年最后一天了，时间真快，过年前大家还在说2017年要实现什么目标，可是我真的没离目标靠近了多少，真伤感。偶尔我真的大叫一声释放下此时的压力。

此事

事业：1、办公家具事情在开年出现几次问题，其中最大问题是工厂发出的货是次品，让我产生想退出办公家具贸易，做自己的产品，从而把控生产质量与包装，减少售后；2、做外贸树脂产品、现在一个月出几单实在让人头疼，接下来走多平台多流量；3、WISH一个店铺因为发货太慢每况愈下，其他店铺正在审核或在上架产品。4、偶尔又想去上班、今天试着让微信朋友投了一份简历出去。5、每天工作到6-7点就感觉要去跑步锻炼了，不然头晕。
感情：事业压力大，偶尔不想去碰感情的事情，回头一想年纪也算大了，得抓紧时间了

以后

选产品上架出单，不断学习，或工作或继续外贸内贸，感觉后者几率大一些，加油呀！

8.32 - Daily 6/1

流水账

今天上午完成30个产品上传，舒爽！明天坚持30个…
下午胡思乱想网络传真服务的事情：是否进入做：

一个月销售额4万，成本一万，月赚3万。
传真是个没落行业，市场需要就那么大，再进去做顶多做到一半的市场，2万！才两万！除去成本没剩多少了
以最低的成本找人一起做这个服务。

打包发货开关贴，包装跟产品一样重量，要减重减重减重
晚上加班制作视频，还是不太熟练Ulead Video Studio 11，摸索摸索。明天正确把视频做出来，接下来拍照开关贴回来俯卧撑，涡轮推，倒立，洗澡睡觉
嘴角上火，吃粽子吃多了，饮食乱了。

六一快乐，一去不复返的时间 | 看了别人上川岛的露营照片，景真是美，等着，忙完就去

多年以后回想此时此情此事

8.33 - Daily 6/3

Ulead Video Studio 视频中音频处理

今天抽空对视频的声音进行优化：整个视频由图片以及拍摄的视频组成，当播放从图片到录制视频之间，声音是突然地从无到有过程，感觉很不协调，正常应该是从无开始、声音慢慢变大、到最大持续、最后随着结束慢慢变小、最后没有声音的过程。而Ulead Video Studio中fade in & fade out 很好的处理这个需求，而且可以调整声音变化的快慢
总算认真学习了《会声会影 11 教学影片大纲：第三单元影片剪接与素材调整》这个课程真的很实用，台湾课程，非常赞，再学习其他单元（为了打开MDF、MDS文件，也是找遍了大半个百度，也算值了，教程这么好）
视频制作告一段落！

计划

亚马逊平台：开2-3款树脂产品
WISH平台：继续每天30款产品上架（MD今天偷懒了，明天继续)
淘宝平台：家具产品

8.34 - Daily 9/11

Learning

NERDTree

Hide files with a given extension

let NERDTreeIgnore = ['\.pyc$']

Ctrlsf

In CtrlSF Window:

O - Like Enter but always leave CtrlSF window opening.
t - Like Enter but open file in a new tab.

nnoremap <C-F>t :CtrlSFToggle<CR>
inoremap <C-F>t <Esc>:CtrlSFToggle<CR>

Command-t

After finding a file, open it in a new split window

<C-CR>      open the selected file in a new split window
<C-s>       open the selected file in a new split window
<C-v>       open the selected file in a new vertical split window
<C-t>       open the selected file in a new tab

vim

split

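Ctrl-w = resize all windows to the same size
Ctrl-w | maximize width
Ctrl-w _ maximize height

search

:noh

turn off highlighting until the next search

ycm

Selection shortcut keys

let g:ycm_key_list_select_completion = ['<TAB>', '<Down>']

SQLAlchemy

What is meta under the atmcraft model directory for?

Tornado

selene: a case study integrating asynchronous MongoDB queries. Code reading
ohmyrepo: a simple example that integrates a cache. Code reading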

8.35 - Daily 9/12

Learning

Bash History

Changing the number of commands kept in history

export HISTFILESIZE=10000
export HISTSIZE=1000

MAC Terminal TAB

Tab-switching shortcuts: Command + Shift + { or Command + Shift + ← switches to the left
Command + Shift + } or Command + Shift + → switches to the right

Gerrit

gerrit query Query the change database gerrit query
gerrit query : Obtain the latest refspec on a Gerrit Change 详细讲解gerrit query
gerrit stream-events: change submission events
gerrit stream-events Provides a portal into the major events occurring on the server, outputting activity data in real-time to the client. Events are filtered by the caller’s access permissions, ensuring the caller only receives events for changes they can view on the web, or in the project repository.

GIT

GIT_SSH_COMMAND

Sets the default ssh command used when git accesses a repository over ssh
GIT_SSH_COMMAND $GIT_SSH_COMMAND takes precedence over $GIT_SSH, and is interpreted by the shell, which allows additional arguments to be included.

GIT_SSH_COMMAND='ssh -i %s' git fetch

FETCH_HEAD

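FETCH_HEAD is the latest state of a branch on the server. Every project that has run fetch has a FETCH_HEAD list, stored in the .git/FETCH_HEAD file, with one line per branch on the remote server. The FETCH_HEAD for the current branch is the branch on the first line of this file.

Vim

:set wrap "wrap long lines
:set nowrap "do not wrap long lines

Python

rfind returns the last index where the substring is found, or -1 if no such index exists

str.rfind(sub, beg=0, end=len(string))

Get the current time as a Unix timestamp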

int(time.time())

8.36 - Daily 9/13

Learning

Python

super

Understanding Python super: don't equate super with the parent class! super refers to the next class in the MRO!

class Root(object):
    def __init__(self):
        print("this is Root")

class B(Root):
    def __init__(self):
        print("enter B")
        # print(self)  # this will print <__main__.D object at 0x...>
        super(B, self).__init__()
        print("leave B")

class C(Root):
    def __init__(self):
        print("enter C")
        super(C, self).__init__()
        print("leave C")

class D(B, C):
    pass

d = D()
print(d.__class__.__mro__)

mixin

Python mixin pattern: in Python, mixins can be implemented through multiple inheritance
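
A minimal sketch of mixins built on multiple inheritance. `JsonMixin`, `ReprMixin`, and `Point` are made-up names for illustration.

```python
import json


class JsonMixin:
    """Mixin: adds JSON serialization to any class that has a __dict__."""

    def to_json(self):
        return json.dumps(self.__dict__, sort_keys=True)


class ReprMixin:
    """Mixin: adds a readable repr built from the instance attributes."""

    def __repr__(self):
        fields = ", ".join("%s=%r" % item for item in sorted(self.__dict__.items()))
        return "%s(%s)" % (type(self).__name__, fields)


class Point(JsonMixin, ReprMixin):
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
as_json = p.to_json()
as_repr = repr(p)
mro_names = [cls.__name__ for cls in Point.__mro__]
```

Each mixin carries one small, reusable behavior and no state of its own; `mro_names` shows the order in which Python looks methods up.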

Vim

let g:UltiSnipsExpandTrigger="<c-j>"

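ultisnips completion conflicts with YCM's <tab> shortcut
UltiSnips makes Vim fly: a detailed walkthrough of setting up vim snippets

Android

Compile gradle project with another project as a dependency: to build two independent projects, specify the path of the dependency
Dependency Management: Gradle dependencies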

8.37 - Daily 9/14

Learning

Vue

vue-history-api-fallback: fixes the problem of URLs that contain a dot

history({
  router,
  disableDotRule: true
});

webpack

The historyApiFallback option can rewrite URLs

historyApiFallback: {
    rewrites: [
        // shows views/landing.html as the landing page
        { from: /^\/$/, to: '/views/landing.html' },
        // shows views/subpage.html for all routes starting with /subpage
        { from: /^\/subpage/, to: '/views/subpage.html' },
        // shows views/404.html on all other pages
        { from: /./, to: '/views/404.html' },
    ],
},

Javascript

import usage: used to import functions, objects, or primitives which are defined in and exported by an external module, script, or the like.

import defaultMember from "module-name";
import * as name from "module-name";
import { member } from "module-name";
import { member as alias } from "module-name";
import { member1 , member2 } from "module-name";
import { member1 , member2 as alias2 , [...] } from "module-name";
import defaultMember, { member [ , [...] ] } from "module-name";
import defaultMember, * as name from "module-name";
import "module-name";

Tornado

Python and Tornado: a series to read through when I have time #TODO

Python

Differences between isinstance() and type() in python: isinstance vs. type

Correct way to write line to file in Python: print can also write to a file

8.38 - Daily 9/18

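Learning

redis

Redis persistence, RDB and AOF: one Redis instance can be configured with both persistence schemes
Listing all keys in Redis
Redis persistence: a comparison of the two persistence methods

Renaming in Python

os.rename: How to change folder names in python?

Vim

mac vim configuration file: make vim on macOS remember the last cursor position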

set viminfo='10,\"100,:20,%,n~/.viminfo
au BufReadPost * if line("'\"") > 0|if line("'\"") <= line("$")|exe("norm '\"")|else|exe "norm $"|endif|endif

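vim syntax highlighting for large files

syn sync fromstart

Fixing slow autocompletion in the vim YCM plugin: completion suggestions start from the third character

vim YCM go to definition / go to declaration: the shortcuts for jumping back and forward are Ctrl+O and Ctrl+I.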

nnoremap <leader>gl :YcmCompleter GoToDeclaration<CR>
nnoremap <leader>gf :YcmCompleter GoToDefinition<CR>
nnoremap <leader>gg :YcmCompleter GoToDefinitionElseDeclaration<CR>

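vim bookmark: lettered bookmarks

[vim recording] What is vim recording and how can it be disabled? Occasionally a wrong key press makes 'recording @q' appear in the bottom-left corner

javascript

iOS webview to JS communication: slightly different from the Java code; can't they be unified?

Server can’t be accessed via IP: how to configure webpack-server for access by IP

Python serialization and deserialization

pickle — Python object serialization: used for the Redis cache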
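
A sketch of pickling a value before caching it. `DictCache` is a hypothetical in-memory stand-in for a Redis client so the example runs without a server; only the `pickle` calls are the real standard-library API.

```python
import pickle


class DictCache:
    """Hypothetical stand-in for a Redis client: stores bytes by key."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("cache values must be bytes")
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


cache = DictCache()
profile = {"id": 7, "tags": ["vim", "tornado"], "score": 9.5}

# Serialize before writing to the cache, deserialize after reading.
cache.set("user:7", pickle.dumps(profile, protocol=pickle.HIGHEST_PROTOCOL))
restored = pickle.loads(cache.get("user:7"))
```

Unpickling can execute arbitrary code, so this only suits caches whose contents you wrote yourself.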

confluence

How does this work together with Jira?

8.39 - Daily 9/21

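Learning

python unittest tornado

test_handlers.py sample code: test cases from the selene project

request_handler_test.py sample code: setting the project root directory and building a parent class

tornado.testing, unit testing support for asynchronous code: supports asynchronous tests

How to use a test tornado server handler that authenticates a user via a secure cookie: mocking the cookie

Running a single unittest from the command line; more options

python -m unittest test_module1 test_module2
python -m unittest test_module.TestClass
python -m unittest test_module.TestClass.test_method

Using setUp to initialize data. TODO

tornado sample: a project with API tests

python unittest documentation

python unittest execution order: tests run in alphabetical order

python mock handler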

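from unittest import mock

import tornado.testing
from tornado.web import Application

import handlers  # the project's own handlers module


class MyUT(tornado.testing.AsyncHTTPTestCase):
  def get_app(self):
    settings = {
      "template_path": '../../../templates',
      "cookie_secret": 'secret',
      "login_url": '/admin/login',
      "debug": True
    }

    return Application([
      (r'/admin/create/super', handlers.CreateSuperUserHandler)
    ], **settings)


  def testGet(self):
    # Stub get_current_user so the handler does not need a real login.
    with mock.patch.object(handlers.CreateSuperUserHandler, 'get_current_user') as m:
      m.return_value = {}
      response = self.fetch('/admin/create/super')

    print(response.body)
    # response.body is bytes in Python 3, so compare against a bytes literal.
    self.assertIn(b'create', response.body)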

A more detailed mock handler example: using patch and patch.object

Python

Several ways to avoid KeyError when working with a dict: using get is better
How to get name of exception that was caught in Python? Getting the exception's class name

try:
    foo = bar
except Exception as exception:
    assert type(exception).__name__ == 'NameError'
    assert exception.__class__.__name__ == 'NameError'

Deleting an element from a dict: use del

>>> from collections import OrderedDict
>>> dct = OrderedDict()
>>> dct['a'] = 1
>>> dct['b'] = 2
>>> dct['c'] = 3
>>> dct
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> del dct['b']
>>> dct
OrderedDict([('a', 1), ('c', 3)])
>>>

An introductory guide to object-oriented Python. TODO

Python set operations, for example removing an element: set.remove(key)

subprocess returncode: when is it 0?

import subprocess as sp
child = sp.Popen(openRTSP + opts.split(), stdout=sp.PIPE)
streamdata = child.communicate()[0]
rc = child.returncode
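
A runnable sketch answering the question above: returncode is 0 when the child exits successfully, and otherwise it is the child's exit status. The helper `run` is hypothetical and launches the current Python interpreter as the child process.

```python
import subprocess
import sys


def run(code):
    """Run a short Python snippet in a child process; return (returncode, stdout)."""
    child = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
    )
    out = child.communicate()[0]  # waits for exit, so returncode is set
    return child.returncode, out.decode().strip()


ok_code, ok_out = run("print('done')")
fail_code, _ = run("import sys; sys.exit(3)")
```

Before `communicate()` (or `wait()`) returns, `returncode` is still `None`, which is a common source of confusion.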

Tornado

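Tornado asynchronous and delayed tasks: an easy-to-follow tutorial on Tornado async
The official coroutine documentation is hard to understand
tornado.gen source code analysis. TODO

GIT

*
!.gitignore

How to commit an empty directory or empty file in git

Undoing commits in git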

git reset HEAD~2

8.40 - Daily 9/22

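Learning

Project

Writing the weekly project report: use the best weekly reports in Confluence as a reference. TODO
The template is as follows:

### This week
---
1. Learned Vue
2. Finished developing the login page
...

### Plan for next week
---
1. Keep learning Vue
2. Keep working on the login page

### issues
---
- I think our team is great; there just aren't many women on it
- webpack builds are taking longer and longer; we could look into optimizing them

VUE

The colleague I worked with moved to another department, so I took over Vue development, and the questions began. Watch: comparing against the previous value, oldValue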

a: function (val, oldVal) {
      console.log('new: %s, old: %s', val, oldVal)
    },

Vim

With vim-vue, only the first part of a .vue file was highlighted

autocmd FileType vue syntax sync fromstart

Adding the configuration above fixes it

A workaround for NERDCommenter support.

let g:ft = ''
function! NERDCommenter_before()
  if &ft == 'vue'
    let g:ft = 'vue'
    let stack = synstack(line('.'), col('.'))
    if len(stack) > 0
      let syn = synIDattr((stack)[0], 'name')
      if len(syn) > 0
        exe 'setf ' . substitute(tolower(syn), '^vue_', '', '')
      endif
    endif
  endif
endfunction
function! NERDCommenter_after()
  if g:ft == 'vue'
    setf vue
    let g:ft = ''
  endif
endfunction

Codepen.io

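Edit code online with live preview; a good tool for learning, with zencoding support
log in with GitHub

CSS

Layout: 40 tutorials, tips, examples, and best practices

css Flex: flex layout, used a lot in the project

javascript

Monad: what on earth is this? First time hearing of it; I came across it by accident. TODO

processon

A good online tool for flowcharts
Already using it. I drew three simple diagrams; it is very powerful.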

8.41 - Daily 9/25 #vpn

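Learning

Supervisor

Supervisor is a process management tool written in Python that makes it easy to start, restart, and stop processes (not only Python processes). Besides controlling a single process, it can start and stop several at once. If a server failure kills every application, for example, supervisor can start them all together instead of typing a command for each one.
Supervisor Chinese documentation

Screen

Show Screen window names in the caption and status line

caption always "%{= kw}%-w%{= kG}%{+b}[%n %t]%{-b}%{= kw}%+w %=%d %M %0c %{g}%H%{-}"

Vpn

Today the VPN could play 720p video, oddly. Next step: test the VPN speed and make sure it stays stable around the clock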

8.42 - Daily 9/27 Git 相关知识点

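Learning

Bash history

Edit the .bash_profile file

HISTFILESIZE=100000

Restart the terminal and check whether ~/.bash_history can now hold more history.

Git tag

Tagging an earlier commit

git tag -a v1.2 9fceb02

Sharing tags

git push origin [tagname]

Git stash

The stash list is shared by all branches; why does git stash apply not distinguish between branches?

A problem I hit today that git stash solved perfectly: I fixed a small bug on the dev branch and needed to ship it, but dev has long carried the in-progress work for project A and cannot go live as is. The answer: git stash the fix, switch branches, run git stash apply on the new branch, commit, and release that branch. Then switch back to dev and continue project A

Git checkout revert reset

Rolling back code: choosing between Reset, Checkout, and Revert. A very detailed introduction

Merging code: choosing between Merge and Rebase. The golden rule of git rebase:

Is anyone else working on this branch?

Git rebase

Git-rebase notes: detailed techniques for all kinds of rebase

Rewriting committed history, Part 5: what Rebase can do in practice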

Git tree

show a Git tree in terminal

git log --graph --oneline --all

Configuring git tree

git config --global alias.tree "log --graph --decorate --pretty=oneline --abbrev-commit"

Git push -f

Rolling back the remote version in Git: I never imagined this was possible; very powerful

git log
git reset --soft ${commit-id}
git stash
git push -f

Git log

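I have always half-understood git log --graph. The Git Community Book (Chinese edition), Viewing History, Git Log, mentions several ways log output can be ordered

Reverse chronological order: the default
Topological order (--topo-order): child commits are shown before their parent commits, so the "development lines" appear grouped together
Commit date order

Talk is cheap. Show me the code or money~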

8.43 - Daily 9/5

一直要做的事情：知识的系统化梳理
LP生气了，换了微信头像，发了朋友圈，哽咽几下，而我说了一堆话，不知道有没有用

vue+webpack重构了整个项目，明天上线新系统，工作上事情很多,做不完的工作,学不完的知识（系统化多重要）
- 工作上的发展方向是怎样的，这个问题一直不想，TMCD！
贸易上的事情进展太慢了
- Amazon上产品
- 开模新产品
- 产品包装：产品尺寸定做包装
- 发货还是个问题，FBA

说了一句话：“我对自己也不满意” ，英文翻译过去应该是’fuck yourself, or What'.
睡觉，长叹一口气！

8.44 - Daily 9/6

感情

关进了小黑(la)屋(hei)

过了2个小时，出来了！上次也是关了几个小时。
生气了一会就自我恢复了。

工作

新系统顺利上线了

学习

Curry programming: Currying is the process of turning a function that expects multiple parameters into one that, when supplied fewer parameters, returns a new function that awaits the remaining ones.
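
A minimal sketch of currying in Python, matching the definition above. The decorator `curry` is a hand-rolled illustration that handles positional parameters only; it is not how Ramda implements it.

```python
import inspect


def curry(func):
    """Return a curried version of `func` (positional parameters only)."""
    arity = len(inspect.signature(func).parameters)

    def curried(*args):
        if len(args) >= arity:
            return func(*args)
        # Not enough arguments yet: return a function awaiting the rest.
        return lambda *more: curried(*args, *more)

    return curried


@curry
def volume(length, width, height):
    return length * width * height


all_at_once = volume(2, 3, 4)
one_by_one = volume(2)(3)(4)
with_base = volume(2, 3)       # a new function still awaiting `height`
partial_result = with_base(10)
```

Supplying fewer arguments than the function needs yields a new function rather than an error, which is what makes curried functions easy to specialize and compose.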

Ramda.js A practical functional library for JavaScript programmers.

8.45 - Daily 9/8

Learning

Python time handling

[Python dateutil](https://dateutil.readthedocs.io/en/stable/index.html): very powerful date handling; parses data such as 'Fri, 21 Jul 2017 14:42:50 +0800' with ease
[strftime](http://strftime.org/): the most detailed reference for time format codes
[3.15 Converting strings into datetimes](http://python3-cookbook.readthedocs.io/zh_CN/latest/c03/p15_convert_strings_into_datetimes.html): documentation for datetime.strptime

date.today().strftime(u'%Y年%m月%d日'.encode('utf-8')).decode('utf-8')

Sending and receiving email

[Official email examples](https://docs.python.org/2/library/email-examples.html): detailed examples of sending mail
[Receiving mail over POP3](https://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001408244819215430d726128bf4fa78afe2890bec57736000): a detailed tutorial
[Reply to email using python 3.4](https://stackoverflow.com/questions/31433633/reply-to-email-using-python-3-4): code for replying to mail
[Mail parsing](http://blog.donews.com/limodou/archive/2004/12/30/220588.aspx): encoding issues
[Detailed mail parsing](http://www.cnblogs.com/zixuan-zhang/p/3402821.html): more detailed

PYTHON GIT

[pygit2](http://www.pygit2.org/merge.html): python api for git

Life

Went hiking in Hong Kong on Sunday, 16 km

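8.46 - Another year's longest holiday, National Day. Daily 9/29

Life

The National Day holiday really is a long one for working people! Heading home!

Learning

Vim

Switching case of characters: the official documentation is still the most helpful

Javascript

Is object empty? How to tell whether an object is empty in JS

Last element of an Array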

var args=new Array(['www'],['phpernote'],['com']);
alert(args.pop());//com

Regular expressions

Matching lines that do not contain a word

^((?!hede).)*$

Webpack

Changing the dev server's root directory

devServer: {
  contentBase: path.join(__dirname, "dist"),
  compress: true,
  port: 9000
}

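When webpack generates the JS file name at build time, it cannot be hard-coded in the HTML. How can the JS file be added to the HTML file automatically?

The HtmlWebpackPlugin plugin solves this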

var HtmlWebpackPlugin = require('html-webpack-plugin');
var webpackConfig = {
  entry: 'index.js',
  output: {
    path: __dirname + '/dist',
    filename: 'index_bundle.js'
  },
  plugins: [new HtmlWebpackPlugin()]
};

GitHub project page

VUE

Vue.js brings a good development model: templates, data binding, components, automation, unified resources

vue-mobile: a Vue-based UI framework

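9 - Others

About life, learning, and work

9.1 - Notes from the year-end conversation

Background

An unexamined life is not worth living. It's the end of the year: do you still remember the questions your leader asked you? Is there a better way to answer them?

Questions

Your plans for the other colleagues
I hadn't thought much about this and improvised, going through everyone's current situation, strengths, and weaknesses.

Your assessment of your manager
I assessed three aspects: business, communication, and technical ability. I felt it went well.

After the year-end bonus was explained: how satisfied are you with it?
I answered fairly honestly and am not sure whether there was a better answer. Chatting with colleagues later, they said they would answer "satisfied".

Your business plans for 2020
I hadn't prepared at all and answered from my recent thinking.
With preparation I believe I would have answered better. It is not only in this situation that a plan needs presenting; in other settings too, we should be able to say clearly what we are and where we are going.

(My own question) Looking back on 2019
The ideas I had at last year's year-end party changed suddenly mid-year. As for the team, I did not lead them to outstanding work and results, or to anything that made an impact on others.

·End·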

9.2 - What We Concern

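Background

Our problems

Efficiency

Efficiency here means not just development, testing, or requirements, but how efficiently iterations reach production.

Business value and clarity

Do valuable business work, with a clear plan

Lack of rigorous requirements review

Our changes

Improving efficiency

Change the current release process: no need to create a version; notify the relevant parties once the release is done.
Change the current testing approach: push API tests plus unit tests rather than relying on testing through the app.
Invest in the middle platform: consolidate shared business domains to speed up iteration of the thin front ends.
Introduce DevOps to free developers from operations work.

Clear business and value

Streamline the requirements process
Strengthen requirements review

Turning application capabilities into a platform

Ecosystem. An open platform is the key to an ecosystem: exposing the company's services as Open APIs lets more third-party software vendors connect their software to our platform, creating a broad ecosystem.

[Industry benchmark] Approach and reference architecture for building a bank ecosystem cloud

Source: Approach and reference architecture for building a bank ecosystem cloud

Technical roadmap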

@startuml

actor "开发人员" as dev 
actor "产品&运营人员" as run 


skinparam RectangleFontName Papyrus
skinparam RectangleFontSize 24
rectangle " 云服务开发规范 " as spec {

}

rectangle "基础云平台"  as plat {
    rectangle "基础服务平台" as basic
    rectangle "安全接口" as basic1
    rectangle "审计接口" as basic2
    rectangle "计量接口" as basic3
    rectangle "计费接口" as basic4

    basic -> basic1 
    basic -> basic2
    basic -> basic3
    basic -> basic4
    basic1 -[hidden]-> basic2
    basic2 -[hidden]-> basic3
    basic3 -[hidden]-> basic4
}


rectangle "标准云产品" as product {

    rectangle "安全接入" as a 
    rectangle "计量接入" as b
    rectangle "审计接入" as c

    a -[hidden]-> b 
    b -[hidden]-> c
}
'spec -[hidden]-> plat
plat --> spec: 云平台提供标准接口和开发规范

product -> plat: 通过接口接入，在运营平台上自动接入

spec --> dev: 开发规范指导云服务产品开发

dev -> product: 开发人员遵循各项接口标准


run --> product
@enduml

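The "open cloud platform + standard products" approach gives a typical ecosystem cloud its capacity for sustained operation. The open cloud platform provides basic cloud services and exposes them to cloud product developers through standard interfaces.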

@startmindmap

* 生态云技术路线
** 标准开放的云平台
*** 核心能力
**** 标准开放框架
*****_: 标准的开放框架是生态云能健康持续发展的基础，
定义了云服务构建的技术标准，允许快速开发标准产品
;
**** 产品服务能力？
*****_ 产品服务能力是生态云的价值体现。
**** 安全服务能力
*****_ 为云平台和云服务的租户提供**体系化**的安全管理能力
**** 持续运营能力
*****_: 提供客户运营、云服务产品运营及平台运营能力，
实现标准云产品的生命周期管理，实现客户自主服务。
;
**** 集中管控能力
*****_ 实现云资源和业务应用的统一管理、调度和监控、底层资源的扩缩容管理

** 标准产品
@endmindmap

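[Industry benchmark] China Resources Group: discussion of the approach to building a cloud computing service platform

PDF, from Liu Xiang, Chief Architect at Oracle

NIST's explanation of cloud computing

[NIST] Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

5 essential characteristics (STEAM)

S: on-demand self-service
T: multi-tenant resource pooling
E: rapid elasticity
A: broad network access
M: measured service, billed by usage

4 deployment models

Public cloud
Private cloud
Community cloud
Hybrid cloud

3 service models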

SaaS
PaaS
IaaS

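[Industry benchmark] Xie Chong - Huawei Cloud IoT open ecosystem architecture and practice v1.0

Opening up an ecosystem involves not only the software architecture but also three elements of openness:

Level of openness
Form of openness
Open platform

IoT as a platform - IoT

This is an area where I need to grow in 2020: hardware comes first, and empowering hardware is the trend going forward.

What does the architecture look like?

Case studies and papers for reference: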

Lambda architecture: a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.
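
A toy sketch of the Lambda idea: a batch view recomputed from history, a speed view updated per event, and a query that merges the two. The device ids, counts, and function names are all made up; a real system would use a batch engine and a stream processor instead of in-memory counters.

```python
from collections import Counter

# Hypothetical events: (device_id, reading_count) pairs.
historical_events = [("dev-1", 3), ("dev-2", 5), ("dev-1", 2)]
recent_events = [("dev-1", 1), ("dev-3", 4)]


def batch_view(events):
    """Batch layer: recompute totals from the full historical data set."""
    totals = Counter()
    for device, count in events:
        totals[device] += count
    return totals


class SpeedView:
    """Speed layer: update totals incrementally as each event arrives."""

    def __init__(self):
        self.totals = Counter()

    def on_event(self, device, count):
        self.totals[device] += count


def query(batch, speed, device):
    """Serving layer: merge the batch view with the real-time view."""
    return batch[device] + speed.totals[device]


batch = batch_view(historical_events)
speed = SpeedView()
for device, count in recent_events:
    speed.on_event(device, count)

dev1_total = query(batch, speed, "dev-1")
dev3_total = query(batch, speed, "dev-3")
```

The batch view is accurate but stale, the speed view is fresh but covers only recent data, and the merged answer combines both.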

A Distributed Stream Processing based Architecture for IoT Smart Grids Monitoring .pdf

Cyclic Architecture
- Messaging Layer
- Processing Layer
- Volatile Layer

IoT: A web of interconnected layers.pdf: follow the link to see the multi-layer architecture.

Device Layer
Data ingestion and transformation layer
- Data from the device layer is transformed through different protocols to a standard format.
Data processing layer
Application layer

Applying the Kappa architecture in the telco industry

Stream IoT data to an autonomous database using serverless functions

Oracle IoT Streaming Arch

Oracle IOT 流架构

参考

Architecture Patterns for IoT
Tesing IOT Applications.pdf
Infoq: The Perfect Pair: Digital Twins and Predictive Maintenance

[Industry benchmark] Google Cloud IOT

See the service components and data flow diagram below: Cloud IoT Core

Glossary

Term	Definition
device registry	a container of devices with shared properties
device	a “Thing” in “Internet of Things”; a processing unit that is capable of connecting to the internet (directly or indirectly) and exchanging data with the cloud.

Practice

The following comes from the Google IoT Core guide

Install Google Cloud SDK
Create devices registries
- Create a device registry
- IAM role for Pub/Sub publishing
Creating device key pairs
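- First create a public/private key pair
- When connecting to Cloud IoT Core, each device creates a JSON Web Token (JWT) signed with its private key, which Cloud IoT Core authenticates using the device’s public key
  - Cloud IoT Core can verify device public key certificates against registry-level CA certificates: the device's public key certificate is checked against the CA certificate registered on the registry
  - Purpose: a verified certificate attests that a public/private key pair belongs to a legitimate device. After the device manufacturer creates the key pair, the private key is stored on the device and the public key is signed by the CA.
  - When a device registry has a CA certificate, it accepts only devices whose public key was signed by that CA. When the platform must support many kinds of devices, each device can be added to its corresponding registry instead of being added haphazardly, which would lead to devices receiving wrong commands or reporting bad data.
Creating or editing a device
- When creating a device, you can choose the public key format used for authentication.
  - Public key (RS256 or ES256)
  - Public key certificate (signed by a CA)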

@startuml
!include <awslib/AWSCommon>
!include <awslib/General/Client>
!include <awslib/Mobile/APIGateway>

Client(device, drone, "")
APIGateway(api, "IoT Broker", "")

device -> api: 使用JWT认证方式\n以mqtt协议接入
@enduml

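[Industry benchmark] Alibaba Cloud IOT

Practice

TODO

[Industry benchmark] EMQ X

Key technologies
- Distributed
- Containerized
- Bridging
Key metrics
- Multi-protocol
- Concurrency: 2 million concurrent connections per server, 10 million per cluster (7 nodes)
- Throughput: million-level per cluster

Practice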

TODO

[行业对标] Azure IoT

IoT Hub REST: offers programmatic access to the device, messaging, and job services, as well as the resource provider, in IoT Hub.

Communication with your IoT hub using the MQTT protocol: how to define the interface with devices

IoT Hub endpoints

Resource Provider
Device identity management
Device twin management
Jobs management
Device endpoint
- send device-to-cloud messages
- receive cloud-to-device messages
- initiate file uploads
- retrieve and update device twin properties
- receive direct method request
Service endpoint
- receive device-to-cloud messages
- send cloud-to-device messages and receive delivery acknowledgements
- receive file notification
- direct method invocation
- receive operation monitoring events

The Azure IoT reference architecture shows a recommended architecture for IoT applications on Azure using PaaS (platform-as-a-service) components.

There are two ways to process telemetry data:

hot path
- The hot path analyzes data in near-real-time, as it arrives. In the hot path, telemetry must be processed with very low latency. The hot path is typically implemented using a stream processing engine (Azure Stream Analytics or Apache Spark). The output may trigger an alert, or be written to a structured format that can be queried using analytical tools.
cold path
- The cold path performs batch processing at longer intervals (hourly or daily).
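The difference between the two paths can be sketched in a few lines of Python. This is only an illustration of the idea, not how Azure Stream Analytics or Spark would be used: the `Reading` type, the 80.0 threshold, and the per-hour average are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple, Tuple


class Reading(NamedTuple):
    device_id: str
    hour: int            # hour bucket the reading falls into
    temperature: float


ALERT_THRESHOLD = 80.0   # hypothetical alert limit


def hot_path(reading: Reading) -> bool:
    """Evaluate a single reading as it arrives; True means raise an alert."""
    return reading.temperature > ALERT_THRESHOLD


def cold_path(stored: Iterable[Reading]) -> Dict[Tuple[str, int], float]:
    """Batch job: average temperature per (device, hour) over stored data."""
    buckets = defaultdict(list)
    for r in stored:
        buckets[(r.device_id, r.hour)].append(r.temperature)
    return {key: mean(values) for key, values in buckets.items()}
```

The hot path touches each event once and keeps no history, which is what keeps its latency low. The cold path needs the stored data set and can afford to run hourly or daily.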

Data Storage

For cold path storage, Azure Blob Storage is the most cost-effective option

For warm path storage, consider using Azure Cosmos DB.

Solutions

Publish and subscribe with Azure IoT Edge: how Azure IoT handles topic pub/sub permissions.

Real Time Analytics on Big Data Architecture: get insights from live streaming data with ease.

Advanced Analytics Architecture: transform your data into actionable insights using best-in-class machine learning tools. This architecture allows you to combine any data at any scale, and to build and deploy custom machine learning models at scale.

Ingesting, processing and visualizing real-time vehicle data

A first look at overseas IoT platforms (part 2): Microsoft Azure IoT. Platform positioning: connect devices, other M2M assets, and people so that data can be put to better use in business and operations.

[Industry Benchmark] Huawei Cloud IoT: Open Ecosystem Architecture and Practice v1.0

Challenges facing IoT development and how to break through: enable innovation at the top, enable business in the middle, connect everything at the bottom
Huawei Cloud IoT open ecosystem architecture and practice: open architecture and the three elements of openness
Sharing of deployed solutions

Solution

Requirements we care about

Performance
- MST, maximum sustainable throughput
- Latency
Scalability
- Maximum number of supported concurrent connections
- The time to start a new broker
Resilience / Availability
- The message loss count in case of a broker instance crashing
Security
- Covered only as a side aspect: measures the overhead of enabling TLS encryption on the maximum sustainable throughput (as a percentage).
Extensibility
- Offers plug-in mechanisms
Usability
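The security metric above is a percentage, so it helps to pin down the arithmetic. The sketch below assumes the overhead is the relative drop in maximum sustainable throughput once TLS is enabled. That reading is my interpretation, and the figures in the usage example are made up.

```python
def tls_overhead_percent(mst_plain: float, mst_tls: float) -> float:
    """Relative throughput lost to TLS, in percent.

    mst_plain: maximum sustainable throughput without TLS (msg/s)
    mst_tls:   maximum sustainable throughput with TLS enabled (msg/s)
    """
    if mst_plain <= 0:
        raise ValueError("mst_plain must be positive")
    return 100.0 * (mst_plain - mst_tls) / mst_plain


# Hypothetical broker: 50,000 msg/s in plain text, 40,000 msg/s with TLS.
overhead = tls_overhead_percent(50_000, 40_000)  # 20.0
```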

Choice

EMQ X

References

MQTT Essentials, compiled by the HiveMQ team https://www.hivemq.com/mqtt-essentials/
- MQTT Basic
- MQTT Features
- MQTT Specials
Getting to know MQTT https://developer.ibm.com/zh/articles/iot-mqtt-why-good-for-iot/
- Why MQTT rather than other protocols
- What the MQTT protocol looks like
EMQ X https://github.com/emqx/emqx
NATS
- Does NATS support MQTT? https://github.com/nats-io/nats-server/issues/812
  - In development on a branch; support expected in Q4
- Supports persistent storage: File Store / SQL Store Persistence
Comparison of MQTT Brokers for Distributed IoT Edge Computing [pdf]

·End·

9.3 - 4.2 Camping at the Most Beautiful 东湾: Those Years We Spent Together

First camping trip - 东湾, HK

9.4 - INTERVIEW CONCLUSION

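2012-12-08 Questions at 福田财富大厦

Given 100,000 numbers, pick the 100 largest. Algorithm complexity?

I did not think this one through at the time. I guessed the interviewer was asking about sorting, so I talked about quicksort.
In fact there is no need to sort at all: a single pass over the data is enough to find the 100 largest numbers.

Similar question: Write a program to find 100 largest numbers out of an array of 1 billion numbers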
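One common way to do the single pass is to keep a min-heap of the k largest values seen so far. The sketch below is one possible answer rather than the one the interviewer had in mind. It runs in O(n log k) time and O(k) space, so it works equally well for the one-billion-number variant as long as the input can be streamed.

```python
import heapq
from typing import Iterable, List


def largest_k(numbers: Iterable[int], k: int = 100) -> List[int]:
    """Return the k largest values, largest first, in one pass."""
    if k <= 0:
        return []
    heap: List[int] = []                 # min-heap holding the current top k
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                # beats the smallest of the top k
            heapq.heapreplace(heap, x)   # pop smallest, push x
    return sorted(heap, reverse=True)
```

The standard library's `heapq.nlargest(k, numbers)` returns the same result and is the simpler choice in practice.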

Display the current date in the format xxxx年xx月xx日 xx:xx:xx

I finally rewrote the code until I was satisfied with it.
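The rewritten code itself is not preserved in these notes. As a stand-in, here is a minimal Python sketch that produces the requested format. It builds the string from the individual fields rather than passing the Chinese characters through `strftime`, to avoid depending on platform locale handling. The function name is my own.

```python
from datetime import datetime
from typing import Optional


def format_now(now: Optional[datetime] = None) -> str:
    """Format a moment as 'xxxx年xx月xx日 xx:xx:xx' (defaults to the current time)."""
    moment = datetime.now() if now is None else now
    return (
        f"{moment.year:04d}年{moment.month:02d}月{moment.day:02d}日 "
        f"{moment.hour:02d}:{moment.minute:02d}:{moment.second:02d}"
    )
```

Passing an explicit `datetime` makes the function easy to test, for example `format_now(datetime(2012, 12, 8, 9, 5, 3))`.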

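Given a URL that returns JSON, what are the ways to fetch the data and display it on a page in a different domain?

If the endpoint supports JSONP, call it via JSONP from the page.
Use a back-end proxy.
Cross-Origin Resource Sharing (CORS)
Are there other ways? Tell me!!!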
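Of the three options, CORS is the one decided entirely on the server that returns the JSON. It has to answer with an `Access-Control-Allow-Origin` header that matches the requesting page's origin. The sketch below shows that decision as a plain function, assuming a hypothetical allow-list. In a real service the returned headers would be attached to the HTTP response.

```python
from typing import Dict, Optional

# Hypothetical allow-list of origins permitted to read the JSON.
ALLOWED_ORIGINS = {"https://pages.example"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Headers to add to the JSON response for a given Origin request header."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",   # caches must key on the Origin header
        }
    return {}                   # no CORS headers: the browser blocks the read
```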

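Computing the time and space complexity of algorithms

Time frequency

The time an algorithm takes is proportional to the number of statement executions. The number of times the statements in an algorithm are executed is called its time frequency, written T(n).

Asymptotic time complexity

In the time frequency T(n), n denotes the problem size; as n changes, T(n) changes with it. To describe the pattern by which T(n) grows with n, we introduce the notion of time complexity.

If there exists a function f(n) such that, as n tends to infinity, the limit of T(n)/f(n) is a nonzero constant, then f(n) is a function of the same order of magnitude as T(n), written T(n) = O(f(n)). O(f(n)) is called the asymptotic time complexity of the algorithm, or time complexity for short.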
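A short worked example of the definition above, using a time frequency chosen purely for illustration:

```latex
T(n) = 3n^2 + 2n + 1, \qquad f(n) = n^2

\lim_{n \to \infty} \frac{T(n)}{f(n)}
  = \lim_{n \to \infty} \left( 3 + \frac{2}{n} + \frac{1}{n^2} \right)
  = 3 \neq 0
\quad\Longrightarrow\quad T(n) = O(n^2)
```

The lower-order terms 2n and 1 vanish in the limit, which is why only the fastest-growing term determines the complexity.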

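Common time complexities are: O(1) constant; O(log n) logarithmic; O(n) linear; O(n log n) linearithmic; O(n²) quadratic; O(n³) cubic; O(n^k) k-th power; O(2ⁿ) exponential.

More time complexity examples [1]

References

[1] 程序新视界: LeetCode0: Essential knowledge for learning algorithms: computing time and space complexity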

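9.5 - Hong Kong Trail Section 8: 土地灣 (To Tei Wan) - 龍脊 (Dragon's Back) - 大浪灣 (Big Wave Bay)

Hong Kong Trail Section 8: 土地灣 - 打爛埕頂山 (Shek O Peak) - 龍脊 - 馬塘坳 - 大浪灣

Section 8 is the most spectacular and open stretch of the Hong Kong Trail, and the route has been featured in international magazines. Dragon's Back offers wide views that take in 大潭港 (Tai Tam Harbour), 石澳 (Shek O) and Big Wave Bay. Apart from a gentle climb at the start, the rest of the route is fairly flat or downhill.

Off the MTR and onto the bus; this sign is very easy to spot

Quite a few people were heading there; two buses left before we got on, with one coming every ten-odd minutes

Get off at the To Tei Wan bus stop; there is a map at the entrance, then head uphill to Dragon's Back

Not far up, halfway up the hill, we could see the island opposite, but the light was too dim

Dragon's Back: photos and snacks

Big rocks on the summit

Keep following the ridge

Back on the hillside, trees line both sides of the path, so it is cool to walk even on a hot day

The fork, also a rest stop; head towards Big Wave Bay

Hong Kong Trail marker 100, the finish: Big Wave Bay

Enjoying the beach

Surfing, a bit cold

Ate, rested, took photos, then took the bus back to the Causeway Bay MTR station

OVER

9.6 - 黄牛山 (Wong Ngau Shan), MacLehose Trail Sections 4 and 5, 狮子山 (Lion Rock), 笔架山 (Beacon Hill)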

Blog

1 - Architecture & Design

1.1 - Model Inference Architecture

Concepts

Benchmark Solution - Run computer vision inference on large videos

Real-time Inference

TensorRT through NVIDIA Triton

Asynchronous inference

Prerequisites

Use Your Own Inference Code with Hosting Services

Adapting Your Own Inference Container

Create

Prebuilt SageMaker Docker Images for Deep Learning

Batch Transform

Serverless Inference

Benchmark Solution - Machine Learning Platform for AI - EAS

References

1.2 - Cloud Native Patterns

Refs

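1.3 - Message Queues: Collected Resources

Basic Concepts

Case Studies

Inside Zhihu's high-performance long-connection gateway at tens of millions of connections

A Flink-based real-time data warehouse for news-feed scenarios

Reference Materials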

1.4 - Prometheus Practice

Background

Glossary

Operator

Prometheus

Device plugin (device-plugin)

Monitoring Metrics

Getting Started

Using dcgmproftester

Reflections

FAQ

References

1.5 - Notes for Patterns of Enterprise Application

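Data Source Architectural Patterns

Table Data Gateway

Row Data Gateway

Active Record

Data Mapper

Handling Finder Methods

Mapping Data to Domain Fields

Metadata-Based Mapping

Distribution Patterns

Remote Facade

Data Transfer Object

When to Use It

FAQ

Offline Concurrency Patterns

Optimistic Offline Lock

Pessimistic Offline Lock

How It Works

Distributed Locks

Session State Patterns

References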

1.6 - All About GPUs

Concepts

MapReduce

Spark and Flink

Flink

Spark

BSP

DOT model

DOTA model

p-DOT model

Parallel Computing

MPI

MPS

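Using GPU Resources on Kubernetes

GPU Virtualization Approaches

Non-virtualized GPU Approaches

NVIDIA GPU OPERATOR

References

1.7 - PlantUML + ArchiMate Notes

Background

Tools

Details

1.3 - Message Queues: Collected Resources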