pandapower

For power flow system simulation!

Create an Empty Network

pandapower.create.create_empty_network(name='', f_hz=50.0, sn_mva=1, add_stdtypes=True)

This function initializes the pandapower datastructure.

OPTIONAL:

  • name (string, '') - name for the network
  • f_hz (float, 50.) - power system frequency in hertz
  • sn_mva (float, 1) - reference apparent power for the per unit system
  • add_stdtypes (boolean, True) - include standard types in the net

OUTPUT:

  • net (attrdict) - pandapower attrdict with empty tables

EXAMPLE:
net = create_empty_network()
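A minimal, hedged sketch of typical usage (assuming pandapower is imported as pp; the name and frequency are illustrative):

```python
import pandapower as pp

# create an empty 60 Hz network; the element tables start out empty
net = pp.create_empty_network(name="example", f_hz=60.0)

print(net.bus)   # empty bus table
print(net.line)  # empty line table
```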

Bus

pandapower.create_bus(net, vn_kv, name=None, index=None, geodata=None, type='b', zone=None, in_service=True, max_vm_pu=nan, min_vm_pu=nan, coords=None, **kwargs)

Adds one bus in table net[“bus”].

Busses are the nodes of the network that all other elements connect to.

INPUT:

  • net (pandapowerNet) - The pandapower network in which the element is created
  • vn_kv (float) - The grid voltage level in kV

OPTIONAL:

  • name (string, default None) - the name for this bus
  • index (int, default None) - Force a specified ID if it is available. If None, the index one higher than the highest already existing index is selected.
  • geodata ((x,y)-tuple, default None) - coordinates used for plotting
  • type (string, default “b”) - Type of the bus. “n” - node, “b” - busbar, “m” - muff
  • zone (string, None) - grid region
  • in_service (boolean) - True for in_service or False for out of service
  • max_vm_pu (float, NAN) - Maximum bus voltage in p.u. - necessary for OPF
  • min_vm_pu (float, NAN) - Minimum bus voltage in p.u. - necessary for OPF
  • coords (list (len=2) of tuples (len=2), default None) - busbar coordinates to plot the bus with multiple points. coords is typically a list of tuples (start and endpoint of the busbar) - Example: [(x1, y1), (x2, y2)]
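A short, hedged example of adding buses (the voltage levels, names and coordinates are illustrative assumptions):

```python
import pandapower as pp

net = pp.create_empty_network()

# a 110 kV busbar and a 20 kV node, with plotting coordinates
b1 = pp.create_bus(net, vn_kv=110., name="HV busbar", type="b", geodata=(0, 0))
b2 = pp.create_bus(net, vn_kv=20., name="MV node", type="n", geodata=(0, 1))

print(net.bus)
```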

Electric Model

Result Parameters

net.res_bus

| Parameter | Datatype | Explanation |
|---|---|---|
| vm_pu | float | voltage magnitude [p.u.] |
| va_degree | float | voltage angle [degree] |
| p_mw | float | resulting active power demand [MW] |
| q_mvar | float | resulting reactive power demand [Mvar] |
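To show where these results come from, here is a hedged sketch (all element parameters are illustrative assumptions) that builds a two-bus network, runs a balanced power flow and reads net.res_bus:

```python
import pandapower as pp

net = pp.create_empty_network()
b1 = pp.create_bus(net, vn_kv=20.)
b2 = pp.create_bus(net, vn_kv=20.)
pp.create_ext_grid(net, bus=b1)   # slack bus / external grid
pp.create_line(net, from_bus=b1, to_bus=b2, length_km=1.0, std_type="NAYY 4x50 SE")
pp.create_load(net, bus=b2, p_mw=0.1, q_mvar=0.02)

pp.runpp(net)          # balanced AC power flow
print(net.res_bus)     # vm_pu, va_degree, p_mw, q_mvar per bus
```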

net.res_bus_3ph

This is a DataFrame that stores the per-bus results of a three-phase (unbalanced) power flow. After such a calculation has been run, it records the voltage magnitude, voltage angle and other electrical quantities for each phase of every bus.

| Parameter | Datatype | Explanation |
|---|---|---|
| vm_a_pu | float | voltage magnitude, phase A [p.u.] |
| va_a_degree | float | voltage angle, phase A [degree] |
| vm_b_pu | float | voltage magnitude, phase B [p.u.] |
| va_b_degree | float | voltage angle, phase B [degree] |
| vm_c_pu | float | voltage magnitude, phase C [p.u.] |
| va_c_degree | float | voltage angle, phase C [degree] |
| p_a_mw | float | resulting active power demand, phase A [MW] |
| q_a_mvar | float | resulting reactive power demand, phase A [Mvar] |
| p_b_mw | float | resulting active power demand, phase B [MW] |
| q_b_mvar | float | resulting reactive power demand, phase B [Mvar] |
| p_c_mw | float | resulting active power demand, phase C [MW] |
| q_c_mvar | float | resulting reactive power demand, phase C [Mvar] |
| unbalance_percent | float | unbalance in percent, defined as the ratio of V0 and V1 according to IEC 6 |

net.res_bus_est

Stores the state estimation results for the buses of the network. State estimation is a key technique in power system monitoring and operation.

The state estimation results are put into net.res_bus_est with the same definition as in net.res_bus.

| Parameter | Datatype | Explanation |
|---|---|---|
| vm_pu | float | voltage magnitude [p.u.] |
| va_degree | float | voltage angle [degree] |
| p_mw | float | resulting active power demand [MW] |
| q_mvar | float | resulting reactive power demand [Mvar] |

net.res_bus_sc

Stores the short-circuit analysis results for every bus in the network.

The short-circuit (SC) results are put into net.res_bus_sc with following definitions:

| Parameter | Datatype | Explanation |
|---|---|---|
| ikss_ka | float | initial short-circuit current value [kA] |
| skss_mw | float | initial short-circuit power [MW] |
| ip_ka | float | peak value of the short-circuit current [kA] |
| ith_ka | float | equivalent thermal short-circuit current [kA] |
| rk_ohm | float | resistive part of the equivalent (positive/negative sequence) SC impedance [Ohm] |
| xk_ohm | float | reactive part of the equivalent (positive/negative sequence) SC impedance [Ohm] |
| rk0_ohm | float | resistive part of the equivalent (zero sequence) SC impedance [Ohm] |
| xk0_ohm | float | reactive part of the equivalent (zero sequence) SC impedance [Ohm] |
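A hedged sketch of how such results are typically produced with pandapower's short-circuit module (the grid data and fault case are assumptions made for illustration):

```python
import pandapower as pp
import pandapower.shortcircuit as sc

net = pp.create_empty_network()
b1 = pp.create_bus(net, vn_kv=20.)
b2 = pp.create_bus(net, vn_kv=20.)
# short-circuit power and R/X ratio of the external grid are needed for SC studies
pp.create_ext_grid(net, bus=b1, s_sc_max_mva=100., rx_max=0.1)
pp.create_line(net, from_bus=b1, to_bus=b2, length_km=2.0, std_type="NAYY 4x50 SE")

sc.calc_sc(net, case="max")   # maximum short-circuit currents
print(net.res_bus_sc)         # ikss_ka, skss_mw, ...
```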

Line Creation

pandapower.create_line(net, from_bus, to_bus, length_km, std_type, name=None, index=None, geodata=None, df=1.0, parallel=1, in_service=True, max_loading_percent=nan, alpha=nan, temperature_degree_celsius=nan, **kwargs)

Creates a line element in net[“line”]. The line parameters are defined through the standard type library.

INPUT:

  • net - The net within this line should be created
  • from_bus (int) - ID of the bus on one side which the line will be connected with
  • to_bus (int) - ID of the bus on the other side which the line will be connected with
  • length_km (float) - The line length in km
  • std_type (string) - Name of a standard line type: either pre-defined in standard_linetypes or a customized std_type created with create_std_type()

OPTIONAL:

  • name (string, None) - A custom name for this line
  • index (int, None) - Force a specified ID if it is available. If None, the index one higher than the highest already existing index is selected.
  • geodata (Iterable[Tuple[int, int]|Tuple[float, float]], default None) - The geodata of the line. The first element should be the coordinates of from_bus and the last should be the coordinates of to_bus. The points in the middle represent the bending points of the line
  • in_service (boolean, True) - True for in_service or False for out of service
  • df (float, 1) - derating factor: maximal current of line in relation to nominal current of line (from 0 to 1)
  • parallel (integer, 1) - number of parallel line systems
  • max_loading_percent (float) - maximum current loading (only needed for OPF)
  • alpha (float) - temperature coefficient of resistance: R(T) = R(T_0) * (1 + alpha * (T - T_0)))
  • temperature_degree_celsius (float) - line temperature for which line resistance is adjusted
  • tdpf (bool) - whether the line is considered in the TDPF calculation
  • wind_speed_m_per_s (float) - wind speed at the line in m/s (TDPF)
  • wind_angle_degree (float) - angle of attack between the wind direction and the line (TDPF)
  • conductor_outer_diameter_m (float) - outer diameter of the line conductor in m (TDPF)
  • air_temperature_degree_celsius (float) - ambient temperature in °C (TDPF)
  • reference_temperature_degree_celsius (float) - reference temperature in °C for which r_ohm_per_km for the line is specified (TDPF)
  • solar_radiation_w_per_sq_m (float) - solar radiation on horizontal plane in W/m² (TDPF)
  • solar_absorptivity (float) - Albedo factor for absorptivity of the lines (TDPF)
  • emissivity (float) - Albedo factor for emissivity of the lines (TDPF)
  • r_theta_kelvin_per_mw (float) - thermal resistance of the line (TDPF, only for simplified method)
  • mc_joule_per_m_k (float) - specific mass of the conductor multiplied by the specific thermal capacity of the material (TDPF, only for thermal inertia consideration with tdpf_delay_s parameter)

OUTPUT:

  • index (int) - The unique ID of the created line

EXAMPLE:

create_line(net, from_bus=0, to_bus=1, length_km=0.1, std_type="NAYY 4x50 SE", name="line1")

pandapower.create_line_from_parameters(net, from_bus, to_bus, length_km, r_ohm_per_km, x_ohm_per_km, c_nf_per_km, max_i_ka, name=None, index=None, type=None, geodata=None, in_service=True, df=1.0, parallel=1, g_us_per_km=0.0, max_loading_percent=nan, alpha=nan, temperature_degree_celsius=nan, r0_ohm_per_km=nan, x0_ohm_per_km=nan, c0_nf_per_km=nan, g0_us_per_km=0, endtemp_degree=nan, **kwargs)

Creates a line element in net[“line”] from line parameters.

INPUT:

  • net - The net within this line should be created
  • from_bus (int) - ID of the bus on one side which the line will be connected with
  • to_bus (int) - ID of the bus on the other side which the line will be connected with
  • length_km (float) - The line length in km
  • r_ohm_per_km (float) - line resistance in ohm per km
  • x_ohm_per_km (float) - line reactance in ohm per km
  • c_nf_per_km (float) - line capacitance (line-to-earth) in nano Farad per km
  • r0_ohm_per_km (float) - zero sequence line resistance in ohm per km
  • x0_ohm_per_km (float) - zero sequence line reactance in ohm per km
  • c0_nf_per_km (float) - zero sequence line capacitance in nano Farad per km
  • max_i_ka (float) - maximum thermal current in kilo Ampere

OPTIONAL:

  • name (string, None) - A custom name for this line
  • index (int, None) - Force a specified ID if it is available. If None, the index one higher than the highest already existing index is selected.
  • in_service (boolean, True) - True for in_service or False for out of service
  • type (str, None) - type of line (“ol” for overhead line or “cs” for cable system)
  • df (float, 1) - derating factor: maximal current of line in relation to nominal current of line (from 0 to 1)
  • g_us_per_km (float, 0) - dielectric conductance in micro Siemens per km
  • g0_us_per_km (float, 0) - zero sequence dielectric conductance in micro Siemens per km
  • parallel (integer, 1) - number of parallel line systems
  • geodata (array, default None, shape= (,2)) - The geodata of the line. The first row should be the coordinates of bus a and the last should be the coordinates of bus b. The points in the middle represent the bending points of the line
  • max_loading_percent (float) - maximum current loading (only needed for OPF)
  • alpha (float) - temperature coefficient of resistance: R(T) = R(T_0) * (1 + alpha * (T - T_0)))
  • temperature_degree_celsius (float) - line temperature for which line resistance is adjusted
  • tdpf (bool) - whether the line is considered in the TDPF calculation
  • wind_speed_m_per_s (float) - wind speed at the line in m/s (TDPF)
  • wind_angle_degree (float) - angle of attack between the wind direction and the line (TDPF)
  • conductor_outer_diameter_m (float) - outer diameter of the line conductor in m (TDPF)
  • air_temperature_degree_celsius (float) - ambient temperature in °C (TDPF)
  • reference_temperature_degree_celsius (float) - reference temperature in °C for which r_ohm_per_km for the line is specified (TDPF)
  • solar_radiation_w_per_sq_m (float) - solar radiation on horizontal plane in W/m² (TDPF)
  • solar_absorptivity (float) - Albedo factor for absorptivity of the lines (TDPF)
  • emissivity (float) - Albedo factor for emissivity of the lines (TDPF)
  • r_theta_kelvin_per_mw (float) - thermal resistance of the line (TDPF, only for simplified method)
  • mc_joule_per_m_k (float) - specific mass of the conductor multiplied by the specific thermal capacity of the material (TDPF, only for thermal inertia consideration with tdpf_delay_s parameter)

OUTPUT:

  • index (int) - The unique ID of the created line

EXAMPLE:

create_line_from_parameters(net, from_bus=0, to_bus=1, length_km=0.1, r_ohm_per_km=0.01, x_ohm_per_km=0.05, c_nf_per_km=10, max_i_ka=0.4, name="line1")

net.line

Note that defining a line with length zero leads to a division by zero in the power flow calculation and is therefore not allowed.

| Parameter | Datatype | Value Range | Explanation |
|---|---|---|---|
| name | string | | name of the line |
| std_type | string | | standard type which can be used to easily define line parameters with the pandapower standard type library |
| from_bus* | integer | | index of the bus where the line starts |
| to_bus* | integer | | index of the bus where the line ends |
| length_km* | float | > 0 | length of the line [km] |
| r_ohm_per_km* | float | ≥ 0 | resistance of the line [Ohm per km] |
| x_ohm_per_km* | float | ≥ 0 | inductance of the line [Ohm per km] |
| c_nf_per_km* | float | ≥ 0 | capacitance of the line (line-to-earth) [nano Farad per km] |
| r0_ohm_per_km**** | float | ≥ 0 | zero sequence resistance of the line [Ohm per km] |
| x0_ohm_per_km**** | float | ≥ 0 | zero sequence inductance of the line [Ohm per km] |
| c0_nf_per_km**** | float | ≥ 0 | zero sequence capacitance of the line [nano Farad per km] |
| g_us_per_km* | float | ≥ 0 | dielectric conductance of the line [micro Siemens per km] |
| max_i_ka* | float | > 0 | maximal thermal current [kilo Ampere] |
| parallel* | integer | ≥ 1 | number of parallel line systems |
| df* | float | 0…1 | derating factor (scaling) for max_i_ka |
| type | string | “ol” - overhead line, “cs” - underground cable system | type of line |
| max_loading_percent** | float | > 0 | maximum loading of the line |
| endtemp_degree*** | float | > 0 | short-circuit end temperature of the line |
| in_service* | boolean | True / False | specifies if the line is in service |

* necessary for executing a balanced power flow calculation
** optimal power flow parameter
*** short-circuit calculation parameter
**** necessary for executing a three-phase power flow / single-phase short circuit calculation

net.line_geodata

| Parameter | Datatype | Explanation |
|---|---|---|
| coords | list | List of (x, y) tuples that mark the inflexion points of the line |

Switch

pandapower.create.create_switch(net, bus, element, et, closed=True, type=None, name=None, index=None, z_ohm=0, in_ka=nan, **kwargs)

Adds a switch in the net[“switch”] table.

Switches can be either between two buses (bus-bus switch) or at the end of a line or transformer element (bus-element switch).

Two buses that are connected through a bus-bus switch are fused in the power flow if the switch is closed, or separated if the switch is open.

An element that is connected to a bus through a bus-element switch is connected to the bus if the switch is closed or disconnected if the switch is open.

INPUT:

  • net (pandapowerNet) - The net within which this switch should be created

  • bus - The bus that the switch is connected to

  • element - index of the element: bus id if et == “b”, line id if et == “l”, trafo id if et == “t”

  • et (string) - element type: “l” = switch between bus and line, “t” = switch between bus and transformer, “t3” = switch between bus and transformer3w, “b” = switch between two buses

OPTIONAL:

  • closed (boolean, True) - switch position: False = open, True = closed
  • type (string, None) - indicates the type of switch: “LS” = Load Switch, “CB” = Circuit Breaker, “LBS” = Load Break Switch or “DS” = Disconnecting Switch
  • z_ohm (float, 0) - resistance of the switch, which only has an effect for bus-bus switches: if set to 0 the buses are fused as before, if larger than 0 a branch is created for the switch, which also affects the bus mapping
  • name (string, default None) - The name for this switch
  • in_ka (float, default None) - maximum current that the switch can carry under normal operating conditions without tripping
OUTPUT:

sid - The unique switch_id of the created switch

EXAMPLE:

create_switch(net, bus=0, element=1, et='b', type='LS', z_ohm=0.1)

create_switch(net, bus=0, element=1, et='l')

Input Parameters of net.switch

| Parameter | Datatype | Value Range | Explanation |
|---|---|---|---|
| bus* | integer | | index of connected bus |
| name | string | | name of the switch |
| element* | integer | | index of the element the switch is connected to: bus index if et = “b”, line index if et = “l”, trafo index if et = “t” |
| et* | string | “b” - bus-bus switch, “l” - bus-line switch, “t” - bus-trafo switch, “t3” - bus-trafo3w switch | element type the switch connects to |
| type | string | “CB” - circuit breaker, “LS” - load switch, “LBS” - load break switch, “DS” - disconnecting switch | type of switch |
| closed* | boolean | True / False | signals the switching state of the switch |
| in_ka* | float | > 0 | maximum current that the switch can carry under normal operating conditions without tripping |

Load

pandapower.create_load(net, bus, p_mw, q_mvar=0, const_z_percent=0, const_i_percent=0, sn_mva=nan, name=None, scaling=1.0, index=None, in_service=True, type='wye', max_p_mw=nan, min_p_mw=nan, max_q_mvar=nan, min_q_mvar=nan, controllable=nan, **kwargs)

Adds one load in table net[“load”].

All loads are modelled in the consumer system, meaning load is positive and generation is negative active power. Please pay attention to the correct signing of the reactive power as well.

INPUT:

  • net - The net within this load should be created
  • bus (int) - The bus id to which the load is connected

OPTIONAL:

  • p_mw (float, default 0) - The active power of the load
    • positive value -> load
    • negative value -> generation
  • q_mvar (float, default 0) - The reactive power of the load
  • const_z_percent (float, default 0) - percentage of p_mw and q_mvar that will be associated to constant impedance load at rated voltage
  • const_i_percent (float, default 0) - percentage of p_mw and q_mvar that will be associated to constant current load at rated voltage
  • sn_mva (float, default None) - Nominal power of the load
  • name (string, default None) - The name for this load
  • scaling (float, default 1.) - An OPTIONAL scaling factor. Multiplies with p_mw and q_mvar.
  • type (string, ‘wye’) - type variable to classify the load: wye/delta
  • index (int, None) - Force a specified ID if it is available. If None, the index one higher than the highest already existing index is selected.
  • in_service (boolean) - True for in_service or False for out of service
  • max_p_mw (float, default NaN) - Maximum active power load - necessary for controllable loads in OPF
  • min_p_mw (float, default NaN) - Minimum active power load - necessary for controllable loads in OPF
  • max_q_mvar (float, default NaN) - Maximum reactive power load - necessary for controllable loads in OPF
  • min_q_mvar (float, default NaN) - Minimum reactive power load - necessary for controllable loads in OPF
  • controllable (boolean, default NaN) - States, whether a load is controllable or not. Only respected for OPF; defaults to False if “controllable” column exists in DataFrame

OUTPUT:

  • index (int) - The unique ID of the created element

EXAMPLE:

  • create_load(net, bus=0, p_mw=10., q_mvar=2.)
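Putting the elements above together, here is a hedged end-to-end sketch (all parameter values are illustrative assumptions): it builds a small network with a bus-line switch and a load, runs a power flow, and prints the results.

```python
import pandapower as pp

net = pp.create_empty_network(name="minimal example")

# buses
b1 = pp.create_bus(net, vn_kv=20., name="feeder")
b2 = pp.create_bus(net, vn_kv=20., name="load bus")

# supply, line, switch and load
pp.create_ext_grid(net, bus=b1)
line = pp.create_line(net, from_bus=b1, to_bus=b2, length_km=0.5, std_type="NAYY 4x50 SE")
pp.create_switch(net, bus=b1, element=line, et="l", closed=True)
pp.create_load(net, bus=b2, p_mw=0.2, q_mvar=0.05)

pp.runpp(net)
print(net.res_bus)
print(net.res_line)
```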


Communication Approach

Kepler system

Satellite communication system which provides high-capacity satellite data services for IoT and other data backhaul applications.

S-Band 2.4 GHz, LoRa Edge LR1110

The microwave inter-satellite link, already used by Iridium NEXT, operates at a frequency of 23 GHz. It is relatively mature, with high reliability and a relatively wide beam.

Cons: Low data transmission rate and low anti-interference ability

Optical inter-satellite link which uses the laser as the carrier of information transmission between satellites.

Jargon

Network Calculus Theory

Poisson Distribution/Process

Queueing Theory

Space Communications Protocol Specifications (SCPS)

Delay Tolerant Networks (DTN)

SDN

SDN Learning

SDN/VNF Satellite Experiment

Environment Set Up

Ubuntu network bug

sudo ip link set enp0s1 up
sudo dhclient enp0s1

sudo ip addr add [Your-IPv4-Address]/[Subnet-Mask] dev enp0s1
sudo systemctl restart NetworkManager



sudo systemctl restart networking

Controller (OpenDaylight)

  1. Prepare the operating system
  2. Install the Java JRE
  3. Set JAVA_HOME
  4. Download the OpenDaylight Zip
  5. Unzip OpenDaylight
  6. Start OpenDaylight
  7. Bonus: Where did DLUX go?
1. Prepare Operating System
sudo apt-get -y update
sudo apt-get -y install unzip
2. Install the Java JRE

OpenDaylight runs on Java platform

sudo apt-get -y install openjdk-11-jre
sudo update-alternatives --config java

Please select Java 11 as your default java environment

3. Set JAVA_HOME
ls -l /etc/alternatives/java
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64' >> ~/.bashrc
source ~/.bashrc
echo $JAVA_HOME


4. Download the OpenDaylight Zip Archive
curl -XGET -O https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/karaf/0.16.1/karaf-0.16.1.zip
5. Install OpenDaylight
unzip karaf-0.16.1.zip
6. Start OpenDaylight
./bin/karaf 
opendaylight-user@root>feature:list
7. Bonus: What happened to DLUX?
curl -O https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/karaf/0.8.4/karaf-0.8.4.zip
unzip karaf-0.8.4.zip 
./bin/karaf 
sudo apt-get install openjdk-8-jre
sudo update-alternatives --config java

select Java 8

RYU controller

ryu-manager simple_switch_13.py
sudo mn --custom topology.py --topo mytopo --controller=remote,ip=127.0.0.1,port=6653
mn --topo=single,3 --mac --controller=remote,ip=127.0.0.1,port=6653 --switch ovsk,protocols=OpenFlow13

s1 = net.addSwitch('s1', cls=OVSKernelSwitch, dpid='0000000000000001')

sh ovs-vsctl show

dpctl dump-flows

dpctl del-flows

dpctl add-flow in_port=1,actions=output:2

dpctl add-flow in_port=2,actions=output:1

h1 cat /tmp/iperf_client_1.log

h1 cat /tmp/iperf.log

h1 scp /tmp/iperf_client_h1.log limingwei@192.168.64.5:/home/limingwei/Desktop/mydir/tmp
h49 hping3 --flood --syn --rand-source h3

cp /tmp/iperf*.log /home/limingwei/Desktop/mydir/tmp_flooding/

![image-20240110101757590](/Users/joshua/Library/Application Support/typora-user-images/image-20240110101757590.png)

sudo mn --link tc,bw=10,delay=10ms
h1 iperf -s

h2 iperf -c 10.0.0.1 -i 1 -t 10

Add flow rules based on L2 (MAC) addresses

sudo ovs-ofctl add-flow s1 "dl_dst=02:89:c4:04:33:36, action=1"
sudo ovs-ofctl add-flow s1 "dl_dst=3a:f4:85:86:30:c3, action=2"
sudo ovs-ofctl add-flow s1 "dl_dst=d2:5e:c6:63:ef:af, action=3"
sudo ovs-ofctl add-flow s1 "dl_dst=ff:ff:ff:ff:ff:ff, action=flood"

SDN course

ToR Switch:

Top-of-Rack Switching (ToR) is a type of network infrastructure that uses network switches to connect servers and other devices in the same rack. This type of switching allows for faster data transfer between devices and improved performance.

Create a topology with three hosts:

mn --topo single,3 --mac --controller default
sudo ovs-ofctl dump-flows s1
mn --topo single,3 --mac --controller remote

#1
sudo ovs-ofctl add-flow s1 "in_port=2,ip, nw_dst=192.168.1.0/24,action=output:3"
sudo ovs-ofctl add-flow s1 "in_port=3,ip, nw_dst=192.168.1.0/24,action=output:2"

#2
sudo ovs-ofctl add-flow s1 arp,actions=flood

for i in {1..3}; do sudo ovs-ofctl add-flow s1 ip,nw_dst=10.0.0.$i,actions=output:$i;done

#3
h1 ifconfig |grep -i ether
h2 ifconfig |grep -i ether
h3 ifconfig |grep -i ether

sudo ovs-ofctl add-flow s1 "dl_dst=00:00:00:00:00:01, action=1"
sudo ovs-ofctl add-flow s1 "dl_dst=00:00:00:00:00:02, action=2"
sudo ovs-ofctl add-flow s1 "dl_dst=00:00:00:00:00:03, action=3"
sudo ovs-ofctl add-flow s1 "dl_dst=ff:ff:ff:ff:ff:ff, action=flood"


Lab 4

Create the topology

sudo mn --topo=single,3 --mac --controller=remote,ip=127.0.0.1,port=6653 --switch ovsk,protocols=OpenFlow13

Specify for ovs-ofctl as well which OpenFlow version to use, for instance:

mininet> sh sudo ovs-ofctl -O OpenFlow13 dump-flows s1

sh sudo watch -n1 'ovs-ofctl -O OpenFlow13 dump-flows s1'

Learning_switch.py

```python
from ryu.base import app_manager
from ryu.controller import ofp_event, dpset
from ryu.controller.handler import MAIN_DISPATCHER, CONFIG_DISPATCHER
from ryu.controller.handler import set_ev_cls
from ryu.ofproto import ofproto_v1_3
from ryu.lib.packet import packet, ethernet, ether_types
import logging as log


class SimpleSwitch13(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(SimpleSwitch13, self).__init__(*args, **kwargs)
        log.basicConfig(format='%(levelname)s:%(message)s', level=log.DEBUG)
        log.info("Controller is up and running")
        self.mac_to_port = {}

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        log.info("Switch connected with datapath ID: {}".format(datapath.id))
        # table-miss entry: send unmatched packets to the controller
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        self.add_flow(datapath, 0, match, actions)

    def add_flow(self, datapath, priority, match, actions, buffer_id=None):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                             actions)]
        if buffer_id:
            mod = parser.OFPFlowMod(datapath=datapath, buffer_id=buffer_id,
                                    priority=priority, match=match,
                                    instructions=inst)
        else:
            mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                                    match=match, instructions=inst)
        datapath.send_msg(mod)

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def _packet_in_handler(self, ev):
        msg = ev.msg
        datapath = msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        in_port = msg.match['in_port']

        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]

        self.logger.info("PacketIn received: \n"
                         "Switch: {}\n"
                         "in_port: {}\n".format(datapath.id, in_port))

        dst = eth.dst
        src = eth.src

        dpid = format(datapath.id, "d").zfill(16)
        self.mac_to_port.setdefault(dpid, {})

        self.logger.info("packet in %s %s %s %s", dpid, src, dst, in_port)

        # learn the source MAC address to avoid flooding next time
        self.mac_to_port[dpid][src] = in_port

        if dst in self.mac_to_port[dpid]:
            out_port = self.mac_to_port[dpid][dst]
        else:
            out_port = ofproto.OFPP_FLOOD

        actions = [parser.OFPActionOutput(out_port)]

        # install a flow rule so future packets of this flow bypass the controller
        if out_port != ofproto.OFPP_FLOOD:
            match = parser.OFPMatch(in_port=in_port, eth_dst=dst, eth_src=src)

            if msg.buffer_id != ofproto.OFP_NO_BUFFER:
                self.add_flow(datapath, 1, match, actions, msg.buffer_id)
                return
            else:
                self.add_flow(datapath, 1, match, actions)
        data = None
        if msg.buffer_id == ofproto.OFP_NO_BUFFER:
            data = msg.data
        out = parser.OFPPacketOut(datapath=datapath, buffer_id=msg.buffer_id,
                                  in_port=in_port, actions=actions, data=data)
        datapath.send_msg(out)
```



Lab 5

LLDP:

Features: Vendor independent/agnostic

LLDP messages are periodically sent by switches by default. A flow rule should be installed into all switches’ flow tables to match on these specific LLDP packets and send them towards the controller as a Packet-In message.

Insert a matching following flow rule into each switch’s flow table:

sudo ovs-ofctl -O OpenFlow13 add-flow <switch_name> "priority=65535,dl_type=0x88cc,actions=CONTROLLER:65535"

Ryu can also do the job for us if we start our controller application with the following additional argument:

sudo ryu-manager learning_switch.py --observe-links

Topology discovery

Ryu controller provides ryu.topology, which is a switch and link discovery module providing API calls for accessing host, switch, and link data. In order to access it, import the following libraries into our controller application.

from ryu.topology import event
from ryu.topology.api import get_switch, get_link, get_host
self.topology_api_app = self  # put this into the constructor of the Ryu controller

Whenever we want to get the latest topology information, we can use the 3 imported functions:

# get switch list
switch_list = get_switch(self.topology_api_app, None)

#get links and their endpoints
link_list = get_link(self.topology_api_app, None)

#get hosts if there is any
hosts_list = get_host(self.topology_api_app, None)

In each case, the topology.api returns lists of objects with a variable number of accessible properties:

Link object: it describes a connection between a given pair of switches. The source and target of a link is a Port object that can be accessed by link.src and link.dst, respectively. A Port object further contains relevant information such as the datapath identifier (dpid), name of the interface (name), MAC address of that interface (hw_addr), port identifier (port_no), etc. For example, the source data path identifier of a link and the port id, where it is connected to, is accessible by link.src.dpid and link.src.port_no, respectively.

Switch object: it contains the switch's datapath object, and as many Port objects as the switch has ports.

Host object: it contains its MAC address (host.mac), IPv4 and IPv6 addresses (host.ipv4 and host.ipv6) and a Port object, again, describing which switch it is connected to (e.g., host.port.dpid, host.port.port_no)

Lab 5 Task 1

Modify the controller application (learning_switch) as follows:

  • Create a function called update_topology_data() that prints out the topology information managed by the topology.api (prettify the output for a more comprehensive view)

![image-20240214160149193](/Users/joshua/Library/Application Support/typora-user-images/image-20240214160149193.png)

  • Each time a new switch connects to the controller, or a host is discovered, call update_topology_data().
  • As LLDP messages are also sent to the controller and they also contain source and destination MAC addresses, modify your Packet-In handler to ignore these LLDP packets!

​ Hint: To easily match on LLDP packets import the following libraries

#for LLDP packet types
from ryu.ofproto.ether import ETH_TYPE_CFM
from ryu.ofproto.ether import ETH_TYPE_LLDP
  • At least two switches need to exist in the topology in order to obtain link information from Ryu’s API.

Solution

```python
from ryu.base import app_manager
from ryu.lib.packet import lldp
from ryu.controller import ofp_event, dpset
from ryu.controller.handler import MAIN_DISPATCHER, CONFIG_DISPATCHER
from ryu.controller.handler import set_ev_cls
from ryu.ofproto import ofproto_v1_3
from ryu.topology import event
from ryu.topology.api import get_switch, get_link, get_host
from ryu.lib.packet import packet, ethernet, ether_types
from ryu.ofproto.ether import ETH_TYPE_CFM
from ryu.ofproto.ether import ETH_TYPE_LLDP
import logging as log


class SimpleSwitch13(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(SimpleSwitch13, self).__init__(*args, **kwargs)
        log.basicConfig(format='%(levelname)s:%(message)s', level=log.DEBUG)
        log.info("Controller is up and running")
        self.topology_api_app = self
        self.mac_to_port = {}

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        log.info("Switch connected with datapath ID: {}".format(datapath.id))
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        self.add_flow(datapath, 0, match, actions)
        self.update_topology_data()

    def add_flow(self, datapath, priority, match, actions, buffer_id=None):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                             actions)]
        if buffer_id:
            mod = parser.OFPFlowMod(datapath=datapath, buffer_id=buffer_id,
                                    priority=priority, match=match,
                                    instructions=inst)
        else:
            mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                                    match=match, instructions=inst)
        datapath.send_msg(mod)

    def update_topology_data(self):
        # get switch list
        switch_list = get_switch(self.topology_api_app, None)
        # get links and their endpoints
        links_list = get_link(self.topology_api_app, None)
        # get hosts if there are any
        hosts_list = get_host(self.topology_api_app, None)
        self.logger.info("Switches:")
        for switch in switch_list:
            self.logger.info("s{}".format(switch.dp.id))

        self.logger.info("\nLinks:")
        for link in links_list:
            source = "s{}".format(link.src.dpid)
            source_port = link.src.port_no
            target = "s{}".format(link.dst.dpid)
            target_port = link.dst.port_no
            self.logger.info("{}(port:{})<---->(port:{}){}".format(source, source_port, target_port, target))

        self.logger.info("Hosts:")
        if hosts_list:
            for host in hosts_list:
                name = "h{}".format(host.mac.replace(':', '')[-6:])  # use part of the MAC as host identifier
                self.logger.info("{}:\n\tip4:{}"
                                 "\n\tip6:{}"
                                 "\n\tmac={}"
                                 "\n\tconnected to:s{}"
                                 "\n\tport_no:{}".format(name,
                                                         host.ipv4 if host.ipv4 else "None",
                                                         host.ipv6 if host.ipv6 else "None",
                                                         host.mac,
                                                         host.port.dpid,
                                                         host.port.port_no))

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def _packet_in_handler(self, ev):
        msg = ev.msg
        datapath = msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        in_port = msg.match['in_port']

        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]

        if eth.ethertype in (ETH_TYPE_LLDP, ETH_TYPE_CFM):
            # ignore LLDP and CFM packets but update the topology
            self.update_topology_data()
            return

        self.logger.info("PacketIn received: \n"
                         "Switch: {}\n"
                         "in_port: {}\n".format(datapath.id, in_port))

        dst = eth.dst
        src = eth.src

        dpid = format(datapath.id, "d").zfill(16)
        self.mac_to_port.setdefault(dpid, {})

        self.logger.info("packet in %s %s %s %s", dpid, src, dst, in_port)

        self.mac_to_port[dpid][src] = in_port

        if dst in self.mac_to_port[dpid]:
            out_port = self.mac_to_port[dpid][dst]
        else:
            out_port = ofproto.OFPP_FLOOD

        actions = [parser.OFPActionOutput(out_port)]

        if out_port != ofproto.OFPP_FLOOD:
            match = parser.OFPMatch(in_port=in_port, eth_dst=dst, eth_src=src)

            if msg.buffer_id != ofproto.OFP_NO_BUFFER:
                self.add_flow(datapath, 1, match, actions, msg.buffer_id)
                return
            else:
                self.add_flow(datapath, 1, match, actions)
        data = None
        if msg.buffer_id == ofproto.OFP_NO_BUFFER:
            data = msg.data
        out = parser.OFPPacketOut(datapath=datapath, buffer_id=msg.buffer_id,
                                  in_port=in_port, actions=actions, data=data)
        datapath.send_msg(out)
        self.update_topology_data()
```


  • Call this function in the connection and the Packet-In handlers.

  • In our Packet-In handler we should only execute the learning switch-related code when the packets are not LLDP packets:

    if not pkt_eth.ethertype in (ETH_TYPE_LLDP,ETH_TYPE_CFM):
    # LLDP packets will not be processed, we just update the topology

Find paths, run algorithms, optimize routing

By means of the topology.api we can query topology-related information such as switches, hosts, and the links between them. In order to run graph algorithms, e.g., find shortest paths among nodes or recalculate a path in case of failure, we need to create a network graph. The nice thing about graphs is that the concepts and terminology are naturally intuitive: graphs describe structures that map relations between objects. Objects are called nodes, and the connections among them are termed edges. A graph can be directed or undirected based on the orientation of its edges. Making an edge undirected also implies that it is bi-directional, i.e., A <-> B = B <-> A. If a graph is directed, then the order and direction of edges do matter, i.e., A -> B != B -> A.

The edges can also have weights indicating, for instance, the distance between two nodes.

NetworkX - Basics

NetworkX is the most popular Python library to create, manipulate and analyse graphs. Some of its features are:

  • Data structures for graphs, digraphs, and multigraphs
  • Many standard graph algorithms
  • Nodes can be “anything” (e.g., text, images, XML records)
  • Edges can hold arbitrary data (e.g., weights, time-series)
  • Additional benefits from Python include fast prototyping, easy to teach, and multi-platform

The provided Mininet VM already has networkx installed; however, in order to reinstall it, issue the following command:

$ sudo pip install networkx

Exploring NetworkX library and creating a simple Graph object can be done as follows:

import networkx as nx
G = nx.Graph()

In order to add 2 nodes (identified by Integer numbers) to the graph and connect them with an edge, try the following:

G.add_node(1)
G.add_node(2)
G.add_edge(1,2)

The following non-exhaustive list summarizes some fundamental function calls:

  • G.add_nodes_from(list_of_nodes): Add a list of nodes to the graph
  • G.remove_node(node): Remove node node from the graph
  • G.nodes[node]: Get the attribute dictionary of node node from the graph
  • G.add_edges_from(list_of_edges): Add a list of edges (defined by source-destination pairs) to the graph
  • G.remove_edge(src, dst): Remove the edge between src and dst from the graph
  • G.edges[src, dst]: Get the edge between nodes src and dst from the graph

You might notice that nodes and edges are not specified as NetworkX objects. This leaves you free to use meaningful items as nodes and edges. The most common choices are numbers or strings, but a node can be any hashable object (except None). Furthermore, we can also add attributes to graphs, nodes and edges, such as weights, labels, colours or whatever Python object you like. For instance, we can define bandwidth and latency attributes for the links:

G.add_edge(1, 2, bandwidth=25, latency=5)

NetworkX - Algorithms

NetworkX provides many graph algorithms, e.g., connectivity, cliques, clustering, independent set, average node/edge degree; however, from all of them we will use the simple and shortest path algorithms. The nx.shortest_path(G, src, dst) function computes the shortest path in graph G between source node src and destination node dst. Additionally, we can set a fourth parameter, weight, to indicate which edge attribute to use as the edge weight. The following example prints out the shortest path as a list from node 0 towards node 4 in graph G.

print(nx.shortest_path(G, source = 0, target = 4))
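A small, hedged sketch (node names and latency values are made up) showing how the weight parameter changes the result:

```python
import networkx as nx

G = nx.Graph()
# a triangle where the direct edge is "slower" than the two-hop detour
G.add_edge("s1", "s2", latency=10)
G.add_edge("s1", "s3", latency=2)
G.add_edge("s3", "s2", latency=2)

print(nx.shortest_path(G, "s1", "s2"))                    # fewest hops: ['s1', 's2']
print(nx.shortest_path(G, "s1", "s2", weight="latency"))  # lowest latency: ['s1', 's3', 's2']
```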

Lab 6

sudo mn --custom my_topo.py --topo mytopo --controller=remote,ip=127.0.0.1,port=6653 --switch ovsk,protocols=OpenFlow13
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, MAIN_DISPATCHER
from ryu.controller.handler import set_ev_cls
from ryu.ofproto import ofproto_v1_3
from ryu.lib.packet import packet
from ryu.lib.packet import ethernet
from ryu.lib.packet import ether_types
import logging as log

from ryu.ofproto.ether import ETH_TYPE_CFM
from ryu.ofproto.ether import ETH_TYPE_LLDP

# Graph manipulation library
import networkx as nx

# Fetch topology information
from ryu.topology import event
from ryu.topology.api import get_switch, get_link, get_host


class SimpleSwitch13(app_manager.RyuApp):
OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

def __init__(self, *args, **kwargs):
super(SimpleSwitch13, self).__init__(*args, **kwargs)
self.mac_to_port = {}
self.G = nx.Graph()
self.shortest_paths = dict()

# necessary to enable the topology monitoring functionality
self.topology_api_app = self

@set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
def switch_features_handler(self, ev):
datapath = ev.msg.datapath
ofproto = datapath.ofproto
parser = datapath.ofproto_parser

match = parser.OFPMatch()
actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
ofproto.OFPCML_NO_BUFFER)]
self.add_flow(datapath, 0, match, actions)
self.update_topology_data()

def add_flow(self, datapath, priority, match, actions, buffer_id=None):
ofproto = datapath.ofproto
parser = datapath.ofproto_parser

inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
actions)]
if buffer_id:
mod = parser.OFPFlowMod(datapath=datapath, buffer_id=buffer_id,
priority=priority, match=match,
instructions=inst)
else:
mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
match=match, instructions=inst)
datapath.send_msg(mod)

@set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
def _packet_in_handler(self, ev):
# If you hit this you might want to increase
# the "miss_send_length" of your switch
if ev.msg.msg_len < ev.msg.total_len:
self.logger.debug("packet truncated: only %s of %s bytes",
ev.msg.msg_len, ev.msg.total_len)
msg = ev.msg
datapath = msg.datapath
ofproto = datapath.ofproto
parser = datapath.ofproto_parser
in_port = msg.match['in_port']

pkt = packet.Packet(msg.data)
eth = pkt.get_protocols(ethernet.ethernet)[0]

if eth.ethertype in (ETH_TYPE_LLDP, ETH_TYPE_CFM):
# ignore lldp packet
return

self.update_topology_data()


def update_topology_data(self):
# Get lists for switches, links, and hosts
switch_list = get_switch(self.topology_api_app, None)
links_list = get_link(self.topology_api_app, None)
hosts_list = get_host(self.topology_api_app, None)


self.logger.info("Switches:")
for switch in switch_list:
print(switch)
switch_name="s{}".format(switch.dp.id)
self.logger.info(switch_name)


# add switches to graph with preset attribute names
# define a recent_port data dictionary as an attribute for the
# swithes - it will be updated in each case
# a new port comes up
self.G.add_node(
switch_name,
name=switch_name,
dp=switch.dp,
port=switch.ports
)


self.logger.info("\nLinks:")
for link in links_list:
print(link)
source = "s{}".format(link.src.dpid)
source_port = link.src.port_no
target = "s{}".format(link.dst.dpid)
target_port = link.dst.port_no
self.logger.info("{}(port:{}) <---> (port:{}){}".format(source,source_port,target_port,target))
# networkx links in Graph() are not differentiated by source and destination, so a link and its data become
# updated when add_edge is called with the source and destination swapped
self.G.add_edge(source, target,
src_dpid=source, src_port=link.src.port_no,
dst_dpid=target, dst_port=link.dst.port_no)



self.logger.info("Hosts:\n")
if hosts_list:
for host in hosts_list:
# assemble name according to mac
host_name="h{}".format(host.mac.split(":")[5][1])
print(host_name)
last_byte_hex = host.mac.split(":")[5]
last_byte_int = int(last_byte_hex, 16)

if last_byte_int == 0:
last_byte_int = 1
elif last_byte_int > 254:
last_byte_int = 254

host.ipv4 = "10.0.0.{}".format(last_byte_int)
self.logger.info("{}:\n\tip4:{}"
"\n\tip6:{}"
"\n\tmac:{}"
"\n\tconnected to:s{}"
"\n\tport_no:{}".format(
host_name,
host.ipv4,
host.ipv6,
host.mac,
host.port.dpid,
host.port.port_no
)
)


self.G.add_node(
host_name,
name=host_name,
ipv4=host.ipv4,
ipv6=host.ipv6,
mac=host.mac,
connected_to="s{}".format(host.port.dpid),
port_no=host.port.port_no)
# add corresponding links to the graph
self.G.add_edge(host_name,
"s{}".format(host.port.dpid),
dst_port=host.port.port_no,
dst_dpid="s{}".format(host.port.dpid))




# update shortest paths
self.calculate_all_pair_shortest_paths()
print("Shortest Paths:\n{}".format(self.shortest_paths))
self.install_shortest_paths_flow_rules()


def calculate_shortest_paths(self,src,dst):
'''
This function returns all shortest paths between the given source and destination node
:param src: String - the source node's name
:param dst: String - the destination node's name
:return: list of lists
'''
if src not in self.G.nodes() or dst not in self.G.nodes():
return None
paths = list()
try:
all_sp = nx.all_shortest_paths(self.G, src, dst)
for path in all_sp:
paths.append(path)

except nx.NetworkXNoPath: # no path between src and dst
log.info("No path between {} and {}".format(src, dst))
return None

return paths


def calculate_all_pair_shortest_paths(self):
'''
This function calculates all shortest paths for all source and destinations
Note: NetworkX also have similar function (all_pairs_shortest_path(G[, cutoff])),
however that only gives one shortest path for a given (source,destination) pair
:return: dictionary of dictionary of list of lists, e.g., h1:{h2:[[h1,s1,h2],[h1,s2,h2]]}
'''
all_paths = dict()
for n in self.G.nodes():
if n.startswith('h'): #only hosts are relevant
all_paths[n] = dict()
for m in self.G.nodes():
if m.startswith('h'):
if n == m:
continue
all_paths[n][m] = self.calculate_shortest_paths(n, m)

self.shortest_paths = all_paths


def install_flow_rule_for_chain_link(self, chain_link, chain_prev, chain_next, source_ip, destination_ip):
'''
This function installs matching flow rules on source_ip and destination_ip in switch
chain_link and outputs packets on ports that are connected to its upstream (chain_prev)
and downstream (chain_next) nodes, respectively.
According to the chain_prev and chain_next, it gets the link/port number information
from the graph that stores them
:param chain_link: String - the name of the chain_link
:param chain_prev: String - the name of the previous switch
:param chain_next: String - the name of the next switch
:param source_ip: tuple(String,String) - source host IP address and netmask for the upstream
:param destination_ip: tuple(String,String) - the destination IP address and netmask for the downstream
:return:
'''

datapath = self.G.nodes[chain_link]['dp']
ofp = datapath.ofproto
ofp_parser = datapath.ofproto_parser
match_source_ip = ofp_parser.OFPMatch(eth_type=0x0800, ipv4_dst=source_ip)
match_destination_ip = ofp_parser.OFPMatch(eth_type=0x0800, ipv4_dst=destination_ip)

# --- upstream
# get edge_data
edge = self.G[chain_link][chain_prev]
print ("upstream edge: ", edge)
if edge['dst_dpid'] == chain_link:
# if prev is a host, then it is always the case that edge['dst_port'] stores the port number
out_port = edge['dst_port']
else:
# if prev is a switch, then it might be the src_dpid
out_port = edge['src_port']
actions = [ofp_parser.OFPActionOutput(out_port, 0)]
inst = [ofp_parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
print("install flow rule for SIP {} - DIP {} at {} to forward packet on port {}".
format(source_ip, destination_ip, chain_link, out_port))
self.send_flow_mod(datapath, None, match=match_source_ip, inst=inst)

# --- downstream
# get edge_data
edge = self.G[chain_link][chain_next]
print("downstream edge: ", edge)
if edge['dst_dpid'] == chain_link:
# if next is a host, then it is always the case that edge['dst_port'] stores the port number
out_port = edge['dst_port']
else:
# if next is a switch, then it might be the src_dpid
out_port = edge['src_port']
actions = [ofp_parser.OFPActionOutput(out_port, 0)]
inst = [ofp_parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
log.info("install flow rule for SIP {} - DIP {} at {} to forward packet on port {}".
format(source_ip, destination_ip, chain_link, out_port))
self.send_flow_mod(datapath, None, match=match_destination_ip, inst=inst)



def install_shortest_paths_flow_rules(self):
'''
This function will install flow rules according to the shortest paths
:return:
'''
paths = self.shortest_paths
# paths looks like {h1:{h2:[[path1],[path2]]}} <- EXAMPLE
if paths:
for source in paths: # source = h1
source_host = self.G.nodes[source]
print("Source host: {}".format(source_host))
source_ip = (source_host['ipv4'], '255.255.255.255')
print("Source ip: {}".format(source_ip))

# self.log.info(paths[source]) # paths[source] = {h2: [[path1],[path2]]
for p in paths[source]: # p = h2
destination_host = self.G.nodes[p]
destination_ip = (destination_host['ipv4'], '255.255.255.255')
if paths[source][p]:
for path_num, j in enumerate(
paths[source][p]): # paths[source][p] = [[path1],[path2]], j = one path from paths
# install the first rule always!
individual_path = j
if individual_path:
for num, sw in enumerate(individual_path):
# print sw
if sw.startswith('h'):
# it's a host, skip (this will also prevent running out of indexes in both direction (see below))
continue

prev = individual_path[num - 1]
current = individual_path[num]
next = individual_path[num + 1]
self.install_flow_rule_for_chain_link(current, prev, next, source_ip, destination_ip)
# break



def send_flow_mod(self, datapath, msg, **args):
'''
Sending a flow_mod to the given switch
:param datapath: Datapath - datapath of the switch
:param msg: PacketIn message
:param args: cookie=0, table=0, cookie_mask=0,idle_timeout=0,hard_timeout=0,priority=100,buffer_id=OFP_NO_BUFFER,
mod_type= OFPFC_ADD, match=OFPMatch(in_port=1,broadcast_eth_dst),
inst=OFPInstructionActions(apply action,OFPActionOutput(2)),
:return: nothing
'''
ofp = datapath.ofproto
ofp_parser = datapath.ofproto_parser
table_id = args.get('table',0)
cookie = args.get('cookie', 0)
cookie_mask = args.get('cookie_mask',0)
idle_timeout = args.get('idle_timeout', 0)
hard_timeout = args.get('hard_timeout', 0)
priority = args.get('priority', 100)
if msg:
buffer_id = args.get('buffer_id', msg.buffer_id)
else:
buffer_id=ofp.OFP_NO_BUFFER

mod_type = args.get('mod_type', ofp.OFPFC_ADD)


match = args.get('match',ofp_parser.OFPMatch(in_port=1, eth_dst='ff:ff:ff:ff:ff:ff'))
inst = args.get('inst',
[ofp_parser.OFPInstructionActions(
ofp.OFPIT_APPLY_ACTIONS,
[ofp_parser.OFPActionOutput(2)])])



flowmod = ofp_parser.OFPFlowMod(datapath, cookie, cookie_mask,
table_id, mod_type,
idle_timeout, hard_timeout,
priority, buffer_id,
ofp.OFPP_ANY, ofp.OFPG_ANY,
ofp.OFPFF_SEND_FLOW_REM,
match, inst)

# log.info("Sending flowmod:\n {}".format(flowmod))
datapath.send_msg(flowmod)

Lab 7

python tcp_exp.py -b 1000 -d 7.5ms

mininet> xterm h1 h2

h2> iperf -s &

mininet> h2 wireshark

h1> iperf -c 10.0.0.2 -i 1 -n 1M

Network

Network review

Fundamental

TCP/IP Architecture

Application layer

Transport layer

Network layer

Network Interface layer

image-20230926204726040

What happens after you enter a URL?

image-20230926220632759

  1. The browser will resolve the URL

image-20230926225140347

When no path name is given, the request refers to the default file configured under the root directory, such as /index.html or /default.html, so there is no ambiguity.

  2. Generate the HTTP request message

After parsing the URL, the browser knows the web server and the file name, and it uses this information to generate the HTTP request message.

image-20230926230201614

  3. DNS lookup

After the browser has parsed the URL and generated the HTTP message, it delegates to the operating system to send the message to the web server.

Before sending, however, one more task has to be completed: looking up the IP address that corresponds to the server's domain name, because the operating system must be given the peer's IP address when it is asked to send the message.

The domain name hierarchy

Domain names in DNS are separated by dots, e.g. www.server.com; each dot marks the boundary between levels of the hierarchy.

Within a domain name, the further to the right a label is, the higher its level.

Strictly speaking there is one more dot at the very end, e.g. www.server.com., and this final dot represents the root domain.

image-20230926232406414

The information about the root DNS servers is stored in every DNS server on the Internet.

That way, any DNS server can find and reach the root DNS servers.

Therefore, as long as a client can reach any DNS server, it can find the root DNS servers through it and then follow the chain downwards until it reaches the target DNS server at a lower level.

The domain name resolution workflow

  1. The client first sends a DNS query asking for the IP of www.server.com to its local DNS server (the DNS server address configured in the client's TCP/IP settings).
  2. When the local DNS server receives the request, it returns the IP address directly if www.server.com is in its cache. If not, it asks a root name server. The root name server is the highest level; it does not resolve the name itself, but it points the way.
  3. The root server sees that the suffix is .com and replies: "www.server.com is managed by the .com zone; here is the address of the .com top-level domain (TLD) server, go ask it."
  4. The local DNS server receives the TLD server's address and asks it for the IP address of www.server.com.
  5. The TLD server replies with the address of the authoritative DNS server responsible for server.com and tells the local DNS server to ask it.
  6. The local DNS server then asks the authoritative DNS server for server.com, which is the original source of the resolution result for that domain (hence "authoritative": it owns the zone).
  7. The authoritative DNS server looks up the record and returns the corresponding IP address X.X.X.X to the local DNS server.
  8. The local DNS server returns the IP address to the client, and the client establishes a connection to the target.
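As an illustrative way (not from the original notes) to watch this iterative process on a Linux host, dig can trace the delegation chain from the root servers down to the authoritative server:

```bash
# follow the delegation chain: root -> .com TLD -> authoritative server
dig +trace www.server.com

# ask only the locally configured DNS server (may be answered from its cache)
dig www.server.com
```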

image-20230926232617702

Of course, caching exists at every step.

The browser first checks whether it has a cached entry for the domain and returns it directly if so; otherwise it asks the operating system, which checks its own cache, then the hosts file, and only if all of these fail does it query the local DNS server.

Thanks to DNS the packet now knows its destination, but it still has to be sent out; whose help does it need next?

  4. The protocol stack

TCP

image-20230927095406195

First, the source port and destination port are indispensable; without these two port numbers the data would not know which application it should be delivered to.

Next comes the sequence number, which solves the problem of packets arriving out of order.

There is also the acknowledgement number, whose purpose is to confirm whether the other side received what was sent. If not, the data is retransmitted until it arrives; this solves the problem of packet loss.

Then there are the flag bits: for example SYN initiates a connection, ACK acknowledges, RST resets a connection, and FIN terminates one. TCP is connection-oriented, so both sides maintain connection state, and sending packets with these flags set causes state transitions on both sides.

Another important field is the window size. TCP performs flow control: each side advertises a window (its buffer size) that indicates how much it can currently handle, so that the peer sends neither too fast nor too slow.

Besides flow control, TCP also performs congestion control. It can do nothing about whether the actual path is congested; the only thing it can do is control itself, i.e. its own sending rate.

Before TCP transfers data, it first establishes a connection with a three-way handshake.

Before HTTP can transfer data, TCP must first establish the connection; establishing a TCP connection is commonly called the three-way handshake.

This so-called "connection" is just a state machine maintained by both computers; during connection establishment the state changes on both sides follow the sequence diagram below.

image-20230927095600953

  • Initially, both the client and the server are in the CLOSED state. The server starts listening on a port first and enters the LISTEN state.
  • The client then actively initiates the connection by sending a SYN and enters the SYN-SENT state.
  • The server receives the connection request, returns its own SYN together with an ACK of the client's SYN, and enters the SYN-RCVD state.
  • After the client receives the server's SYN and ACK, it sends an ACK confirming the SYN and enters the ESTABLISHED state, since it has now completed one successful send and receive.
  • Once the server receives that ACK, it also enters the ESTABLISHED state, having completed one send and receive as well.

How can you check TCP connection states?

On Linux, TCP connection states can be inspected with the netstat -napt command.

IP

IP 包头格式

The IP header needs a source IP address and a destination IP address:

  • the source IP address is the IP address of the client sending the packet;
  • the destination address is the web server IP obtained from DNS resolution.

Because HTTP is carried over TCP, the protocol number in the IP header is set to 06 (hexadecimal), which means TCP.

If the client has several network interfaces it has several IP addresses, so which one should be used as the source address in the IP header?

With multiple NICs, choosing the source IP is equivalent to deciding which NIC should be used to send the packet.

That decision is made according to the rules in the routing table.

On Linux, the current routing table can be viewed with the route -n command.

As an example, based on the routing table above, assume the web server's destination address is 192.168.10.200.

Route matching

  1. First, AND the destination address with the first entry's subnet mask (Genmask); the result is 192.168.10.0, but that entry's Destination is 192.168.3.0, so the match fails.
  2. Next, AND it with the second entry's subnet mask; the result 192.168.10.0 matches that entry's Destination 192.168.10.0, so the IP address of the eth1 interface is used as the source address in the IP header.

Now suppose the web server's destination address is 10.100.20.100; applying the same routing rules, it matches the third entry.

The third entry is special: both its destination and subnet mask are 0.0.0.0, which denotes the default route. If no other entry matches, this row matches automatically; the packet is then sent to the router, whose IP address is given in the Gateway column.
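To make the AND-based matching concrete, here is a small illustrative sketch (the table entries are assumptions mirroring the example above, not real system output):

```python
import ipaddress

# (destination, genmask, interface) - illustrative routing table entries
routes = [
    ("192.168.3.0",  "255.255.255.0", "eth0"),
    ("192.168.10.0", "255.255.255.0", "eth1"),
    ("0.0.0.0",      "0.0.0.0",       "eth2"),  # default route
]

def match_route(dst_ip):
    dst = int(ipaddress.ip_address(dst_ip))
    for destination, genmask, iface in routes:
        mask = int(ipaddress.ip_address(genmask))
        # bitwise AND of the destination IP and the subnet mask, compared to the entry
        if dst & mask == int(ipaddress.ip_address(destination)):
            return iface
    return None

print(match_route("192.168.10.200"))  # eth1
print(match_route("10.100.20.100"))   # eth2 (default route)
```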

MAC

The MAC header is the header used by Ethernet; it contains the receiver's and sender's MAC addresses, among other fields.

The MAC header needs the sender MAC address and the receiver (target) MAC address, which are used for the hop between two directly connected points.

In TCP/IP communication, the protocol type field of the MAC header normally takes only two values:

  • 0800: IP protocol
  • 0806: ARP protocol

How are the MAC sender and receiver determined?

Obtaining the sender's MAC address is simple: it is written into the NIC's ROM at manufacturing time, so it only needs to be read out and written into the MAC header.

The receiver's MAC address is more involved. Ethernet will deliver the packet as long as we tell it the peer's MAC address, so clearly that is what must be filled in here.

So we first have to figure out who the packet should be sent to, and that is just a routing-table lookup: find the matching entry and send the packet to the IP address in its Gateway column.

Knowing who to send to, how do we obtain that device's MAC address?

If we don't know the peer's MAC address, we simply shout and ask.

This is where the ARP protocol helps us find the router's MAC address.

ARP broadcasts on the Ethernet segment, asking every device: "Who owns this IP address? Please tell me your MAC address."

The owner then replies: "That IP address is mine; my MAC address is XXXX."

If the peer is on the same subnet, this is enough to learn its MAC address; we write it into the MAC header and the header is complete.

Broadcasting for every single packet sounds wasteful, doesn't it?

Don't worry: the operating system stores the query result in a memory area called the ARP cache for later use, although an entry is only cached for a few minutes.

In other words, when sending a packet:

  • the ARP cache is checked first; if the peer's MAC address is already stored there, no ARP query is sent and the cached address is used directly;
  • only when the peer's MAC address is not in the ARP cache is an ARP broadcast query sent.

On Linux, the contents of the ARP cache can be viewed with the arp -a command.

ARP cache contents

Building the MAC frame

At this point the network packet looks like the figure below.

MAC layer frame

NIC (Network Interface Card)

Function:

Converts digital information into electrical signals.

The NIC needs the help of the NIC driver.

NIC driver

After the NIC driver obtains the network packet, it copies it into the NIC's buffer, prepends the preamble and start frame delimiter, and appends the frame check sequence (FCS) used for error detection.

Switch

First, the electrical signal arrives at the cable port and is received by a module inside the switch, which converts it into a digital signal.

The FCS at the end of the packet is then used to check for errors; if everything is fine the packet is placed in a buffer. So far this is essentially the same as a computer's NIC, but a switch works differently from a NIC.

Unlike a NIC, a switch port does not check the receiver MAC address; it simply accepts every packet and stores it in the buffer. Consequently, switch ports do not have MAC addresses of their own (note: it is the ports that have none).

After buffering the packet, the switch looks up whether the packet's destination MAC address is already recorded in its MAC address table. (The switch's MAC address table mainly stores device MAC addresses together with the switch port each device is connected to.)
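To make the table lookup concrete, here is a tiny illustrative sketch (not from the original notes) of the learn-and-forward behaviour described above:

```python
mac_table = {}  # destination MAC address -> switch port

def switch_packet(src_mac, dst_mac, in_port):
    # learn: remember which port the sender was seen on
    mac_table[src_mac] = in_port
    # forward: use the table if the destination is known, otherwise flood
    return mac_table.get(dst_mac, "FLOOD")

print(switch_packet("00:00:00:00:00:01", "00:00:00:00:00:02", 1))  # FLOOD
print(switch_packet("00:00:00:00:00:02", "00:00:00:00:00:01", 2))  # 1
```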

Router

After passing through the switch, the network packet now arrives at the router, where it is forwarded to the next router or to the destination device.

The forwarding principle is similar to that of a switch: a table lookup decides where the packet goes next.

In the concrete steps, however, routers and switches differ:

  • a router is designed around IP (a "layer 3" device), so every router port has both a MAC address and an IP address;
  • a switch is designed around Ethernet (a "layer 2" device), so its ports have no MAC addresses.

How a router works

A router port has a MAC address and can therefore act as an Ethernet sender and receiver; it also has an IP address, and in that sense it is just like a computer's NIC.

When forwarding a packet, the router port first receives the Ethernet frames addressed to it, then looks up the forwarding target in the routing table, and finally the corresponding port sends the Ethernet frame out as the new sender.

Packet reception at the router

First, the electrical signal arrives at the cable interface, a module inside the router converts it into a digital signal, and the FCS at the end of the packet is checked for errors.

If the packet is valid, the receiver MAC address in the MAC header is checked to see whether the packet is addressed to this router; if so, it is placed in the receive buffer, otherwise it is discarded.

In short, router ports have MAC addresses and only accept packets that match their own address; anything else is dropped.

Looking up the routing table to determine the output port

After receiving the packet, the router removes the MAC header from the front of the packet.

The MAC header's job was to deliver the packet to the router, and its receiver MAC address was the router port's MAC address. Once the packet has arrived, the MAC header has done its job and is discarded.

The router then forwards the packet based on the contents of the IP header that follows the MAC header.

Forwarding happens in several stages; the first is looking up the routing table to determine the forwarding target.

Router forwarding

Suppose a computer at 10.10.1.101 wants to send a packet to a server at 192.168.1.100; the packet first reaches the router in the figure.

The first step in determining the forwarding target is to look up the destination address column of the routing table using the packet's destination IP address, to find a matching record.

Route matching works as described before: each entry's subnet mask is ANDed with 192.168.1.100, and the result is compared with that entry's destination address; if it matches, the entry becomes a candidate forwarding target, otherwise the next entry is tried.

For example, ANDing the second entry's mask 255.255.255.0 with 192.168.1.100 gives 192.168.1.0, which matches that entry's destination 192.168.1.0, so the second entry is used as the forwarding target.

If no route matches at all, the default route is chosen; the routing-table entry whose subnet mask is 0.0.0.0 represents the default route.

Packet transmission at the router

First, the next-hop address is determined from the Gateway column of the routing table:

  • if the Gateway is an IP address, that address is the target to forward to; the packet has not yet reached its final destination and still needs to be forwarded by further routers;
  • if the Gateway is empty, the destination IP address in the IP header itself is the target to forward to, which means the packet has finally reached its destination network.

Once the next-hop IP address is known, ARP is used to resolve the corresponding MAC address, and the result becomes the receiver MAC address.

Routers also keep an ARP cache, so the cache is checked first and an ARP query is sent only on a miss.

Next is the sender MAC address field, which is filled with the MAC address of the output port, and the EtherType field, which is set to 0800 (hexadecimal) for the IP protocol.

Once the frame is complete, it is converted into an electrical signal and sent out through the port, just as a computer would do.

The transmitted packet travels through switches to the next router; because the receiver MAC address is that of the next router, the switches deliver it there.

That router forwards the packet to the one after it, and after hop-by-hop forwarding the packet finally reaches its destination.

Notice that during the whole journey the source and destination IP addresses never change; what keeps changing is the MAC addresses, because MAC addresses are what carry the packet between two devices within each Ethernet segment.

OSI(Open System Interconnection Reference Model)

为了使得多种设备能通过网络相互通信,和为了解决各种不同设备在网络互联中的兼容性问题,国际标准化组织制定了开放式系统互联通信参考模型(Open System Interconnection Reference Model),也就是 OSI 网络模型,该模型主要有 7 层,分别是应用层、表示层、会话层、传输层、网络层、数据链路层以及物理层。

每一层负责的职能都不同,如下:

  • 应用层,负责给应用程序提供统一的接口;
  • 表示层,负责把数据转换成兼容另一个系统能识别的格式;
  • 会话层,负责建立、管理和终止表示层实体之间的通信会话;
  • 传输层,负责端到端的数据传输;
  • 网络层,负责数据的路由、转发、分片;
  • 数据链路层,负责数据的封帧和差错检测,以及 MAC 寻址;
  • 物理层,负责在物理网络中传输数据帧;

由于 OSI 模型实在太复杂,提出的也只是概念理论上的分层,并没有提供具体的实现方案

事实上,我们比较常见,也比较实用的是四层模型,即 TCP/IP 网络模型,Linux 系统正是按照这套网络模型来实现网络协议栈的。

TCP/IP 网络模型共有 4 层,分别是应用层、传输层、网络层和网络接口层,每一层负责的职能如下:

  • 应用层,负责向用户提供一组应用程序,比如 HTTP、DNS、FTP 等;
  • 传输层,负责端到端的通信,比如 TCP、UDP 等;
  • 网络层,负责网络包的封装、分片、路由、转发,比如 IP、ICMP 等;
  • 网络接口层,负责网络包在物理网络中的传输,比如网络包的封帧、 MAC 寻址、差错检测,以及通过网卡传输网络帧等;

MAC 层报文

Linux Network Protocol Stack

img

以太网中,规定了最大传输单元(MTU)是 1500 字节,也就是规定了单次传输的最大 IP 包大小。

kubernetes

Introduction

Overview

Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

  • When you deploy Kubernetes, you get a cluster.
  • A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications.
  • Every cluster has at least one worker node.

Components

This document outlines the various components you need to have for a complete and working Kubernetes cluster.

Control Plane components

kube-api server

The API server is the front end for the Kubernetes control plane.

The main implementation of a Kubernetes API server is kube-apiserver. kube-apiserver is designed to scale horizontally—that is, it scales by deploying more instances. You can run several instances of kube-apiserver and balance traffic between those instances.

Horizontal scaling vs Vertical scaling:

水平扩展

是通过增加或减少工作负载的实例数量来调整资源的过程。在 Kubernetes 中,这通常是通过增加或减少 Pod 的副本数量来实现的。例如,如果一个应用程序的负载增加,可以增加 Pod 的数量来分散负载。这通常是通过使用 Kubernetes 的 Horizontal Pod Autoscaler (HPA) 来自动完成的,HPA 根据 CPU 利用率或其他选择的度量标准来自动调整 Pod 的数量。
水平扩展通常适用于无状态应用程序,因为增加的实例可以独立处理请求,不需要访问共享资源。

垂直扩展

是通过增加或减少单个实例的资源(如 CPU 或内存)来调整资源的过程。在 Kubernetes 中,这通常是通过增加或减少 Pod 的资源限制和请求来实现的。

例如,如果一个应用程序的性能受到 CPU 或内存的限制,可以增加 Pod 的 CPU 或内存资源来提高性能。

垂直扩展通常适用于有状态应用程序,因为这些应用程序通常需要访问共享资源,而且可能不容易在多个实例之间分散负载。

etcd

Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data. If your Kubernetes cluster uses etcd as its backing store, make sure you have a back up plan for the data.

Kube-scheduler

Control plane component that watches for newly created Pods with no assigned node, and selects a node for them to run on. Factors taken into account for scheduling decisions include: individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.

Kube-controller-manager

Control plane component that runs controller processes. Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.

Some types of these controllers are:

  • Node controller: Responsible for noticing and responding when nodes go down.
  • Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
  • EndpointSlice controller: Populates EndpointSlice objects (to provide a link between Services and Pods).
  • ServiceAccount controller: Create default ServiceAccounts for new namespaces.

Cloud-controller-manager

A Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider’s API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.

Node Components

Kubelet

An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod.

Kube-proxy

kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.

Container-runtime

The container runtime is the software that is responsible for running containers.

Addons

Addons use Kubernetes resources (DaemonSet, Deployment, etc) to implement cluster features. Because these are providing cluster-level features, namespaced resources for addons belong within the kube-system namespace.

DNS

While the other addons are not strictly required, all Kubernetes clusters should have cluster DNS, as many examples rely on it.

Web-UI(Dashboard)

Dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.

Container Resource Monitoring

Container Resource Monitoring records generic time-series metrics about containers in a central database, and provides a UI for browsing that data.

Cluster-level Logging

A cluster-level logging mechanism is responsible for saving container logs to a central log store with search/browsing interface.

Network Plugins

Network plugins are software components that implement the container network interface (CNI) specification. They are responsible for allocating IP addresses to pods and enabling them to communicate with each other within the cluster.

Elementary

0.基础

A cluster is made up of nodes (a master node and worker nodes). The master is used to manage the cluster. Worker 节点是虚拟机或物理计算机，充当 k8s 集群中的工作计算机。每个 Worker 节点都有一个 Kubelet，它管理该 Worker 节点并负责与 Master 节点通信。该 Worker 节点还应具有用于处理容器操作的工具，例如 Docker。

A node runs one or more pods.

Pods are created by a Deployment; you can specify how many replicas you want, and use the Deployment to manage those pod replicas.

A pod is made up of one or more containers.

1.部署一个应用程序

首先得创建一个yaml文件,里面包含了

apiVersion: apps/v1  #与k8s集群版本有关，使用 kubectl api-versions 即可查看当前集群支持的版本
kind: Deployment  #该配置的类型，我们使用的是 Deployment
metadata:  #译名为元数据，即 Deployment 的一些基本属性和信息
  name: nginx-deployment  #Deployment 的名称
  labels:  #标签，可以灵活定位一个或多个资源，其中key和value均可自定义，可以定义多组，目前不需要理解
    app: nginx  #为该Deployment设置key为app，value为nginx的标签
spec:  #这是关于该Deployment的描述，可以理解为你期待该Deployment在k8s中如何使用
  replicas: 1  #使用该Deployment创建一个应用程序实例
  selector:  #标签选择器，与上面的标签共同作用，目前不需要理解
    matchLabels:  #选择包含标签app:nginx的资源
      app: nginx
  template:  #这是选择或创建的Pod的模板
    metadata:  #Pod的元数据
      labels:  #Pod的标签，上面的selector即选择包含标签app:nginx的Pod
        app: nginx
    spec:  #期望Pod实现的功能（即在pod中部署）
      containers:  #生成container，与docker中的container是同一种
      - name: nginx  #container的名称
        image: nginx:1.7.9  #使用镜像nginx:1.7.9创建container，该container默认80端口可访问

然后

kubectl apply -f nginx-deployment.yaml

这样应该就能够看见了

# 查看 Deployment
kubectl get deployments

# 查看 Pod
kubectl get pods

2.查看pods/nodes

在部署完一个应用程序后,k8s创建了一个deployment,由它创建了一个pod来放置应用程序实例也就是container

在一个pod中,里面的容器containers会共享

  • 共享存储,称为卷(Volumes),即图上紫色圆柱
  • 网络,每个 Pod(容器组)在集群中有个唯一的 IP,pod(容器组)中的 container(容器)共享该IP地址
  • container(容器)的基本信息,例如容器的镜像版本,对外暴露的端口等

Pod(容器组)是 k8s 集群上的最基本的单元。当我们在 k8s 上创建 Deployment 时,会在集群上创建包含容器的 Pod (而不是直接创建容器)。每个Pod都与运行它的 worker 节点(Node)绑定,并保持在那里直到终止或被删除。如果节点(Node)发生故障,则会在群集中的其他可用节点(Node)上运行相同的 Pod(从同样的镜像创建 Container,使用同样的配置,IP 地址不同,Pod 名字不同)。

Node

![截屏2023-07-15 14.42.27](../images/截屏2023-07-15 14.42.27.png)

Pod总是在 Node 上运行。Node是 kubernetes 集群中的计算机,可以是虚拟机或物理机。每个 Node都由 master 管理。一个 Node可以有多个Pod,kubernetes master 会根据每个 Node上可用资源的情况,自动调度 Pod到最佳的 Node上。

每个 Kubernetes Node至少运行:

  • Kubelet,负责 master 节点和 worker 节点之间通信的进程;管理 Pod和 Pod内运行的 Containers.
  • 容器运行环境(如Docker)负责下载镜像、创建和运行容器等.

kubectl 还有如下四个常用命令,在我们排查问题时可以提供帮助:

kubectl get – 获取某种资源类型的列表

#获取类型为Deployment的资源列表
kubectl get deployments

#获取类型为Pod的资源列表
kubectl get pods

#获取类型为Node的资源列表
kubectl get nodes

名称空间

在命令后增加 -A 或 --all-namespaces 可查看所有名称空间中的对象，使用参数 -n 可查看指定名称空间的对象，例如

# 查看所有名称空间的 Deployment
kubectl get deployments -A
kubectl get deployments --all-namespaces
# 查看 kube-system 名称空间的 Deployment
kubectl get deployments -n kube-system

kubectl describe - 显示有关资源的详细信息

# kubectl describe 资源类型 资源名称

#查看名称为nginx-XXXXXX的Pod的信息
kubectl describe pod nginx-XXXXXX

#查看名称为nginx的Deployment的信息
kubectl describe deployment nginx

Kubectl logs

# kubectl logs Pod名称

#查看名称为nginx-pod-XXXXXXX的Pod内的容器打印的日志
#本案例中的 nginx-pod 没有输出日志,所以您看到的结果是空的
kubectl logs -f nginx-pod-XXXXXXX

Kubectl exec

# kubectl exec Pod名称 操作命令

# 在名称为nginx-pod-xxxxxx的Pod中运行bash
kubectl exec -it nginx-pod-xxxxxx -- /bin/bash

TIP

Worker节点是k8s中的工作计算机,可能是VM或物理计算机,具体取决于群集。多个Pod可以在一个节点上运行。

3.公布应用程序

Kubernetes service

Pod有自己的生命周期。当worker node故障的时候,节点上运行的pod也会消失,然后deployment可以通过创建新的 Pod(容器组)来动态地将群集调整回原来的状态,以使应用程序保持运行。

举个例子,假设有一个图像处理后端程序,具有 3 个运行时副本。这 3 个副本是可以替换的(无状态应用),即使 Pod消失并被重新创建,或者副本数由 3 增加到 5,前端系统也无需关注后端副本的变化。由于 Kubernetes 集群中每个 Pod都有一个唯一的 IP 地址(即使是同一个 Node 上的不同 Pod),我们需要一种机制,为前端系统屏蔽后端系统的 Pod在销毁、创建过程中所带来的 IP 地址的变化。

kubernetes 的 service就提供这样的功能,它提供了这样的一个抽象层,它选择具备某些特征的 Pod并为它们定义一个访问方式。Service使Pod之间的相互依赖解耦(原本从一个 Pod 中访问另外一个 Pod,需要知道对方的 IP 地址)。一个 Service选定哪些Pod通常由LabelSelector来决定。

在创建Service的时候,通过设置配置文件中的 spec.type 字段的值,可以以不同方式向外部暴露应用程序:

  • ClusterIP(默认)

    在群集中的内部IP上公布服务,这种方式的 Service(服务)只在集群内部可以访问到

  • NodePort

    使用 NAT 在集群中每个节点的同一端口上公布服务。这种方式下，可以通过访问集群中任意节点+端口号的方式访问服务 <NodeIP>:<NodePort>。此时 ClusterIP 的访问方式仍然可用。

  • LoadBalancer

    在云环境中（需要云供应商支持）创建一个集群外部的负载均衡器，并使用该负载均衡器的 IP 地址作为服务的访问地址。此时 ClusterIP 和 NodePort 的访问方式仍然可用。

TIPs

Service是一个抽象层，它通过 LabelSelector 选择了一组 Pod（容器组），把这些 Pod 的指定端口公布到集群外部，并支持负载均衡和服务发现。

  • 公布 Pod 的端口以使其可访问
  • 在多个 Pod 间实现负载均衡
  • 使用 Label 和 LabelSelector

下图中有两个服务：Service A（黄色虚线）和 Service B（蓝色虚线）。Service A 将请求转发到 IP 为 10.10.10.1 的 Pod 上，Service B 将请求转发到 IP 为 10.10.10.2、10.10.10.3、10.10.10.4 的 Pod 上。

![截屏2023-07-15 16.17.04](../images/截屏2023-07-15 16.17.04.png)

Service 将外部请求路由到一组 Pod 中,它提供了一个抽象层,使得 Kubernetes 可以在不影响服务调用者的情况下,动态地调度pods(在容器组失效后重新创建容器组,增加或者减少同一个 Deployment 对应容器组的数量等).

Service使用 Labels、LabelSelector（标签和选择器）匹配一组 Pod。Labels（标签）是附加到 Kubernetes 对象的键值对，其用途有多种：

  • 将 Kubernetes 对象(Node、Deployment、Pod、Service等)指派用于开发环境、测试环境或生产环境
  • 嵌入版本标签,使用标签区别不同应用软件版本
  • 使用标签对 Kubernetes 对象进行分类

下图体现了 Labels(标签)和 LabelSelector(标签选择器)之间的关联关系

  • Deployment B 含有 LabelSelector 为 app=B 通过此方式声明含有 app=B 标签的 Pod 与之关联

  • 通过 Deployment B 创建的 Pod 包含标签为 app=B

  • Service B 通过标签选择器 app=B 选择可以路由的 Pod

    ![截屏2023-07-15 16.24.25](../images/截屏2023-07-15 16.24.25.png)

Labels可以在创建 Kubernetes 对象时附加上去,也可以在创建之后再附加上去。任何时候都可以修改一个 Kubernetes 对象的 Labels.

实战:

metadata:  #译名为元数据，即Deployment的一些基本属性和信息
  name: nginx-deployment #Deployment的名称
  labels: #标签，可以灵活定位一个或多个资源，其中key和value均可自定义，可以定义多组
    app: nginx #为该Deployment设置key为app，value为nginx的标签
vim nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service #Service 的名称
  labels: #Service 自己的标签
    app: nginx #为该 Service 设置 key 为 app，value 为 nginx 的标签
spec: #这是关于该 Service 的定义，描述了 Service 如何选择 Pod，如何被访问
  selector: #标签选择器
    app: nginx #选择包含标签 app:nginx 的 Pod
  ports:
  - name: nginx-port #端口的名字
    protocol: TCP #协议类型 TCP/UDP
    port: 80 #集群内的其他容器组可通过 80 端口访问 Service
    nodePort: 32600 #通过任意节点的 32600 端口访问 Service
    targetPort: 80 #将请求转发到匹配 Pod 的 80 端口
  type: NodePort #Service 的类型，ClusterIP/NodePort/LoadBalancer

# 执行命令
kubectl apply -f nginx-service.yaml
# 检查执行结果
kubectl get services -o wide

# 可查看到名称为 nginx-service 的服务。
# 访问服务
curl <任意节点的 IP>:32600

4. Scaling

Scaling 应用程序

在之前的文章中,我们创建了一个 Deployment ,然后通过 service提供访问 Pod 的方式。我们发布的 Deployment 只创建了一个 Pod 来运行我们的应用程序。当流量增加时,我们需要对应用程序进行伸缩操作以满足系统性能需求。

Scaling 的实现可以通过更改 nginx-deployment.yaml 文件中部署的 replicas(副本数)来完成

spec:
  replicas: 2 #使用该Deployment创建两个应用程序实例

![截屏2023-07-16 00.01.42](../images/截屏2023-07-16 00.01.42.png)

修改了 Deployment 的 replicas 为 4 后，Kubernetes 又为该 Deployment 创建了 3 个新的 Pod，这 4 个 Pod 有相同的标签。因此 Service A 通过标签选择器与新的 Pod 建立了对应关系，将访问流量通过负载均衡在 4 个 Pod 之间进行转发。

![截屏2023-07-16 00.02.26](../images/截屏2023-07-16 00.02.26.png)

5.Rolling update

用户期望应用程序始终可用,为此开发者/运维者在更新应用程序时要分多次完成。在 Kubernetes 中,这是通过 Rolling Update 滚动更新完成的。Rolling Update滚动更新 通过使用新版本的 Pod 逐步替代旧版本的 Pod 来实现 Deployment 的更新,从而实现零停机。新的 Pod 将在具有可用资源的 Node(节点)上进行调度。

Kubernetes 更新多副本的 Deployment 的版本时,会逐步的创建新版本的 Pod,逐步的停止旧版本的 Pod,以便使应用一直处于可用状态。这个过程中,Service 能够监视 Pod 的状态,将流量始终转发到可用的 Pod 上。

Intermediate

Advanced

project_notes

Master Dissertation

week1

Microservice, Docker Explanation:

Service Mesh and Microservice Communication: A service mesh is an infrastructure layer that handles service-to-service communication. As the number of microservices increases, so does the complexity of communication between services. Service meshes such as Istio and Linkerd help solve this problem, but there is still much room for research and exploration, including communication efficiency, security, observability, and control.

Container orchestration and scheduling: With the growing popularity of containerized applications, how to effectively manage and schedule these containers has become an important issue. Although Kubernetes has become the de facto standard, further research is needed on how to optimize scheduling strategies, how to save resources while ensuring performance, and how to perform container scheduling in multi-cloud environments.

Performance and Scalability of Microservices: With the widespread adoption of microservice architectures, how to improve the performance and scalability of microservices is an important research issue. This includes how to design an efficient microservice architecture, how to optimize the performance of microservices, and how to effectively scale up and down microservices.

Security of containers and microservices: Due to the dynamic and decentralized nature of containers and microservices, their security becomes an important research issue. This includes how to ensure the isolation of containers, how to protect the communication between microservices, and how to prevent malicious code from running in container and microservice environments, etc.

Testing and verification of microservices: The independence and dynamic nature of microservices brings new challenges for testing and verification. How to conduct effective microservice testing, how to verify the correctness of microservices, and how to perform continuous integration and continuous deployment of microservices are all issues that need to be studied.

Fault detection and recovery: In a distributed environment, fault detection and recovery is a very important issue. This includes how to quickly detect faults, how to accurately locate the cause of faults, and how to automatically restore services.

How and why decoupling?

Decoupling and transforming a project into a microservice architecture while containerizing it using Docker can improve system scalability, maintainability, and disaster recovery. The following are some basic steps:

  1. Determine microservice boundaries: The first step is to determine which microservices to decompose the large monolithic application into. This process requires understanding the business logic and identifying modules that can run independently and scale independently. A good microservice should be business-driven and can perform a specific business function independently.
  1. Create inter-service communication mechanisms: In a microservice architecture, services need to communicate with each other over the network. You can choose to use REST, gRPC, or message queues for communication.
  1. Designing data persistence: Each microservice should have its own independent database to reduce the coupling between services. But this also needs to address the issue of data consistency.
  1. Build microservices: For each microservice, you can choose the programming language and framework that best suits the business logic of that service for development.
  1. Create Docker containers: Pack each microservice into a Docker container. Each container contains all the dependencies and environments needed to run a microservice, which ensures that the microservice will run stably in any environment.
  1. Orchestrate with Docker Compose or Kubernetes: These tools can help manage and deploy your Docker containers, as well as handle service discovery, load balancing, fault recovery, networking, and security.
  1. Implement Continuous Integration and Continuous Deployment (CI/CD): Automating builds, tests, and deployments can greatly improve development speed and software quality.

The main benefits of doing so are as follows:

  1. Scalability: When the system load increases, services with higher demand can be scaled individually without the need to scale as a whole.
  1. Maintainability: Each service runs independently, with a small code base that is easier to understand and modify.
  1. Fault isolation: If something goes wrong with one service, it will not affect other services, reducing the risk of system failure.
  1. Fast iteration: Each microservice can be deployed independently, making it faster to develop and bring new features online.
  1. Technology diversity: Each microservice can choose the technology stack that best suits its business needs.

Benefits from containerization: Using Docker containers simplifies the deployment process, ensures software consistency across environments, and provides better.

Fog computing:

Fog computing is a computing architecture in which a series of nodes receives data from IoT devices in real time. These nodes perform real-time processing of the data that they receive, with millisecond response time. The nodes periodically send analytical summary information to the cloud

Projects ideas:

Docker and distributed architecture application in cloud computing platform.

Decouple a project into a microservices architecture (monolith -> microservices, use brownfield projects).

Make sure every service we have is high cohesion, and low coupling.

Avoid over-fine-grained services: keep the call chain for a single request short (say, no more than four service calls) so that service granularity stays moderate. It is also important to keep teams small enough that the microservices they own stay decoupled from each other. These considerations help avoid the problem of microservices that are too small.

Some open-source monolith projects: BroadleafCommerce, https://github.com/jhipster/generator-jhipster

Decompose by business capability model/Decompose by subdomain pattern (DDD).

week2

Paper review

Summary and baselines

Will_edge_autoscaling

ML-based scaling management for kube

PBScaler: A Bottleneck-aware Autoscaling Framework for Microservice-based Applications

Microscaler: Automatic Scaling for Microservices with an Online Learning Approach

Autopilot: workload autoscaling at Google

Adaptive scaling of Kubernetes pods

Proactive Autoscaling for Edge Computing Systems with Kubernetes

SHOWAR: Right-Sizing And Efficient Scheduling of Microservices

Relative project

https://github.com/jthomperoo/custom-pod-autoscaler

Custom Pod Autoscaler

HPA cons: hard-coded algorithm; CPA provides more flexibility in scaling.

Week3

Autopilot/ LOTUS

CPA

Week4

PPA/PBScaler/LOTUS/Autopilot

Let’s start with PPA

摘要

随着物联网和5G技术的出现，边缘计算范式以更好的可用性、延迟控制和性能发挥着越来越重要的作用。然而，现有的边缘计算应用程序的自动缩放工具不能有效地利用边缘系统的异构资源，从而为性能改进留下了空间。在这项工作中，我们为Kubernetes上的边缘计算应用程序提出了一个主动Pod Autoscaler (PPA)。提议的PPA能够使用多个用户定义/自定义指标提前预测工作负载，并相应地扩展和缩小边缘计算应用程序。在一个cpu密集型边缘计算应用实例中，进一步对PPA进行了优化和评估。可以得出结论，所提出的PPA在资源利用效率和应用程序性能方面都优于Kubernetes默认的pod自动缩放器。文章还强调了所提出的 PPA 未来可能的改进。

在过去的几十年里,云计算应用在各种广泛的领域中越来越受欢迎[4,18]。云计算供应商为用户提供按需可用的大量资源,包括计算单元、存储、网络设备,甚至服务和应用程序。云计算已经被证明在商业和技术方面都是成功的,并且这种方法为许多流行的应用程序(例如Netflix, DropBox, Spotify)提供了动力[13,15]。然而,云计算存在应用程序延迟的问题,这主要受限于客户端与数据中心之间的地理距离和网络带宽[12]。因此,云计算不能完全满足对延迟敏感的应用程序(例如视频游戏流、实时数据分析)的需求。

边缘计算是解决传统云计算延迟问题的一种很有前途的方法[14],它将对延迟敏感的计算和数据存储移动到靠近网络边缘的客户端位置。边缘计算具有较低的延迟、带宽占用和开销,在实现智慧城市、智能电网、智能交通等方面发挥着重要作用[11]。

在现实场景中,云计算和边缘计算应用程序的自动伸缩是必不可少的[3,20]。随着工作负载的变化,自动伸缩工具会动态调整资源量,以保持每个计算/存储单元的平均工作负载稳定。自动缩放为应用程序提供了更好的容错性、高可用性、高效的能耗和成本管理。

对于云计算,HPA作为Kubernetes提供的本地服务被广泛使用,Kubernetes是大多数云平台上事实上的云框架[8]。HPA以CPU利用率为衡量工作负载的指标,使用式1所示的算法以响应式方式扩展云应用程序。

![截屏2023-07-04 18.41.15](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 18.41.15.png)

然而,对于边缘计算,自动缩放是一个更复杂的问题[19]。尽管HPA在许多云计算案例中简单有效,但它并不完全能够自动扩展边缘计算应用,原因有三个:

HPA不是专门为边缘计算环境设计的,不知道异构边缘资源的约束和能力;

HPA仅考虑CPU利用率来估计工作负载。然而,对于边缘计算应用程序,在扩展应用程序时,有关系统的其他信息(例如作业队列、I/O、请求速率)也是必不可少的;

HPA是基于规则的,不可编程的,服务提供商很难定制HPA来满足他们对边缘计算应用的特定需求

一种很有前途的方法是为边缘计算应用程序开发一个主动的自动缩放器,该自动缩放器支持多个度量和可定制的算法。然而,开发这样的自动缩放器是具有挑战性的。从理论上讲,由于边缘资源的异质性和局限性,预测边缘系统的工作负载和扩展pod形成了一个时间序列分析和多目标优化的混合问题。实际上,从不同的来源收集多个度量标准并以统一的方式组织依赖于各种库的自定义算法是一个复杂的多框架工程问题。在这项工作中,我们提出了一个多指标支持和可定制的主动pod自动缩放器。自动扩展器能够收集多个指标,预测工作负载并提前扩展目标应用程序。此外,它允许用户自定义他们自己的扩展策略和预测模型,以更好地适应他们的应用程序。这项工作的主要贡献包括:

(1)为自动缩放边缘计算应用引入主动工作流。

(2)在Kubernetes上实现多指标支持和可定制的自动缩放器。

(3)提供一个示例应用程序,以演示所建议的自动缩放器的性能。

本文的其余部分组织如下。相关文献在第2节中进行了回顾。在第3节中,说明了本工作中边缘系统环境的设计。在第4节中,详细解释了所提出的自动缩放器的体系结构和算法。第5节和第6节介绍了所建议的自动缩放器的实验设计和结果。第7节总结了工作,并强调了进一步可能的改进。

考虑到HPA作为一个基准,许多研究工作已经开始探索HPA的替代方案,以用于托管在容器上的云/边缘应用程序。在本节中,将回顾有关被动和主动自动缩放技术的相关工作。此外,我们提出的主动Pod自动缩放器的必要性和其独特的功能进行了解释。

2.1

2019年,Fan等人报道了一种容器系统架构,提供了更高的自动缩放效率[22]。2020年,Salman和Marko的一篇文章探讨了除了CPU利用率之外应该考虑的其他关键因素[17]。可以肯定的是,为了更好地自动扩展云系统,除了CPU利用率之外,还需要多个指标。无功自缩放器的一个主要缺点是控制延迟。虽然一旦工作负载发生变化,容器就会被扩展,但是容器的初始化或终止需要时间。

2.2

为了改进响应式pod自动扩展器,它有望预测工作负载并提前做出扩展决策,即主动/预测自动扩展。2016年,Yang等人提出了一种基于时间序列分析预测容器CPU使用情况的主动自缩放算法CRUPA[9]。另一个例子来自Tian等人,他们在2017年报告了一个预测自动缩放框架[21]。为了提供更好的服务质量,提出了一种结合CPU利用率预测和服务水平协议的混合自扩展策略。机器学习模型在时间序列分析中得到了很多关注[16],一些报道的主动自动标度器是基于机器学习模型的。2020年,Mahmoud, Imtiaz和Mohammad提出了一种基于机器学习的主动自动缩放器,它利用LSTM和多个指标来预测工作负载[6]。虽然有一些关于云系统的主动自动缩放的工作,但边缘计算系统的主动自动缩放很少。据我们所知,只有一篇文章报道了边缘系统的主动自动缩放[1]。建立了预训练的神经网络模型,并将其作为运行系统中CPU利用率的预测模型。预测的CPU指标用于估计副本的数量。表1比较了相关工作和我们提出的方法。我们提出的主动Pod自动缩放器(PPA)具有以下特点:

•考虑到异构资源的限制和约束,这是少数关注边缘计算应用程序自动缩放的研究之一。

•支持多个指标(CPU, RAM, I/O利用率)和自定义指标来自动扩展应用程序,为工作负载估计和预测提供替代指标。尽管CPU利用率在许多情况下能够预测工作负载,但一些应用程序需要多个或特定于应用程序的指标来做出扩展决策。

•PPA灵活且与模型无关。与其他具有固定预测模型的主动pod自动缩放器不同,拟议的PPA支持自定义模型和多个模型框架(例如TensorFlow, statmodels等)。用户可以将自己的模型注入到PPA中,以获得适合自己应用的最佳性能。

•PPA 可以考虑置信度因素。如果注入的模型是贝叶斯模型，对每个预测产生置信度/不确定性估计，那么所提出的自动缩放器只有在预测置信度超过预设置信度阈值时才会主动工作，否则它会被动工作

现在详细介绍了所考虑的云和边缘环境的设置。

3.1 边缘计算的环境

![截屏2023-07-04 19.18.26](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 19.18.26.png)

![截屏2023-07-04 19.19.38](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 19.19.38.png)

图1显示了我们为Kubernetes连接的云/边缘计算环境编排的拓扑结构。表2介绍了系统中节点分配的资源。不同节点分配的资源有限且大小不一,而云节点拥有更强的计算能力和更大的内存资源。该系统描绘了一个典型边缘计算环境的真实世界模型。

![截屏2023-07-04 19.20.41](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 19.20.41.png)

3.2 集群组件

除了Kubernetes核心之外,集群还包含三个用于系统监控的逻辑组件,包括导出器、Prometheus堆栈和自动缩放器。集群的监控系统对于成功运行所建议的云/边缘计算框架起着至关重要的作用。该系统基于Prometheus[5],这是云计算和边缘计算环境中最流行的框架之一。不同类型的导出器负责生成不同的度量标准。在考虑的边缘计算环境中,节点导出器部署在每个节点上用于节点级指标,并为用户部署可自定义导出器以生成/提取用户自定义指标(例如请求率,排队任务数等)作为自定义指标。普罗米修斯堆栈由三部分组成——普罗米修斯、Grafana和普罗米修斯适配器。指标由Prometheus从出口商收集,并使用Grafana进行可视化。收集到的指标也由Prometheus Adapter在标准API中公开,以便其他pod能够以统一的方式获取它们。从Prometheus Adapter,自动扩展器获取所有类型的所需指标,评估所需副本的数量,并向Kubernetes主控制面板发出扩展决策请求。Kubernetes负责处理扩展请求和在节点上调度新的pod。

在本节中,介绍了所提出的主动Pod自动缩放器的体系结构和算法。首先阐述了PPA的结构和工作流程。然后,对PPA各组成部分的详细算法进行了推导和说明。

4.1 架构

![截屏2023-07-04 19.32.26](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 19.32.26.png)

图2显示了提议的主动pod自动缩放器的结构。PPA由三个组件(Formulator、Evaluator和Updater)组成,在两个循环中工作(控制循环和模型更新循环),并维护两个文件(度量历史文件和模型文件)。PPA的初始化需要将预训练的种子模型作为初始模型文件注入框架中。

在每个控制循环中,PPA负责使用收集到的指标扩展目标pod。Formulator从Prometheus Adapter获取原始指标,并对其进行预处理。制定的度量存储在度量历史文件中,并传递给评估者。然后,Evaluator从model file中加载模型,预测所需副本的数量,并向Kubernetes控制面板发出缩放请求。

在每个模型更新循环中,Updater从度量历史文件中加载训练数据,更新模型,删除度量历史文件,并将模型重新保存到模型文件中。有了Updater,用于预测的模型就可以针对新的工作负载模式不断更新。

4.2算法

4.2.1 评估器evaluator

算法1描述了评估器的工作原理。对于每个PPA,必须将一个关键指标设置为工作负载的估计器,注入的预测模型使用收集的指标来预测关键指标。为了从预测的关键指标中获得所需的副本数量,应该定义一个静态策略。在本工作中,使用第1节中描述的基于阈值的HPA算法作为默认的静态策略。但是,静态策略是可定制的,用户可以在PPA中注入自己的策略。

![截屏2023-07-04 19.41.37](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-04 19.41.37.png)

model.isBayesian() 是检查模型是否为贝叶斯模型的条件。贝叶斯模型是一种统计模型,它使用贝叶斯定理进行推断和预测。在这段代码中,如果模型是贝叶斯模型并且预测的置信度低于阈值,就会选择使用当前的关键度量指标而不是模型预测的结果。

Result: Number of replicas to be requested

Get current_metrics, current_replicas;

Set max_replicas of system;

if bottleneck_check == False then
num_replicas = 1
else if current_replicas < max_replicas then
num_replicas = GA(Genetic_algorithm_init_parameters, current_metrics, current_replicas)
else
num_replicas = max_replicas

4.2.2 模型的协议

所有满足以下协议的时间序列模型都可以注入进行指标预测:

(1)时间窗口大小:所有模型的时间窗口大小固定为1个单位,表明模型使用上一个回路的指标预测下一个控制回路的工作量。这是由新pod的初始化时间成本决定的,它通常需要不到一个时间间隔的控制循环。

(2)输入输出指标:模型的指标应按以下顺序输入[CPU, RAM,网络输入,网络输出,自定义指标]。该模型应该预测所有输入指标,但只有一个被设置为关键指标。

4.2.3升级器的模型更新策略Model Update Policy of the Updater

虽然Updater的工作流程是固定的,但我们提出了3种不同的策略来更新种子模型(称为模型更新策略):

(1)不重新训练模型：在整个执行过程中使用注入的种子模型而不更新。如果种子模型在稳定的工作负载下能够产生令人满意的结果，则无需定期更新模型（定期更新的成本较大）；

(2)从头开始重新训练模型:在每个模型更新循环中,Updater丢弃旧模型并从头开始训练具有与种子模型相同架构的新模型。在不同日子的工作负载变化很大的情况下,基于旧数据的模型不一定适合未来几天的工作负载模式;

(3)更新模型:使用上一个模型更新循环的数据对旧模型进行多几个epoch的再训练。在许多情况下,模式的工作负载确实会发生变化,但变化不大,并且可以将旧模型用作更新过程的起点。

这里我们提供了三个选项,对于不同的应用程序和不同的工作负载模式,最佳策略可能不同。

五. 实验

实验的目的是优化所提出的主动Pod自动缩放器的示例应用,并与HPA进行比较,验证其性能。在本节中,首先介绍示例应用程序和工作负载的生成。然后详细介绍了PPA优化和评价的实验细节。

5.1 示例应用

![截屏2023-07-06 12.11.29](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 12.11.29.png)

5.1.1 拓扑图

图3显示了托管在边缘计算集群上的示例应用程序的体系结构。示例应用程序分布在一个云和两个边缘区域中。在每个区域,都部署一些支持性的静态pod和服务,它们是固定的,不是为可扩展而设计的。只有每个区域中的worker pods才是自动缩放器要缩放的目标。

5.1.2 工作流和作业

在示例应用程序中,定义了两种不同类型的任务。第一种是对长度为3000的随机数组进行排序(称为sort task),复杂度为𝑛log𝑛;第二种是计算维数为1000 × 1000的随机矩阵的特征值(称为eigen task),复杂度为𝑛3。所有请求都是从设备生成的,它们到达最靠近其位置的边缘的入口点。每个请求要么是排序任务,要么是特征任务。具有排序任务的任务计算成本不高,由边缘工作者处理,而具有特征任务的任务由于计算成本高而转发给云工作者。本工作中的示例应用程序是模拟典型的cpu密集型应用程序,这些应用程序在一般和科学计算中都很常见(例如天气预报、搜索算法、音频/视频处理等)。基于示例应用的结论具有通用性。

5.2 工作负载生成

本工作中考虑了两种不同的工作负载,即随机存取和美国国家航空航天局(NASA)数据集。它们在下文中有描述。

5.2.1随机访问

如算法 2 所示，Random Access 的设计是为了生成随机工作负载进行优化实验。Sort和Eigen任务分别以0.9和0.1的概率生成，以模拟大多数成本低的请求在边缘处理，而成本高的任务在云中完成。通过定期使用3种不同的工作负载模式（即轻、中、重）访问应用程序，自动缩放器有望覆盖实际使用中可能出现的大多数情况。

![截屏2023-07-06 12.25.27](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 12.25.27.png)
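
A rough sketch of the Random Access generator described above. Only the 0.9/0.1 sort/eigen split and the three workload patterns come from the text; the per-phase request rates and phase lengths are made-up values.

```python
import random

PATTERNS = {"light": 10, "medium": 50, "heavy": 100}   # requests per minute (assumed)

def generate_requests(minutes_per_phase: int = 10):
    """Cycle through the light/medium/heavy patterns, emitting sort or eigen tasks."""
    for phase, rate in PATTERNS.items():
        for _ in range(minutes_per_phase * rate):
            task = "sort" if random.random() < 0.9 else "eigen"
            yield phase, task   # sort -> handled at the edge, eigen -> forwarded to the cloud
```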

5.2.2 NASA数据集

NASA数据集用于模拟真实世界的访问[10],用于评估实验。所有的访问记录都是由NASA肯尼迪航天中心的WWW服务器收集的,每个日志都包含其访问时间戳。通过累积每分钟的访问请求来预处理原始日志文件,并使用聚合的日志数量向应用程序发出请求。包含2个月记录的数据集的所有请求的综合实验集将非常耗时。

在这项工作中,从数据集中选择2天的子集进行实验。同时,将请求数调整到适当的规模,使峰值工作负载不超过实验边缘环境的资源限制。

5.3 Experiments for optimization

对PPA的三个超参数进行了优化,包括工作量预测模型、更新策略和关键指标。优化问题如式2所示。其中,𝛾为应用程序的响应时间,𝑊为浪费资源的总和,𝑀为预测模型,𝑈为模型更新策略,𝐾为关键指标Key metric,A为目标应用, Rp是分配给pod p的资源, 𝑃𝑛是在节点𝑛上托管的所有pod的集合, 𝑁是所有节点的集合。在这里,目标应用程序A是固定的,并且要优化𝑀、𝑈、𝐾以最小化𝛾和𝑊。

![截屏2023-07-06 13.19.08](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 13.19.08.png)

5.3.1 预测模型优化

针对实例应用,对不同的指标预测模型进行了比较和优化,并以自回归移动平均(ARMA)模型和长短期记忆(LSTM)模型为典型模型。

实验采用ARMA(1,1,1),表示其移动平均部分阶数和自回归部分阶数均设为1,模型时间窗为1。ARMA模型的超参数是根据收集到的数据预先选择的。

我们使用的LSTM模型由一个50个神经元的LSTM层和一个由ReLu函数激活的全连接层[2]组成。输出层的形状被设置为5,以适应所有未来的指标。LSTM模型的损失函数为均方误差(Mean Squared Error, MSE),优化器为Adam optimizer[7]。

这两个模型都使用在单个不受约束的节点上运行示例应用程序10个小时并使用Random Access生成的工作负载收集的数据进行预训练。使用这两个模型对示例应用程序进行了200分钟的自动伸缩,并比较了CPU利用率的预测值和实际值。

5.3.2 优化Update策略

如4.2.3节所介绍的,提出并比较了3种不同的Updater方法。在本实验中,预训练的LSTM模型作为种子模型,并将CPU利用率定义为关键指标。为了缩短实验所需的时间,将模型更新循环的时间间隔设置为1小时。示例应用程序运行200分钟,每个PPA使用不同的更新策略自动缩放,并收集CPU利用率的预测值和实际值。

5.3.3 关键指标的优化

在本实验中,将所有pod的请求率或CPU利用率之和作为关键指标进行比较。同样,示例应用程序运行200分钟,每个PPA使用Random Access生成的工作负载自动缩放。由于关键指标的差异,不可能将两个ppa与预测指标进行定量比较。相反,将所有请求的响应时间和由两个ppa自动缩放的系统空闲资源进行比较,以量化两个ppa的性能。

5.4 评价实验

利用最优超参数对优化后的PPA进行真实场景评价。PPA通过最佳配置自动缩放,应用程序可以使用缩放后的NASA数据集的工作负载运行48小时。通过比较请求的响应时间和空闲资源,量化性能。应用程序的另一次运行是使用完全相同的配置进行的,这些配置由Horizontal Pod Autoscaler自动缩放,作为PPA的基线.

6结果与讨论

结果将在下文中提出和讨论。首先给出超参数优化结果,然后在已部署的场景中进行评估。

6.1预测模型优化

两种不同模型的PPA结果如图4所示。观察结果表明,两个模型都能够捕捉到CPU利用率的趋势,而ARMA模型在拟合方面稍好一些。但从数量上看,LSTM模型预测的MSE为53240.972,ARMA模型预测的MSE为96867.631,后者要大得多。这表明,在预测CPU利用率时,ARMA模型提供了显著的变化,而LSTM模型能够产生相对更准确的预测。可以得出结论,LSTM在预测示例应用程序的CPU利用率方面优于ARMA模型。

![截屏2023-07-06 13.49.25](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 13.49.25.png)

6.2优化升级策略

图5对具有不同策略的三个ppa进行了比较。采用策略1、策略2和策略3的PPA产生的利用率预测MSE分别为64769.882、42180.437和30994.449,表明策略3在提出的模型更新策略中表现最好。我们得出的结论是,策略3在每个模型更新循环中使用新收集的指标重新训练模型,为示例应用程序提供了优于其他策略的最佳性能。

![截屏2023-07-06 13.59.09](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 13.59.09.png)

6.3关键指标的优化

在示例应用程序上比较PPA的两个关键指标,即请求率和CPU利用率。请求的响应时间和空闲资源用于定量地比较ppa的性能。图6显示了使用不同关键指标自动缩放的应用程序上请求的响应时间分布。两个分布的巨大重叠表明两个运行的响应时间非常接近。根据CPU利用率自动缩放的应用程序的平均响应时间为0.5156秒,标准偏差(STD)为0.0421,而根据请求率自动缩放的应用程序的平均响应时间为0.5157秒,标准偏差为0.420。由于时间分布没有显著差异,因此可以得出结论,在响应时间方面自动扩展示例应用程序时,两个关键指标是等效的。(这个图里面的黄线和蓝线估计是基本上重叠了)

image-20230706141151279

系统在𝑡时刻的相对空闲资源(RIR_t)(定义为公式3)用于定量地比较两个自动缩放器。具有两个关键指标的ppa的rir如图7所示。CPU请求率的平均RIR为0.317,标准差为0.161;CPU利用率的平均RIR为0.251,标准差为0.092。可以得出结论,以CPU利用率为关键指标的PPA效率更高,浪费的资源更少。此外,RIR的标准偏差较低表明,以CPU利用率为关键指标进行自动缩放时,系统更加稳定。

![截屏2023-07-06 14.17.32](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 14.17.32.png)

尽管具有不同关键指标的两个PPA为请求提供接近的响应时间,但是具有CPU利用率的PPA更节能,系统也更稳定。因此,CPU利用率是示例应用程序的最佳关键指标。

6.4实验评价结果

根据最优模型、关键指标和更新策略,对拟议的PPA进行5.4节所述的评估,并将PPA和HPA的性能与请求响应时间和空闲资源进行定量比较。

6.4.1响应时间。

图8显示了由PPA和HPA自动缩放的Sort任务的响应时间分布。HPA自动缩放的应用程序对边缘任务的平均响应时间为0.592秒,标准偏差(STD)为0.067;PPA自动缩放的应用程序的平均响应时间为0.508秒,标准偏差(STD)为0.038。PPA提供的响应时间明显小于HPA, p值小于10−3。此外,PPA提供的分布具有较小的标准差。可以看出,通过PPA自动伸缩的应用程序提供的边缘服务延迟更小,边缘系统更稳定。在特征任务中也观察到类似的结果。HPA和PPA提供的云任务的两个响应时间分布如图9所示。HPA自动缩放的应用程序对边缘任务的平均响应时间为13.382秒,标准差为1.606;PPA自动缩放的应用程序的平均响应时间为13.646秒,标准差为1.576。HPA提供的响应时间显著大于PPA,且p值小于10−3,对于Sort任务的标准差也是如此。因此,与HPA相比,PPA自动扩展的云服务具有更好的性能。

因此,可以得出结论,在边缘计算应用中,所提出的主动Pod Autoscaler在云服务和边缘服务方面都优于默认HPA,具有更低的延迟和更高的访问稳定性。

![截屏2023-07-06 14.22.45](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 14.22.45.png)

![截屏2023-07-06 14.23.25](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 14.23.25.png)

6.4.2空闲资源

图10显示了由PPA和HPA自动缩放的边缘工作节点的相对空闲CPU使用情况。HPA自动缩放的边缘工作线程平均相对空闲CPU为0.3209，标准差为0.1079；PPA自动缩放的边缘工作线程平均相对空闲CPU为0.2988，STD为0.1026。虽然两种自动缩放器的结果在视觉上是相似的，但PPA提供的相对空闲CPU明显小于HPA, p值小于10−3。因此，可以得出结论，PPA自动伸缩的应用程序的边缘工作节点比HPA自动伸缩的应用程序更有效地利用CPU资源。HPA和PPA自动伸缩的云工作节点的相对空闲CPU使用情况如图9所示。HPA自动伸缩的云工作者的平均相对空闲CPU为0.3373，标准差为0.1572；PPA自动伸缩的应用程序的平均相对空闲CPU为0.3098，STD为0.1453。HPA提供的相对空闲CPU明显大于PPA, p值小于10−3。因此，在云节点上，PPA自动伸缩的应用程序比HPA自动伸缩的应用程序浪费的CPU资源更少。综上所述，在请求响应时间和空闲资源方面，本文提出的Proactive Pod Autoscaler都优于默认的Horizontal Pod Autoscaler，为边缘计算应用带来更好的性能，获得更多的访问稳定性，并且浪费更少的资源。

![截屏2023-07-06 14.33.27](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-06 14.33.27.png)

7结论与未来工作

在这项工作中,我们提出了一种基于Kubernetes的边缘计算应用的主动pod自动缩放器(PPA)。PPA能够使用时间序列预测方法提前预测应用程序的到达工作负载,并根据需要伸缩应用程序。以cpu密集型应用程序为例,针对Kubernetes中的内置自动缩放器HPA对自动缩放器进行了优化和评估。可以得出结论,PPA能够更有效地利用资源,并为请求提供更快的响应时间。此外,所提出的自动缩放器具有高度的灵活性和可定制性,因此用户可以调整指标、预测模型和缩放策略,以更好地满足自己应用程序的需求。除了支持多个指标外,用户还可以定义自己的特定于应用程序的指标,以更好地扩展应用程序。这使得ppa能够适应各种类型的应用程序,例如实际使用中的数据密集型应用程序、IO密集型应用程序。拟议的PPA并不完美,因为它需要开发人员手动优化关键指标、模型和策略。在未来的工作中,可以通过自动化超参数优化来改进PPA。一种可能的方法可能涉及使用一组可能的指标运行应用程序,PPA的指定模块使用不同的方法自动建模收集的运行数据。然后可以使用验证技术从候选模型中选择最佳模型。通过这种方式,自动优化的PPA将更容易被开发人员部署和使用。

PBScaler:一个基于微服务的应用程序的瓶颈感知自动伸缩框架

Introduction

随着微服务架构的推进,越来越多的云应用从单片架构向微服务架构迁移[1]、[2]、[3]、[4]、[5]、[6]。这种新架构通过将一个单片应用程序分解为多个微服务,通过HTTP或RPC协议相互通信来减少应用程序耦合[7]。此外,每个微服务都可以由独立的团队独立开发、部署和扩展,从而实现快速的应用程序开发和迭代。然而,外部工作负载的不可预测性和微服务之间交互的复杂性会导致性能下降[8],[9],[10]。云提供商必须准备过多的资源来满足应用程序所有者的服务水平目标(service level objective, SLO),这通常会造成不必要的资源浪费[11],[12]。因此,满足SLO和最小化资源消耗之间的不平衡成为微服务中资源管理遇到的主要挑战。

微服务自动伸缩指的是根据工作负载变化弹性分配资源的能力[13]。通过利用微服务的弹性特性,自动扩展可以缓解资源成本和性能之间的冲突。然而,微服务的自动伸缩难以在短时间内准确地伸缩性能瓶颈(PB)。由于微服务之间通信的复杂性,PB的性能下降可能会通过消息传递传播给其他微服务[2],导致同时出现大量异常微服务。我们通过在Google开发的开源微服务应用Online Boutique 1中向特定的微服务注入突发工作负载来证明这一点。图1显示,PB推荐中的性能下降会蔓延到上游微服务,如Checkout和Frontend。为了进一步验证准确扩展PB的重要性,我们进行了压力测试,并分别扩展了不同的微服务。如图2所示,微服务(前端)扩容异常并不能缓解SLO违规。然而,当我们确定并扩展PB推荐时,微服务应用程序的性能得到了改善。不幸的是,定位PBs通常很耗时,偶尔也会出错[14]。

近年来,已经提出了几种方法来在自动扩展之前识别关键的微服务。例如,Kubernetes 2的默认自动缩放器根据计算资源的静态阈值过滤微服务进行直接缩放。Yu等[15]通过计算服务功率来定义弹性缩放的边界,服务功率是第50百分位响应时间(P50)与第90百分位响应时间(P90)之比。此外,Qiu等人[4]引入了一种基于支持向量机的方法,通过分析各种尾部延迟的比率来提取关键路径。尽管这些研究缩小了自动扩展的范围,但它们仍然考虑了可能影响扩展策略的非瓶颈微服务,特别是当应用程序中大量微服务同时异常时。因此,在自动扩展之前,迫切需要精确地定位瓶颈微服务。为了平衡资源消耗和性能,现有的工作采用在线优化算法来寻找接近最优的自动缩放策略。然而,由于自动扩展的可能策略范围很广,这些方法需要大量的尝试,这对于在线应用程序来说是有问题的。例如,Train Ticket3是最大的开源微服务应用程序,由近40个微服务组成。假设每个微服务最多可以有15个副本,确定此应用程序的最佳分配策略无疑是一个np难题,因为最多有1540个可扩展选项。此外,在线优化中反馈回路的持续时间过长,难以实现模型收敛。考虑由在线优化引起的性能下降的潜在风险也很重要。图3显示了突发工作负载对MicroScaler的副本波动和延迟波动的影响[15],MicroScaler是一种在线自动扩展方法,采用在线贝叶斯优化来寻找总成本的全局最小值。由于在线优化导致频繁的在线尝试创建副本(图3a),导致振荡和性能下降(图3b)。因此,我们受到启发,设计了一个离线优化过程,由模拟器的反馈推动。

![截屏2023-07-08 15.25.06](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-08 15.25.06.png)

![截屏2023-07-08 15.26.32](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-08 15.26.32.png)

本文介绍了PBScaler,这是一个横向自动扩展框架,旨在通过识别和解决瓶颈来防止基于微服务的应用程序的性能下降。与之前的工作[11]、[15]所做的针对所有异常微服务优化资源不同,我们提出了TopoRank,一种基于拓扑势理论(TPT)的随机游走算法,用于识别性能瓶颈(PBs)。通过考虑微服务依赖关系和潜在异常,TopoRank提高了瓶颈定位的准确性和可解释性。在通过TopoRank识别PBs后,PBScaler进一步采用遗传算法来寻找接近最优的策略。为了避免过度优化引起的应用振荡,该过程离线进行,并由SLO违例预测器指导,该预测器模拟在线应用并向缩放策略提供反馈。本文的主要贡献总结如下:

•我们提出PBScaler,这是一个瓶颈感知自动伸缩框架,旨在防止基于微服务的应用程序中的性能下降。通过精确定位瓶颈,PBScaler可以减少不必要的扩展并加快优化过程。

•我们采用基于遗传算法的离线优化过程来优化资源消耗,同时避免违反SLO。此过程由SLO违规预测器指导,旨在在不中断在线应用程序的情况下在资源消耗和性能之间取得平衡。

•我们在Kubernetes系统中设计和实现PBScaler。为了评估其有效性,我们在两个在线环境中运行的广泛使用的微服务系统上进行了大量的真实世界和模拟工作负载注入实验。实验结果表明,PBScaler比几种最先进的弹性缩放方法性能更好。本文的其余部分组织如下。在第2节中,我们将讨论微服务的瓶颈分析和自动扩展的相关工作。在第3节中,我们详细描述了整个系统。在第4节中,我们给出了评估和实验结果。第五部分总结了我们的工作,并讨论了未来的研究方向。

2 Related work

随着云计算的发展,学术界和工业界提出了许多云资源(如虚拟机或容器)的自扩展方法[16]、[17]、[18]、[19]。然而,由于微服务之间错综复杂的依赖关系,微服务的自动伸缩可能要复杂得多。

2.1 Bottleneck Analysis

近年来,已经开发了许多用于微服务场景瓶颈分析的方法,其中大多数依赖于三种类型的数据:logsTracemetrics

1)日志。Jia等[20]和Nandi等[21]首先从正常状态的日志中提取模板和流程,与目标日志进行匹配,过滤掉异常日志。

2)痕迹。Trace是一个基于事件跟踪的记录,它再现微服务之间的请求过程。一些研究[22],[23],[24],[25]已经引入使用轨迹来确定瓶颈。Yu等人[22]、[23]结合频谱分析和PageRank算法在trace构建的依赖图上定位瓶颈,Mi等人[24]提出了无监督机器学习原型,学习微服务的模式,过滤异常微服务。然而,使用跟踪可能会干扰代码,并且要求操作人员对微服务的结构有深入的了解。

3)指标。一些方法[2],[26],[27]利用图随机游走算法来模拟异常的传播过程,然后通过集成度量的统计特征和微服务之间的依赖关系来找到瓶颈。此外,CauseInfer[14]和MicroCause[28]等方法侧重于用因果推理构建指标因果图,这通常涉及指标之间隐藏的间接关系。

由于在监控指标时很少修改工作流代码,因此为微服务收集指标通常比使用跟踪更便宜。此外,使用度量作为主要监控数据可以降低集成瓶颈分析和自动伸缩的成本,因为度量在后一种场景中被广泛使用。尽管这些方法具有优势,但大多数方法在选择异常回溯的起始点时没有偏好。相比之下,我们的方法从具有更大异常潜力的微服务开始随机漫步,加快了收敛速度并提高了瓶颈定位精度。

2.2 Autoscaling for microservices

现有的微服务自动伸缩方法可以分为五类。

**1)基于规则的启发式方法 **KHPA、Libra[29]、KHPA- a[30]和PEMA[31]基于资源阈值和特定规则管理微服务副本的数量。然而,由于不同的微服务对特定资源的敏感性不同,因此需要专业知识来支持这些不同微服务的自动伸缩。

2)基于模型的方法 可以对微服务进行建模,以预测它们在特定配置和工作负载下的状态。排队论[32]、[33]和图神经网络(GNN)[12]常用来构建微服务的性能预测模型。

3)基于控制理论的方法 [11],[32] 使用控制理论,SHOWAR[11]动态调整微服务副本,以纠正监控指标和阈值之间的错误。

4)基于优化的方法 这些方法[15]、[34]在给定当前资源和工作负载的情况下,进行了大量的尝试来寻找最优策略。这些方法的关键在于缩小决策范围,加快决策进程。

5)基于强化学习的方法 强化学习(RL)已广泛应用于微服务的资源管理。MIRAS[35]采用基于模型的RL方法进行决策,避免了真实环境的高采样复杂度。FIRM[4]利用支持向量机(SVM)识别微服务中的关键路径,并利用深度确定性策略梯度(DDPG)算法为路径上的微服务制定混合扩展策略。基于强化学习的方法需要在探索过程中不断与环境进行交互,并且无法适应动态微服务架构。

总之,尽管前面提到的自动缩放技术有各自的优势,但它们很少关注性能瓶颈。为非瓶颈微服务消耗计算机资源将不可避免地增加扩展成本和延长决策时间。另一方面,我们的方法侧重于定位性能瓶颈。

3.System Design

我们提出了PBScaler,一个以PB为中心的自动缩放控制器,用于定位PBs并优化它们的副本。如图4所示,PBScaler包括三个组件:

1)度量收集器Metric Collector:为了提供对应用程序状态的实时洞察,我们设计了一个度量收集器,它以固定的间隔从Prometheus4捕获并集成监视度量。

2)性能瓶颈分析Performance Bottleneck Analysis:在度量收集器的帮助下,该组件执行SLO违规检测和冗余检查,以识别异常行为的微服务。接下来,瓶颈定位过程将被触发以精确定位异常微服务中的PBs。

3)缩放决策Scaling Decision:该组件旨在使用进化算法确定PBs的最佳副本数量。最后,PBScaler生成具有优化策略的配置文件,并将其提交给kubernetes-client5, kubernetes-client5调节微服务的副本计数。

3.1 Metric Collector

自动伸缩系统依赖于对指标的实时访问,例如内存使用数据、系统负载和尾部延迟,以确定是否应该执行弹性伸缩以及应该在微服务应用程序中分配多少资源。与需要深入了解程序和代码注入的基于跟踪的监视器不同[7],Metric Collector根据服务网格报告指标,以最大限度地减少对业务流的中断。如表1所示,PBScaler使用Prometheus和kube-state-metrics来收集和分类这些指标,包括响应延迟、微服务之间的调用关系资源消耗和微服务工作负载

![截屏2023-07-08 16.12.07](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-08 16.12.07.png)

![截屏2023-07-08 16.15.24](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-08 16.15.24.png)

例如,容器cpu使用秒总数是Prometheus中的一个标签,它记录了容器级别的中央处理单元(Central Processing Unit, cpu)使用情况。我们将Prometheus的监视间隔设置为5秒,并将收集到的指标数据存储在时间序列数据库中。每个微服务的P90尾部延迟被收集并用于表示应用程序的性能。调用关系暗示了微服务之间的关联,可用于构建微服务关联图。

服务网格Service Mesh

服务网格是一种基础设施,它使开发人员能够在不需要额外代码的情况下向云应用程序添加高级功能,如可观察性和流量管理。一个流行的开源服务网格实现是Istio7,旨在与Kubernetes无缝集成。当一个pod在Kubernetes中启动时,Istio在pod中启动一个特使代理来拦截网络流量,从而实现工作负载平衡和监控。

3.2 Performance Bottleneck Analysis

性能瓶颈分析(PBA)是一个旨在发现微服务应用程序中性能下降和资源浪费的过程,以推断当前问题的PBs。如第1节所述,通过准确定位这些瓶颈,PBA可以提高自动扩展的性能并减少过度的资源消耗。算法1描述了PBScaler中的PBA过程。

![截屏2023-07-09 14.19.35](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.19.35.png)

3.2.1 SLO Violation Detection

为了检测微服务中的异常,PBScaler使用服务水平目标(slo)与特定指标进行比较。如果微服务有大量的SLO违规,即性能下降,则被认为是异常的。如[14]、[27]所述,检测违反SLO是触发瓶颈定位的关键步骤。可以利用Metric Collector收集的调用关系来构建微服务关联图Gc。PBScaler每15秒检查Gc中所有调用边的P90尾部延迟,以及时检测性能下降。如果调用的尾部延迟超过预定的阈值(如SLO),则调用的被调用微服务将被添加到异常微服务集中,并且将激活瓶颈本地化过程。为了考虑微服务延迟中偶尔出现的噪声,阈值设置为slox (1 + α/2),其中α用于调整对噪声的容忍度

3.2.2 Redundancy Checking

在没有性能异常的情况下,一些微服务可能会被分配比所需更多的资源。然而,仅仅通过度量来识别这样的情况是很困难的,这可能会浪费有限的硬件资源。为了避免这种情况,有必要确定哪些微服务分配了多余的资源。PBScaler使用微服务每秒的工作负载变化率来确定资源是否冗余。

此策略比仅依赖资源消耗更有效,因为不同的微服务对异构资源的敏感性可能不同。冗余检查背后的主要思想是采用假设检验来检测微服务当前的工作负载wi c是否显著低于其过去的工作负载(表示为wi p)。显著程度通过参数βt来调整,假设检验可以表示为:

![截屏2023-07-09 14.18.53](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.18.53.png)

为了执行假设检验,我们首先从Metric Collector获取目标微服务的当前和历史工作负载。然后我们使用单侧检验来计算p-value P。如果P不超过置信水平cl(默认设置为0.05),我们拒绝零假设H0,并认为微服务i具有冗余资源。

![截屏2023-07-09 14.21.30](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.21.30.png)
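
The exact test statistic is only shown in the screenshot above, so the following is just one plausible reading of the redundancy check, sketched as a one-sided one-sample t-test of the current workload samples against β times the mean past workload; the API and the decision rule (reject H0 when p ≤ cl) follow the description in the text.

```python
import numpy as np
from scipy import stats

def is_redundant(current: np.ndarray, past: np.ndarray,
                 beta: float = 0.9, cl: float = 0.05) -> bool:
    """One plausible reading of the redundancy check:
    H0: mean(current workload) >= beta * mean(past workload).
    Rejecting H0 (one-sided, p <= cl) means the workload dropped significantly,
    i.e. the microservice likely holds redundant resources."""
    _, p_value = stats.ttest_1samp(current, popmean=beta * past.mean(),
                                   alternative="less")
    return p_value <= cl
```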

3.2.3 Bottleneck Localisation

由于微服务应用程序中复杂的交互[36],并不是每个异常的微服务都需要扩展。例如,图5说明了瓶颈微服务(例如,Product)的性能下降如何沿着调用链传播到它的上游微服务(例如,recommendation, Frontend和Checkout),即使上游微服务没有过载。因此,只有瓶颈微服务必须被扩展,而其他异常微服务只是被牵连。为了精确定位瓶颈微服务,我们引入了异常潜力的概念,它聚集了给定位置上所有微服务的异常影响。由于PB被许多受其影响的异常微服务所包围,因此PB的异常潜力通常很高。我们设计了一种新的瓶颈定位算法TopoRank,该算法在随机行走中引入拓扑势理论(TPT)来计算所有异常微服务的得分,并最终输出一个排名列表(rl)。在rl中得分最高的微服务可以被识别为PBs。

![截屏2023-07-09 14.25.11](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.25.11.png)

TPT源于物理学中“场”的概念,在各种著作中被广泛应用[37],[38]来衡量复杂网络中节点之间的相互影响。由于微服务关联图也可以被视为一个复杂的网络,我们使用TPT来评估微服务的异常潜力。具体来说,我们已经观察到,在微服务关联图Gc中,离PBs更近的微服务,即那些跳数较少的微服务,更有可能出现异常,因为它们经常频繁地直接或间接调用PBs。基于这一观察,我们使用TPT评估微服务的异常潜力。为此,我们首先通过识别异常微服务及其在Gc中的调用关系来提取异常子图Ga。然后,我们使用TPT计算异常子图Ga中微服务vi的拓扑势。

![截屏2023-07-09 14.26.52](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.26.52.png)

N为vi的上游微服务数,mj为vj的异常程度。PBScaler将异常程度定义为微服务在一个时间窗口内违反SLO的次数。dji表示从vj到vi所需的最小跳数。我们使用影响因子σ来控制微服务的影响范围。

然而,具有高拓扑潜在值的微服务不一定是PBs,因为异常通常沿着微服务相关图传播。因此,单纯依靠TPT诊断PBs是不够的。为了解决这个问题,PBScaler结合了个性化PageRank算法[39]来逆转异常子图Ga上的异常传播并定位PBs。设P为Ga与Pi的转移矩阵,j为异常从vi跟踪到其下游节点vj的概率。给定输出度为d的vi,标准个性化PageRank算法将Pi,j集合为:

![截屏2023-07-09 14.29.14](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.29.14.png)

这意味着该算法不会偏向于任何下游微服务。然而,这个定义没有考虑下游微服务和当前微服务异常之间的关联。因此,PBScaler通过更多地关注与上游响应时间更相关的下游微服务来调整计算。对于每个微服务vi, PBScaler收集尾部延迟序列(li)和一组度量数组Mi = {m1, m2,···,mk},其中mk可以被视为给定时间窗口内度量(例如,内存使用)的波动数组。PBScaler定义Pi,j取决于Mj中li与度量数组之间的Pearson相关系数的最大值:

![截屏2023-07-09 14.31.44](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.31.44.png)

个性化PageRank算法通过在有向图上随机行走来确定每个节点的受欢迎程度。然而,一些节点可能永远不会指向其他节点,导致所有节点的分数在多次迭代后趋于零。为了避免落入这个“陷阱”,应用了一个阻尼因子δ,它允许算法根据预定义的规则从这些节点跳出来。通常δ设为0.15。个性化PageRank表示如下:

![截屏2023-07-09 14.33.44](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.33.44.png)

其中v表示每个微服务节点被诊断为PB的概率。偏好向量u作为个性化规则,引导算法跳出陷阱。u的值由每个节点的异常势决定。异常潜力较大的节点优先作为算法的起始点。第k次迭代的方程可以表示为:

![截屏2023-07-09 14.34.19](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.34.19.png)

经过多轮迭代,v逐渐收敛。PBScaler然后对最终结果进行排序并生成排名列表。排名列表得分最高的前k个微服务可以被识别为PBs。TopoRank的整个过程描述为算法2。
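
A small sketch of the personalized-PageRank iteration described above (Eqs. (6)-(7) are only shown as screenshots here). The transition matrix P and the preference vector u are assumed to have been built beforehand from the anomaly subgraph and the topological potentials.

```python
import numpy as np

def toporank_scores(P: np.ndarray, u: np.ndarray,
                    delta: float = 0.15, iters: int = 100, tol: float = 1e-8):
    """Sketch of the personalized PageRank step used by TopoRank.
    P[i, j]: transition probability from anomalous service i to downstream service j.
    u: preference vector derived from each service's topological (anomaly) potential."""
    u = u / u.sum()                       # normalise the preference vector
    v = np.full(len(u), 1.0 / len(u))     # start from a uniform distribution
    for _ in range(iters):
        v_next = (1 - delta) * P.T @ v + delta * u
        if np.abs(v_next - v).sum() < tol:
            break
        v = v_next
    return v                              # higher score -> more likely to be a PB
```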

3.3 Scaling Decision

给定性能瓶颈分析确定的PBs,将对PBs的副本进行缩放,以最小化应用程序的资源消耗,同时确保微服务的端到端延迟符合SLO。尽管大量的副本可以缓解性能下降问题,但它们也会消耗大量的资源。因此,必须在性能保证和资源消耗之间保持平衡。缩放决策的过程将被建模为一个约束优化问题,以实现这种平衡。

3.3.1 Constrained Optimation Model

我们场景中的自动缩放优化试图确定一个分配模式,该模式为每个PB分配可变数量的副本。给定n个需要缩放的PBs,我们将策略定义为集合X = {x1, x2,···,xn},其中xi表示分配给pbi的副本数量。在优化之前,PBs的初始副本数量可以表示为C = {c1, c2,···,cn}。应该注意的是,PBScaler中的副本约束应该分别为按比例缩小和按比例扩大的流程定义。在扩展过程中,我们对PBs的副本数量进行了如下限制:

![截屏2023-07-09 14.36.59](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.36.59.png)

其中c^max表示给定有限的服务器资源,微服务可以扩展到的最大副本数量。在缩小过程中,副本数量的约束可以表示为:

![截屏2023-07-09 14.38.06](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.38.06.png)

在 Eq.(8)（就是上面这个公式）中，γ（默认值为2）表示副本减少的最大数量。这个限制是合理的，因为正如在实验中观察到的那样，大幅减少微服务副本的数量可能会导致短暂的延迟峰值。

缩放决策的目标是尽量减少应用程序的资源消耗,同时保持其性能。应用程序性能通常用用户更关心的SLO违规来表示。因此,应用程序性能奖励可以细化为:

![截屏2023-07-09 14.40.23](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.40.23.png)

在优化过程中,应用程序的资源消耗(如CPU和内存使用)是不可预测的。为了保守地估计资源消耗,我们考虑PB副本与可分配副本的最大数量的比率,而不是计算CPU和内存的成本。我们将资源奖励计算为:

![截屏2023-07-09 14.41.36](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.41.36.png)

我们的目标是在保证性能的同时尽量减少资源消耗。我们利用加权线性组合(WLC)方法来平衡这两个目标。最终优化目标定义为:

![截屏2023-07-09 14.43.20](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.43.20.png)

式中λ∈[0,1]。我们将λ设置为参数,以平衡应用程序性能和资源消耗。

3.3.2 SLO Violation Predictor

为了计算性能奖励R1,需要评估策略是否会导致在线应用程序违反SLO。一种简单的方法是直接在在线申请中执行候选策略,并等待监控系统的反馈。然而,在线应用程序中频繁缩放引起的振荡是不可避免的。另一种方法是使用历史度量数据训练评估模型,该模型可以模拟在线应用的反馈。在不与在线应用程序交互的情况下,该模型根据应用程序的当前状态预测应用程序的性能。

我们使用向量r来表示执行扩展策略x后每个微服务的副本数量。w是表示每个微服务当前工作负载的向量。由于瓶颈感知优化的时间成本较低,我们可以合理地假设w在此期间不会发生显著变化(参见4.2节)。给定由工作负载w和所有微服务副本r表示的应用状态,一个SLO违例预测器可以设计为:

![截屏2023-07-09 14.47.43](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.47.43.png)

其中ψ是一个二元分类模型。选择合适的分类模型的细节将在4.3节中讨论。用于训练的历史度量数据可以使用经典缩放方法(默认是Kubernetes自动缩放器)或随机方法生成。我们在3个节点(共44个CPU核和220gb RAM)上部署了一个开源微服务系统,并进行了弹性扩展。普罗米修斯以固定的时间间隔收集每个微服务的工作负载和P90尾部延迟。通过比较前端微服务的尾部延迟和SLO,可以很容易地标记每个时间间隔的监控数据。

![截屏2023-07-09 14.48.39](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.48.39.png)
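
As a rough illustration of the predictor ψ(w, r), the sketch below trains a random forest (the model eventually chosen in Section 4.3.2) on synthetic placeholder data. The feature layout (per-service workloads concatenated with replica counts) and the labelling rule follow the description above; the data itself is made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_services = 500, 10
workloads   = rng.uniform(0, 200, (n_samples, n_services))   # placeholder metrics
replicas    = rng.integers(1, 8, (n_samples, n_services))
p90_latency = rng.uniform(100, 900, n_samples)
SLO = 500.0

X = np.hstack([workloads, replicas])        # feature vector: [w, r] per sample
y = (p90_latency > SLO).astype(int)         # 1 if the frontend P90 latency violated the SLO

predictor = RandomForestClassifier(n_estimators=100, random_state=0)
predictor.fit(X, y)

def violates_slo(w: np.ndarray, r: np.ndarray) -> bool:
    """Predict whether a candidate state (workload w, replicas r) violates the SLO."""
    return bool(predictor.predict(np.hstack([w, r]).reshape(1, -1))[0])
```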


3.3.3 Autoscaling Optimisation

如第3.3.1节所述,性能和资源消耗之间的权衡可以建模为约束优化问题。为了找到接近最优的策略,PBScaler使用遗传算法(GA)来生成和优化扩展策略,以减少资源消耗,同时满足SLO要求。遗传算法通过模拟进化中的自然选择,在淘汰劣等子代的同时提高优等子代。首先,遗传算法执行随机搜索来初始化策略空间中的染色体种群,每条染色体表示优化问题的潜在策略。接下来,在每次迭代中,将选择具有高适应度的精英染色体(称为精英)进行交叉或突变以产生下一代。

我们场景中的自动缩放优化旨在确定一种缩放策略,该策略为每个PB分配可变数量的副本。自缩放优化过程如图6所示。一开始,PBScaler获得每个微服务当前的副本数量r和工作负载w。在性能瓶颈分析之后,PBScaler从r中识别PBs,并将它们过滤出来,得到r ‘。然后,决策者生成PBs策略的总体。由于要扩展的微服务数量会影响优化算法的速度和效果(第4.3节),PBScaler假设只有PBs需要进行弹性扩展。换句话说,r ‘中的副本数量将保持不变。SLO违例预测器负责评估生成的策略。需要注意的是,该策略与r ‘合并,并与w一起输入到SLO违规预测器中。通过遗传算法选择优策略Xbest,并与r ‘合并,生成最终决策。

![截屏2023-07-09 14.41.12](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 14.41.12.png)

在优化阶段,决策者使用遗传算法生成并改进PB的缩放策略,如算法3所述。在每个PB的策略范围内随机生成一个种群(第1行)后,Decision Maker根据Eq.(11)估计每个策略的适应度,并存储精英(第2-3行)。在每次迭代中,遗传算法使用基于锦标赛的选择算子来挑选优秀的亲本(第6行)。通过使用两个点交叉算子和双染色体突变算子,通过重组和突变(第7-8行)产生新的后代。通过模拟自然选择,新的后代和亲本精英形成一个新的种群,进入下一个迭代(第9行)。当遗传算法达到指定的迭代次数时,Decision Maker返回适应度最高的缩放决策Xbest。
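
A hedged sketch of the fitness used to rank candidate policies. The exact reward definitions are in Eqs. (9)-(11), which appear only as screenshots here, so the rewards below are indicative; `predictor` stands for the SLO violation predictor of Section 3.3.2, and the dictionary-based policy representation is an assumption.

```python
def fitness(policy, other_replicas, workload, predictor, max_replicas=8, lam=0.5):
    """policy: {pb_name: replicas} for the PBs only; other_replicas: non-PB services.
    Combines a performance reward (no predicted SLO violation) and a resource reward
    (fewer replicas relative to the maximum) with weight lam, as in the WLC objective."""
    merged = {**other_replicas, **policy}            # full replica vector r' merged with the policy
    perf_reward = 0.0 if predictor(workload, merged) else 1.0
    resource_reward = 1.0 - sum(policy.values()) / (max_replicas * len(policy))
    return lam * perf_reward + (1 - lam) * resource_reward
```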

4 Evaluation

在本节中,我们详细介绍了自动缩放的实验场景,包括PBScaler与学术界和工业界几种最先进的自动缩放算法的比较。

4.1 Experimental Setup

4.1.1 Microserice Platform

实验在我们的私有云集群中进行,该集群由三台物理计算机(一个主节点和两个工作节点)组成,共有44个Intel 2.40 GHz CPU内核和220gb RAM。为了评估自动扩展,我们选择了两个开源微服务应用程序作为基准:a)在线精品8,一个由谷歌开发的基于web的电子商务演示应用程序。该系统通过10个无状态微服务和Redis缓存的协作,实现了浏览产品、将商品添加到购物车和支付处理等基本功能。b) Train Ticket9:复旦大学开发的大型开源微服务系统。Train Ticket拥有40多个微服务,并使用MongoDB和MySQL进行数据存储,可以满足各种工作流程,如在线门票浏览,预订和购买。由于集群资源的限制,我们将每个微服务限制为不超过8个副本。源代码可在Github上获得.

4.1.2 Workload

我们评估了PBScaler在各种流量场景下的有效性,使用了2015年3月16日Wiki-Pageviews[40]的真实维基百科工作负载,以及受Abdullah等人[41]的实验启发的五种模拟工作负载(EW1 ~ EW5)。我们将实际工作负载压缩到一个小时,并将其扩展到适合我们集群的级别。五个模拟工作负载表现出不同的模式,例如单峰、多峰、上升和下降,并且持续时间限制为20分钟。图7描述了这些工作负载的波动情况。

4.1.3 Baseline Metrics

我们将PBScaler与学术界和工业界的几种最先进的微服务自动扩展方法进行比较,这些方法从静态阈值、控制理论和黑盒优化的角度执行微服务的动态水平扩展。

Kubernetes水平Pod自动缩放(KHPA):这是Kubernetes默认的水平缩放方案。通过为特定的资源R定制阈值T (CPU使用率为默认值),并从微服务的所有副本中聚合资源使用URi, KHPA将副本的目标数量定义为

![截屏2023-07-09 15.19.46](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.19.46.png)

MicroScaler[15]:这是一个自动扩展工具,它使用黑盒优化算法来确定微服务的最佳副本数量。MicroScaler计算微服务的P90/P50进行分类,然后执行贝叶斯优化的四次迭代来做出扩展决策。

SHOWAR[11]:它是一种混合自缩放技术。我们在SHOWAR中再现了水平缩放部分,它使用PID控制理论使观察到的度量逐渐接近用户指定的阈值。在我们的实现中,我们用更常见的P90延迟替换了运行队列延迟,因为前者需要额外的eBPF工具。

4.1.4 Experimental Parameters and Evaluation Criteria

在我们的实验中,我们将普罗米修斯的收集间隔固定为5秒。随着实验时间和工作负载的增加,MongoDB等有状态微服务所需的数据量也会增加。最终,数据量将超过可用内存,从而需要使用磁盘存储。这种转换可能导致无法通过自动缩放来补救的性能下降。因此,我们将工作负载测试限制为无状态跟踪。在线精品店和火车票的SLO值分别设置为500毫秒和200毫秒。在SLO违例检测和冗余检查模块中,PBScaler首先将动作边界α设置为0.2,以减少噪声干扰。然后,我们根据经验将显著度β设置为0.9,以控制触发扩展的工作负载水平。对于瓶颈定位,将拓扑势的影响因子σ设为1,将rl中得分最高的top-k (k =2)个微服务视为PBs。

我们选择了SLO违规率、资源消耗和响应时间来评估自缩放方法的性能。如果自动缩放方法可以减少响应时间、SLO违规率和资源消耗,则认为它更有效。我们将SLO违例率定义为端到端P90尾部延迟超过SLO的百分比。资源消耗按照[42]给出的方法计算,其中CPU价格为0.00003334$ (vCPU/s),内存价格为0.00001389$ (G/s)。总资源消耗由内存成本和CPU成本相加得到。

![截屏2023-07-09 15.23.18](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.23.18.png)

![截屏2023-07-09 15.24.13](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.24.13.png)

4.2 Performance Evaluation

表2比较了具有不同工作负载的两个微服务应用程序中四种自动伸缩方法的SLO违反率和资源成本。None方法用作引用,不执行自动缩放操作。其结果以灰色表示,并被排除在比较之外。

![截屏2023-07-09 15.25.47](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.25.47.png)

一般来说,PBScaler在减少两个微服务系统中六个工作负载下的SLO违规和最小化资源开销方面优于其他竞争方法。其中,PBScaler在火车票中的SLO违规率比基线方法平均降低4.96%,资源成本平均降低0.24美元。结果表明,PBScaler可以快速、精确地对大规模微服务系统中的瓶颈微服务进行弹性扩展,从而减少了SLO违规,节省了资源。对于Online Boutique中的6个工作负载,PBScaler还在其中4个模拟工作负载中实现了最低的SLO违规,并在3个模拟工作负载中实现了最低的资源消耗。

图8描绘了六种工作负载下不同方法的延迟分布箱形图,探讨了每种方法对微服务系统性能的影响。可以看出,大多数自动缩放方法都可以保持延迟分布的中位数低于红色虚线(SLO)。但是,只有PBScaler进一步将第三个四分位数降低到所有工作负载的SLO以下。

为了评估使用PBScaler进行弹性缩放的时间成本,收集并计算了PBScaler中每个模块所需的平均时间。如表3所示,Online Boutique中所有PBScaler模块的总时间成本小于一个监控间隔(即5s),而Train Ticket的相同度量小于两个监控间隔。由于PBA缩小了决策范围,当应用程序从Online Boutique切换到Train Ticket时,尽管微服务的数量增加了,但决策者的时间成本并没有增加太多(不超过6.6%)。然而,我们认识到随着微服务规模的增长,PBA的时间消耗会迅速增加,这将是我们未来的工作。

![截屏2023-07-09 15.29.10](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.29.10.png)

![截屏2023-07-09 15.29.50](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.29.50.png)

4.3 Effectiveness Analysis of Components

4.3.1 Performance Comparision of Bottleneck Localization

为了评估TopoRank算法是否能有效定位突发工作负载引起的PBs,我们通过Chaos Mesh将CPU过载、内存溢出和网络拥塞等异常注入到Online Boutique和Train Ticket中。这些异常通常是由高工作负载条件引起的。使用TopoRank算法分析度量并确定这些异常的性能瓶颈。然后将定位结果与micrororca[27]进行比较,micrororca是微服务根本原因分析的基线方法。AC@k测量前k个结果中实际PBs的精度,Avg@k是前k个结果的平均精度。这些指标的计算方法如下。A表示微服务的集合,RT@k表示排名列表中排名前k的微服务。

![截屏2023-07-09 15.32.15](../images/:Users:joshua:Library:Application Support:typora-user-images:截屏2023-07-09 15.32.15.png)

图9给出了TopoRank和MicroRCA在不同微服务应用中的AC@1和Avg@5值。结果表明,TopoRank在这两个指标上都优于MicroRCA。这主要是由于TopoRank在执行个性化PageRank时考虑了异常可能性和微服务依赖关系。

瓶颈定位的主要目的是缩小策略空间,加快最优策略的发现。我们在PBs和所有微服务上执行遗传算法迭代,以证明瓶颈定位对优化的影响。图10描述了微服务系统下的迭代过程,并表明随着人口(Pop)的增加,pb感知策略在适应度方面明显优于适用于所有微服务的方法。该策略可以在不到5次迭代中获得较好的适应度。相比之下,涉及所有微服务的方法需要更大的种群和更多的迭代才能达到相同的适应度水平。这是由于pb感知策略帮助遗传算法精确地缩小了优化范围,加速了优解的获取。

4.3.2 Effectiveness of the SLO Violation Predictor

SLO违例预测器的目标是直接预测优化策略的结果,而不是等待在线应用程序的反馈。我们根据每个微服务的副本数量和工作负载来确定是否会出现性能问题。为预测任务选择合适的二分类模型至关重要。以5秒的数据收集间隔,我们在我们的集群中收集了两个数据集,包括3.1k的火车票历史采样数据集(a)和1.5k的在线精品数据集(B)。为了对这两个数据集进行训练和测试,我们采用了四种经典的机器学习方法,包括支持向量机(SVM)、随机森林(Random Forest)、多层感知器(Multilayer Perceptron)和决策树(Decision Tree)。表4给出了四种模型对SLO违规预测的准确率和召回率。根据两个数据集的效果,我们最终选择随机森林作为SLO违例预测的主要算法。

为了证明SLO违例预测器可以替代来自真实环境的反馈,我们将使用SLO违例预测器的PBScaler与从在线系统收集反馈的MicroScaler进行了比较。我们将突发工作负载注入Online Boutique,并仅使一个微服务异常,以消除两种方法在瓶颈定位方面的差异。如图11所示,在预测器的引导下,PBScaler的决策尝试次数和频率远低于MicroScaler。减少集群中的在线尝试将明显降低振荡的风险。

5 Conclusion

本文介绍了PBScaler，一个瓶颈感知自动伸缩框架，旨在防止基于微服务的应用程序的性能退化。PBScaler使用服务网格技术收集应用程序的实时性能指标，并动态构建微服务之间的关联图。为了处理由外部动态工作负载和微服务之间复杂调用引起的微服务异常，PBScaler采用基于拓扑势理论的随机游动算法TopoRank来识别瓶颈微服务。此外，PBScaler采用离线进化算法，在SLO违规预测器的指导下优化缩放策略。实验结果表明，PBScaler可以在最小化资源消耗的同时实现较低的SLO违规。在未来，我们计划从以下几个方面改进我们的工作。首先，我们将探索在细粒度资源(例如，CPU和内存)管理中使用瓶颈感知的潜力。其次，我们将探讨如何规避自扩展中有状态微服务的干扰，因为有状态微服务的性能下降可能会破坏自扩展控制器。第三，提高大规模微服务系统性能瓶颈分析的效率。

LOTUS

概要:边缘计算系统通常面临资源分配效率低下的问题,导致不必要的等待时间和服务质量(QoS)的降低。本文提出了一种延迟优化混合自动缩放器(LOTUS)的体系结构来解决这个问题。这是通过实施深度学习模型来预测未来的工作负载,从而优化边缘的资源使用来实现的。LOTUS主动预测资源需求,并相应地对其进行扩展,以减少总体延迟,并从最终用户的角度改进QoS。为了证明准确的工作负载预测的重要性,我们将提出的系统架构设计为部署在具有有限可用资源的分布式Kubernetes集群中。通过动态预测传入工作负载的资源需求,LOTUS可以有效地分配资源,从而减少最终用户的往返时间。索引术语:边缘计算,自动缩放,深度学习,延迟优化

介绍

边缘计算是一种分布式计算,它使计算资源更接近网络的边缘,数据在那里生成和使用。边缘计算的目标是减少处理数据的延迟,从而提高最终用户的服务质量(QoS)。传统上,计算资源集中在大型数据中心,当需要长距离发送数据时,这可能导致高延迟。有了边缘计算,资源分布在一个大的地理区域,允许数据在更靠近它们产生的地方进行处理。

然而,边缘计算的分布式特性对有效的资源分配提出了挑战。与传统数据中心相比,边缘设备通常具有有限的资源,因此优化其使用以满足QoS要求至关重要。实现这一目标的一种方法是使用自动缩放技术,这是一个动态过程,使系统能够根据当前和预期的需求自适应提供资源。通过这样做,这些技术可以帮助确保有效的资源利用和最佳的QoS级别。由于Kubernetes的流行,Kubernetes Horizontal Pod Autoscaler (HPA)是Kubernetes集群中广泛使用的管理工作负载扩展的工具[1]。它使用一个预定义的比率来根据环境的变化计算所需的pod数量,如:

image-20230719200127732

其中m_c是要优化的度量的当前值,m_d是期望值。度量可以是CPU利用率、内存利用率或自定义度量。
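
Written out, this ratio rule is the standard HPA desired-replicas formula; here is a small sketch with m_c and m_d as defined above.

```python
import math

def hpa_desired_replicas(current_replicas: int, m_c: float, m_d: float) -> int:
    """Kubernetes HPA rule: scale the replica count by the ratio of the current
    metric value m_c to the desired value m_d, rounding up."""
    return math.ceil(current_replicas * m_c / m_d)

# e.g. 4 pods at 90% CPU with a 60% target -> ceil(4 * 90 / 60) = 6 pods
```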

然而,这种方法并没有针对边缘计算的动态和资源受限环境进行优化。因此,它可能无法快速有效地实时响应变化。因为它只在发生更改之后才扩展资源,所以这种方法可能会增加总体往返时间(RTT)并影响性能,特别是在对时间敏感的应用程序中。

主动自缩放技术最近成为克服边缘被动自缩放器所面临的延迟问题的有希望的解决方案[2]。它们的目标是在工作负载生成之前对其进行预测,以便系统可以为预测的工作负载分配足够的资源。这种主动行为可以改善系统响应时间、资源利用率和应用程序性能。然而,尽管各种时间序列预测技术和机器学习算法已被用于在基于云的基础设施中实现这一目标,但缺乏对边缘计算系统的主动自动缩放技术的研究。现有文献(如[3],[4])大多侧重于水平扩展,而不是采用混合方法,这阻碍了它们的潜在效益。

本文提出了一种创新的边缘主动混合自动缩放解决方案,称为延迟优化混合自动缩放(LOTUS)。我们的方法通过利用深度学习(DL)进行工作负载预测和资源分配,克服了现有自动缩放技术的局限性。它使用了一种混合缩放方法,结合了基于预测模型和算法权重的水平和垂直缩放技术。LOTUS的主要目标是通过主动预测工作负载并相应地分配资源来最小化延迟和缓解冷启动问题。通过这样做,LOTUS可以确保资源得到最佳配置,以处理预期的工作负载,并提高系统的性能。

LOTUS的体系结构采用了MAPE-K[10]循环,并且专门设计用于部署在分布式Kubernetes集群上,使其具有可伸缩性和灵活性,可以处理各种边缘计算资源配置。使用深度学习进行工作负载预测确保了我们方法的准确性和可靠性,使LOTUS能够准确预测工作负载变化并主动分配资源,从而在边缘实现高效和有效的自动扩展。具体而言,本文提出了以下重大贡献,这些贡献推动了边缘计算环境中自动缩放领域的发展:

image-20230719230714698

  1. 一种新型的主动混合自动缩放器,利用深度学习技术优化资源分配并减少延迟。

  2. Kubernetes平台上自动缩放器的详细实现

  3. 包括工作量建模和资源预测的定量评估。

本文的其余部分组织如下。在第二节中,我们回顾了相关文献,并强调了它们的局限性。第三部分概述了系统设计,概述了其关键组件和功能。接下来,在第四节中,我们描述了方法,并提供了有关系统设计和所采用算法的详细信息。在第五节中,我们介绍了实验评估的设置。接下来是第六节,在那里我们提出了结果。然后,我们在第七节中对我们的发现进行了深入分析。最后,在第八部分,我们对本文进行了总结,并对未来的工作进行了讨论。

二 相关工作

本节探讨了最近在边缘计算环境中提出的用于自动缩放的不同技术。表一也对它们进行了总结和比较。

自动扩展策略可以分为被动策略和主动策略。反应式(被动)自动缩放器只是对系统和环境的变化做出“反应”,并相应地调整资源。反应式自动缩放器最常见的用途是水平Pod自动缩放器(HPA),这是因为它是Kubernetes提供的标准自动缩放器[1]。其他例子还包括[5]、[6]、[11]。这种响应式自动缩放技术对用户定义指标的变化做出反应,导致其明显的缺点。这个缺点是变更和行动之间的时间延迟导致系统整体效率降低。这是由于pod需要时间来初始化、重新分配或终止。由于网络流量的变化是对其作出反应,而不是准确预测,因此这种时间延迟是无法消除的[4]。

image-20230719231103718

解决响应式自动缩放器所面临的时间延迟的一种常用技术是在流量在网络上生成之前预测它将是什么。这些技术的共同点是使用时间序列预测技术来预测工作负载,使用机器学习方法来估计所需的资源[8],[12]。使用这些技术,网络可以在工作负载生成之前就拥有最佳的节点配置。如果实现正确,这些技术的好处是显而易见的,包括与响应式自动缩放相比,改进的资源利用率、更好的应用程序性能和成本优化。[8]提出的算法采用时间序列分析来预测容器的CPU使用情况,并主动调整容器的扩容。[13]中的工作提出了一种混合扩展策略,以在预测未来资源需求时平衡性能和成本。其他项目通过实施深度强化学习来推进自动扩展方法,例如[14],旨在通过最小化响应时间和数据包丢失来最大化服务质量。

然而,上述方法的主要局限性在于它们是设计用于云计算平台,而不是边缘计算系统。尽管边缘计算基础设施的采用越来越多,但缺乏对边缘主动自缩放技术的研究,这可以显着提高此类系统的效率和可靠性。我们对文献的回顾表明,在边缘工作的自动缩放技术的发展仅由[4],[9],[15]解决。[15]提出了一种基于深度学习模型的自动扩展架构,用于水平扩展。[9]的工作使用延迟估计来垂直扩展应用程序容器。这两篇文章都只关注响应式扩展。[4]的工作使用主动Pod自动缩放器实现了时间序列预测,以提高资源利用率。结果显示效率得到了提高,但也存在一些缺点,比如预测不准确,需要考虑意外的工作负载。这些工作的另一个限制是它们只关注水平扩展,因此进一步这项工作的自然进展将是混合自动扩展的结合,以最大限度地利用可用资源。

表1是LOTUS与现有相关作品的比较。之前的大多数研究都强调横向自动扩展,并且仅限于基于虚拟机的云基础设施,同时使用一组受限的性能指标进行决策。相比之下,LOTUS为容器化应用程序提供了一种新的混合预测自动缩放模型,该模型依赖于一组固定的指标,并在边缘操作。它提供了一个独特的解决方案,结合了水平和垂直自动缩放技术的优点。

三 系统设计

本节深入研究LOTUS所采用的体系结构和算法的细节。

A. LOTUS Architecture

图1概述了组成LOTUS体系结构的组件及其相互连接。它还突出了系统内的数据流。LOTUS体系结构可以分为两个不同的部分。第一个是包含Kubernetes集群的边缘网络。我们故意混淆了这一部分,以解释实际环境中硬件配置的异构性质。第二部分是我们的自动缩放系统,它执行以下核心功能:

请求处理。请求处理程序接收用于集群的传入请求,并随后传输它们。它包含度量评估器的细分。这个功能很简单:在发送请求之前,启动计时器,一旦接收到响应,计时器停止,并计算RTT。HTTP响应代码也被记录下来,使我们能够深入了解系统状态。该信息使我们能够通过跟踪故障错误代码来检测系统故障或超时的实例。此外,这使我们能够记录更少的常见错误代码,从而扩展了系统的整体可用性,这些错误代码可能表明系统中的特定问题。

工作量建模。这是自动扩展体系结构的一个关键功能,使系统能够根据预期的工作负载需求做出如何扩展基础设施的明智决策,以及何时做出此扩展决策。该组件的输出将是对未来工作负载需求的预测,然后将用于为底层基础设施的扩展决策提供信息。LOTUS使用基于gru(门控循环单元)的时间序列模型来预测工作负载需求[16]。GRU模型用于捕获输入数据中的时间依赖性,这允许模型对未来的工作负载需求做出准确的预测。

控制。此功能负责根据预测的工作负载做出扩展决策。这对LOTUS体系结构的成功至关重要,因为它决定是根据预测的工作负载水平扩展还是垂直扩展。根据接收到的工作负载预测,控制器预测所需的资源(以pod的数量为单位),并确定是水平扩展还是垂直扩展。如果预测的工作负载表明应用程序将经历需求激增,控制器可能决定通过向集群添加更多pod来水平扩展它。这种方法确保应用程序可以通过将流量分配到工作pod来处理增加的流量负载。如果预测的pod数量与当前数量相同,则控制器假定预测的工作负载将需要更多的资源,例如CPU或内存。然后控制器决定通过向每个pod分配更多资源来垂直扩展。如果预测的工作负载表明需求下降,控制器还可以决定缩小集群的规模。值得一提的是,控制器的决策过程完全依赖于预测模型的准确性,因为它永远无法访问工作负载的实际请求计数。值得注意的是,控制器做出准确缩放决策的能力在很大程度上依赖于预测模型的准确性。由于控制器无法访问工作负载的实际请求计数数据,因此它完全依赖于预测模型生成的预测工作负载。

知识存储。知识库组件的核心是一个数据存储,用于收集和存储来自系统内各种来源的相关数据。这些数据可能包括正在处理的请求数量、每个请求的处理时间、活动实例数量以及可用的系统资源(如CPU、内存和磁盘空间)等信息。知识库组件通过从体系结构中的其他组件收集信息来获取信息,这些组件依次访问知识库以实现各自的功能。

B. LOTUS Algorithms

工作负载建模。GRU模型被用于模拟我们在本研究中使用的两个数据集的工作量(这些将在下一节中介绍)。这些模型允许我们预测指定时间戳的HTTP请求计数。然后将此预测传递给我们的资源预测系统。

资源预测. 最初,我们假设水平缩放,并利用线性回归pod数模型来确定预测的pod数。然后将其与当前的pod计数进行比较,如果不同,我们通过添加或删除pod来水平缩放以达到我们的新预测值。否则,将启动垂直缩放。这是因为我们的预测表明集群将需要与当前使用的pod数量相同的pod。我们首先比较之前发送到集群的请求数量以及我们对新请求的新模型预测。计算两个值之间的百分比变化。如果这个变化落在我们预定的可接受范围内,我们根据这个量调整CPU利用率。

四 方法论

本节概述了本研究中遵循的方法,并提供了有关LOTUS功能的详细信息。

A.数据收集和预处理

为了测试自动缩放系统的有效性,我们需要大量的数据作为实时使用数据。为了获得这一点,我们获得了公开可用的数据集,这些数据集以前曾在其他作品中用于此目的。具体来说,我们选取了NASA[17]和1998年世界杯网站[18]的HTTP访问请求数据。这提供了两个不同的web访问数据集,允许我们模拟系统的实际使用情况,并测试系统在实际数据上的有效性。

数据集1 - NASA-HTTP。NASA-HTTP数据集是1995年7月运行的NASA肯尼迪航天中心网络服务器的HTTP访问日志的集合。这些日志 记录了用户在该时间段内访问web服务器上各种资源的请求。数据集包含匿名日志,其中包括诸如每个请求的时间戳、请求的文件/资源、返回的HTTP状态码、响应的大小以及其他相关详细信息。

数据集2 - 1998年世界杯(WC98)。第二个数据集是1998年世界杯数据集，它为1998年世界杯网站提供了大约四个月的HTTP请求(≈13亿)。数据集包含各种相关指标，包括调用计数，这些指标将用于训练我们的深度学习模型。数据集允许我们在选定的时间段或整个时间段内检查系统的行为。WC98数据集是公开可用的，尽管它时间久远，但由于其完整性和代表性，已被广泛用于评估web系统性能的研究。

为了处理数据，我们从两个数据集中消除了任何不完整的数据源，并对剩余数据执行最小-最大规范化。这有助于缩小我们正在处理的值的范围，这是一种成熟的深度学习模型技术。在仔细检查数据集的基础上，我们选择了 60/40 分割测试和训练数据集。我们注意到数据中存在显著差异，使用小的、非随机的分段进行测试将导致样本不具代表性。通过选择60/40分割，我们的目标是创建代表数据总体分布的训练和测试集。这种方法可以帮助防止模型过度拟合多数类，这将对少数类的性能产生负面影响。

B. Deep learning model for workload forecasting

我们创建了一个PyTorch GRU模型来预测工作负载数据集。模型的输入大小为1,隐藏大小为128,输出大小为1。使用Adam优化器以0.001的学习率和均方误差(MSE)损失函数对模型进行200次训练。表2列出了模型配置。一旦对模型进行了训练,我们只需将时间戳格式的字符串传递给模型。该模型对工作负载进行预测。对于世界杯数据集,这将是预期的HTTP请求计数。与WC98数据集记录每分钟的总请求相比,NASA数据集以每个请求为基础,以最近的秒为单位存储数据。为了将GRU模型用于NASA数据集,我们将其转换为NASA数据集的每分钟请求率。这样做允许我们使用为两个数据集创建的相同的模型体系结构,因为它们现在具有相同的内容格式。选择128的隐藏大小是一个相对较大的值,可能有助于模型捕获输入和输出之间更复杂的关系。这个时间序列模型的主要问题是它依赖于有序时间。这防止了训练过程中数据的洗牌;然而,GRU模型捕获顺序数据中的时间依赖性的能力抵消了这一限制,允许模型根据先前的有序观察准确预测未来的HTTP请求计数。我们在清单1中为GRU模型的训练部分提供了伪代码,该模型采用表II中提供的值,例如criterion、optimizer,并执行一个基本的200次迭代循环来训练模型。
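
A minimal PyTorch sketch matching the hyperparameters listed above (input 1, hidden 128, output 1, Adam at 0.001, MSE, 200 epochs). Data loading and min-max normalisation are omitted, and the tensor shapes (batch, seq_len, 1) are assumptions; the original Listing 1 appears only as an image.

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=128, output_size=1):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.gru(x)            # out: (batch, seq_len, hidden)
        return self.fc(out[:, -1, :])   # predict the next request count

model = GRUForecaster()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train(x: torch.Tensor, y: torch.Tensor, epochs: int = 200):
    """x: (batch, seq_len, 1) normalised request counts; y: (batch, 1) next value."""
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```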

image-20230719234729925

image-20230719234800039

image-20230719234817242

C. Resource forecasting

我们在工作量预测的基础上使用Pod预测模型。这个PyTorch线性回归模型的目的是预测水平缩放的pod计数。模型的配置见表三。它有一个输入特性,即请求计数,和一个输出特性,即pod计数。由于我们从WC98和NASA数据集中获取相同的输入,我们只需要这个单一模型进行资源预测。我们只根据时间戳获取预测的请求计数,并使用它将线性关系映射到HPA用于水平扩展的水平pod计数。在训练过程中,使用均方误差(MSE)损失函数和随机梯度下降(SGD)优化器对模型进行100次训练,学习率为0.01。通过在可用数据上训练模型,它有望学习请求计数和pod计数之间的线性关系,然后它可以使用它对新的输入做出准确的预测。我们在清单2中提供了用于训练简单线性回归模型的伪代码。此代码分配表III中列出的值,并循环100次迭代,训练模型。
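
A matching sketch of the pod-count model described above: a single-feature linear regression trained with MSE and SGD (lr 0.01) for 100 epochs. The tensor shapes and the rounding at prediction time are assumptions; the original Listing 2 appears only as an image.

```python
import torch
import torch.nn as nn

pod_model = nn.Linear(1, 1)                 # request count -> pod count
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(pod_model.parameters(), lr=0.01)

def train_pod_model(req: torch.Tensor, pods: torch.Tensor, epochs: int = 100):
    """req, pods: float tensors of shape (n_samples, 1)."""
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(pod_model(req), pods)
        loss.backward()
        optimizer.step()

def predict_pods(predicted_requests: float) -> int:
    with torch.no_grad():
        x = torch.tensor([[predicted_requests]], dtype=torch.float32)
        return max(1, round(pod_model(x).item()))   # at least one pod
```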

image-20230719234906187

我们提出的自动缩放器中的缩放决策机制是一种结合了水平缩放和垂直缩放的混合方法。垂直扩展机制不需要深度学习,因为它被用作请求计数变化过低而无法启动或关闭新pod的情况的备份。最初,我们通过预测pod数来假设水平pod缩放。如果预测计数与当前活动计数相同,我们将改为垂直缩放。这是通过计算以前和新的预测计数之间的百分比变化来完成的。我们在listing3中提供了缩放机制的逻辑流。如果预测的数量较高,我们假设需要更多的资源,如果较低,我们假设需要更少的资源。我们的预测模型考虑了各种因素,例如网络流量和资源利用率的变化,以预测未来的工作负载变化。这使我们的自动缩放器能够主动调整资源分配,防止启动或停止运行容器造成的延迟。我们还设置了90%的CPU使用率和至少50%的硬性限制,以防止资源过度使用或使用不足。总的来说,我们的混合自动缩放方法具有预测功能和备份垂直缩放机制,通过根据需要合并垂直缩放并根据预测的工作负载变化主动调整资源分配,改进了以前水平自动缩放器的局限性。这导致更高效和有效的资源扩展,以处理边缘计算环境中的动态工作负载变化。

image-20230719234956651

image-20230719235123844

五 实验和评估

在本节中,我们将描述评估LOTUS性能的设置。具体来说,我们评估了工作负载建模的准确性和系统在不同条件下的性能。

A. Experimental setup

图1中描述的边缘计算系统通过直接以太网连接和Microk8s服务(一种轻量级Kubernetes实现)展示了一个完全连接的拓扑。需要注意的是,这种体系结构安排是灵活的,不需要静态配置,因为它可以容纳一个节点集群。边缘网络是使用树莓派设备构建的,选择树莓派设备是因为它们的硬件限制。LOTUS体系结构是使用Python实现的,并部署在通过直接以太网链路连接到网络的笔记本电脑上。用于实验的硬件和软件组件如表4所示。

B. Measurable objectives

正如前面介绍中所述,我们对自动缩放器的主要目标是通过最小化请求延迟来提高最终用户的服务质量。这是我们的关键评估指标。我们通过记录我们执行的每个测试中每个请求的RTT并计算平均值(排除异常值)来确定这个指标。我们只考虑有效请求的RTT,因为从最终用户的角度来看,只有完成有效请求所需的持续时间才是重要的。此外,我们根据均方误差损失来评估深度学习的准确性,这是评估模型准确性的公认指标[4]。我们在训练和测试阶段都使用它。与HPA相比,RTT的降低以及模型精度的几乎为零的损失将证明成功的结果。

C. Experiments

我们开发了两个不同的实验来评估系统在不同条件下的性能。这些测试是指数增加和指数减少的情况,在下面解释。

image-20230719235641859

image-20230719235835235

1)指数增长压力测试: 我们需要确定LOTUS将如何适应重大的突然负载变化。为此,我们故意以一种戏剧性和突然的方式增加工作负载,以评估系统处理突发需求高峰的能力。指数增长将从最小负载开始,然后以指数方式增加对集群的请求数量。我们将这样做,直到系统崩溃,通过延迟超过阈值检测到,因此没有返回任何结果。我们将记录成功请求的数量,以查看崩溃点。指数递增算法的伪代码如清单4所示。Kubernetes API的“/healthz”端点用于监控集群的状态。如果响应状态码表明群集不健康,则假定群集已崩溃,并返回请求号。
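
A sketch of the exponential-increase loop (Listing 4 is only shown as an image above). The service URL and the Kubernetes /healthz URL are placeholders, and error handling is kept minimal.

```python
import requests

SERVICE_URL = "http://<edge-cluster>/app"      # placeholder application endpoint
HEALTHZ_URL = "https://<k8s-api>/healthz"      # placeholder Kubernetes API health endpoint

def cluster_healthy() -> bool:
    try:
        return requests.get(HEALTHZ_URL, timeout=2, verify=False).status_code == 200
    except requests.RequestException:
        return False

def exponential_increase(start: int = 1) -> int:
    """Double the number of requests each round until the cluster reports unhealthy;
    return the load level at which the crash was detected."""
    load = start
    while True:
        for _ in range(load):
            try:
                requests.get(SERVICE_URL, timeout=5)
            except requests.RequestException:
                pass                            # count as a failed request
        if not cluster_healthy():
            return load
        load *= 2
```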

2)指数下降压力测试: 该实验评估系统在面对工作量急剧下降时的弹性。指数下降的情况与指数上升的情况相反。我们将从一个恒定的高负荷开始,这个负荷要高到足以引起剧烈震动,但又不能高到足以导致坠机。然后,我们将开始以指数方式减少负载,并记录延迟时间,以查看在负载减少的情况下是否分配了太多的资源。指数递减算法的伪代码如清单5所示。

我们选择这些特定的实验来衡量系统对不可预见的边缘情况的反应能力。压力测试特别重要,因为它们可以帮助我们评估系统在不可预测的情况下有效适应和扩展的能力。例如,在1998年世界杯期间,比赛日的网络流量激增,但在比赛结束后迅速消退。尽管这种模式可能在某种程度上是可预测的,但确保我们的系统能够有效地处理和适应它们仍然是必不可少的。

VI. Experimental Results

This section presents the experimental results of the DL models for workload forecasting on the NASA and WC98 datasets, along with the performance results of our autoscaler tests. The tests performed are the exponential-increase and exponential-decrease use-case tests, whose pseudocode is shown in Listings 4 and 5 respectively. In the figures depicting model accuracy, the x-axis represents the timestamp in minutes and the y-axis the corresponding normalised request count sent to the edge network.

A. Accuracy of workload modelling

Figures 2 and 3 show the actual and predicted results for the NASA and WC98 datasets respectively. Figure 2b gives an overview of the full dataset, while Figure 2c plots our prediction model. These figures demonstrate the accuracy of our model, which closely follows the actual data; the same holds for the WC98 dataset, as shown in Figures 3a and 3c. Throughout the training phase the loss shows a clear downward trend across epochs, ultimately reaching 0.0021 (NASA) and 0.0016 (WC98). The steady decline in loss over training indicates that the model learns and improves over time, leading to more accurate predictions. This can be seen in Figure 4, where we observe rapid convergence towards epoch 100, with the remainder providing smaller accuracy improvements up to epoch 200.

B. Stress tests

1) Exponential-increase test: Figure 5 shows the comparative performance of LOTUS and the HPA under the exponential-increase scenario, averaged over 10 runs. The figure shows the performance of both systems over time and shows that LOTUS consistently outperforms the HPA throughout the experiment. We mark three points of interest on this figure: where we identify the benefit of eliminating the cold-start problem, where the HPA suffers a timeout while LOTUS is unaffected, and an unexplained anomaly where the systems' performance briefly inverts (at the 32-request point).

2) Exponential-decrease test: Figure 6 shows the comparative performance of LOTUS and the HPA under the exponential-decrease scenario, averaged over 10 runs. The figure shows the performance of both systems over time, and we subdivide it into two parts. The first marked section highlights LOTUS's effectiveness at reducing round-trip time under sudden, intense load conditions; it demonstrates how LOTUS handles demanding scenarios with minimal performance loss. The second identified section appears inconclusive for a definitive performance assessment, since the resulting curves interleave without any clear pattern distinguishing the two systems' performance.

image-20230720000109410

image-20230720000134851

image-20230720000202222

image-20230720000216936

VII. Discussion

This section discusses the identified trends and other noteworthy observations.

A. Model accuracy analysis

The complete NASA and WC98 datasets are presented to the reader alongside the predictions to provide additional context. Our analysis shows that, as indicated by the results of both models, prediction of the request count over a given interval is highly accurate for both datasets. On closer inspection, however, we observe that the WC98 model does not predict the higher counts accurately. This limitation can be attributed to the fact that the 60% training split contains a large number of near-zero values, leading to less accurate predictions; this is especially noticeable on match days, where near-zero values are more prevalent. The NASA dataset shows a slightly different trend: nearly all high values are predicted accurately, while low values tend to be underestimated. Despite these limitations, the overall accuracy of the predictions is clear, further reinforced by the near-zero loss values. Moreover, based on our analysis of the complete datasets, we consider the NASA dataset better suited to the exponential use-case tests. This is because the NASA training split contains one very large value, visible in the NASA dataset plot in Figure X, which has no counterpart in the WC98 dataset; as a result the model can predict higher count values more accurately on the NASA dataset. Overall, we conclude that the GRU time-series model is well suited to forecasting request counts, which is one of the key findings of our study. The high accuracy and negligible loss values further confirm this conclusion and underline the suitability of GRU time-series models for forecasting request counts on these datasets.

B. Computational trade-offs and autoscaling performance

The time series we used records HTTP request counts per minute, which means that in a practical deployment of our autoscaler the scaling window would be one minute. However, for our experimental tests there are good reasons we could not use this window size. Take the WC98 dataset as an example: the test set contains roughly 51,000 entries, so running it faithfully would require about 35 days of continuous execution, and it would then have to be repeated several times to obtain reliable results after removing outliers. As an alternative we step through the dataset index by index. This window size nevertheless raises some concerns. If we increase the prediction frequency, we can predict resource demand more accurately from dynamic request changes; the trade-off, however, is autoscaling instability. The more frequently we reallocate resources, the more unstable the autoscaler and the cluster become. This is evidenced by other autoscalers besides ours: the HPA, for instance, has a default update interval of 15 seconds, meaning that every 15 seconds metrics are evaluated and updates applied according to the HPA metric formula. If this were changed to 1 second, the system would become unstable because decisions would be made on smaller and potentially noisier data. In addition, our autoscaler has a trade-off that the HPA and other reactive autoscalers do not: we need time to train our deep learning models. We trained a GRU time-series model on server access requests, from which we send requests to the cluster and record pod counts; these data are used for the linear regression that predicts pod counts. Because we trained these models for our datasets and our cluster hardware, any new dataset or cluster hardware reconfiguration would require model retraining. This is the biggest trade-off between our predictive autoscaler and other reactive autoscalers. It should be noted, however, that for our datasets this trade-off is mitigated because the datasets and hardware were already established, so the cost of model training and hardware reconfiguration did not arise, allowing the performance benefits of our system to be seen, as our experimental results show.

C. Eliminating the "cold-start" problem

Based on the findings shown in Figure 5, the results of our exponential-increase use case demonstrate a clear elimination of the cold-start problem compared to the HPA. This capability comes from LOTUS's proactive resource allocation, which lets us predict incoming workloads in advance and thus effectively removes the delay inherent in reactive autoscaling approaches. This advantage of LOTUS is particularly valuable for time-sensitive applications and similar scenarios, since it enables fast start-up times that can be exploited fully.

D. System performance in high-workload environments

In terms of high-workload capability, LOTUS shows a clear performance advantage over the HPA, as shown in Figure 5. In this experimental scenario, while the HPA times out after 1024 concurrent requests, LOTUS shows no change in RTT, suggesting that the timeout is related to the autoscaling mechanism rather than physical hardware limits. Furthermore, Figure 6 shows that in the high-workload regime LOTUS declines almost linearly before reaching the low-workload regime, whereas the HPA initially follows a similar decline but experiences a pronounced drop at 256 simultaneous requests. We hypothesise that this drop is explained by LOTUS's proactive hybrid scaling compared with the HPA's reactive horizontal scaling, which accounts for the difference between the two systems.

VIII. Conclusion

This study proposes a deep-learning-based autoscaler architecture for autonomous scaling management of edge computing clusters. The proposed architecture is tested through multiple use-case scenarios, providing quantifiable results for its effectiveness and presenting the algorithms required to implement it. Stress tests were conducted to examine the architecture's performance under extreme load, and the results show a significant reduction of the "cold-start" problem. Future work may involve offloading workloads to the cloud, extending the range of clusters the architecture can be deployed on, and implementing fully deep-learning-based vertical scaling instead of the deep-learning horizontal plus mathematical vertical scaling used in this study.

SHOWAR

PBScaler

Abstract – In cloud applications with dynamic workloads, autoscaling is essential for ensuring optimal performance and resource utilisation. However, owing to diverse workload patterns and complex interactions between microservices, traditional autoscaling techniques are often no longer suitable for microservice-based applications. In particular, performance anomalies propagate through these interactions, causing a large number of microservices to appear anomalous, which makes it difficult to determine the root performance bottleneck (PB) and to formulate an appropriate scaling strategy. Moreover, to balance resource consumption and performance, existing mainstream approaches based on online optimisation algorithms require many iterations, leading to oscillation and a higher likelihood of performance degradation. To address these problems we propose PBScaler, a bottleneck-aware autoscaling framework designed to prevent performance degradation in microservice-based applications. The key to PBScaler is locating the PBs. To this end we propose TopoRank, a novel random-walk algorithm based on topological potential, to reduce unnecessary scaling. By combining TopoRank with an offline performance-aware optimisation algorithm, PBScaler optimises replica management without interrupting the online application. Comprehensive experiments demonstrate that PBScaler outperforms existing state-of-the-art approaches in mitigating performance problems while conserving resources.

1 Introduction

With the development of the microservice architecture, more and more cloud applications are migrating from monolithic to microservice architectures [1], [2], [3], [4], [5], [6]. This new architecture decomposes a monolithic application into multiple microservices that communicate with each other via HTTP or RPC protocols, reducing application coupling [7]. In addition, each microservice can be developed, deployed and scaled independently by different teams, enabling rapid application development and iteration. However, the unpredictability of external workloads and the complexity of interactions between microservices can lead to performance degradation [8], [9], [10]. Cloud providers must over-provision resources to meet application owners' service-level objectives (SLOs), which often causes unnecessary resource waste [11], [12]. The imbalance between satisfying SLOs and minimising resource consumption therefore becomes the main challenge in microservice resource management.

Microservice autoscaling. Microservice autoscaling refers to the ability to allocate resources elastically in response to workload changes [13]. By exploiting the elasticity of microservices, autoscaling can ease the conflict between resource cost and performance. However, microservice autoscaling struggles to scale performance bottlenecks (PBs) accurately within a short time. Because of the complexity of inter-microservice communication, the performance degradation of one PB can propagate to other microservices through message passing [2], causing a large number of microservices to exhibit anomalies simultaneously. We demonstrate this by injecting a burst workload into a specific microservice of Online Boutique 1, an open-source microservice application developed by Google. Figure 1 shows that performance degradation at the PB Recommend spreads to upstream microservices such as Checkout and Frontend. To further validate the importance of scaling PBs precisely, we ran stress tests and scaled different microservices separately. As Figure 2 shows, scaling a non-bottleneck microservice (Frontend) cannot mitigate the SLO violation; but when we identify and scale the PB, Recommend, the performance of the microservice application improves. Unfortunately, locating PBs is often time-consuming and occasionally error-prone [14].

In recent years several approaches have been proposed for identifying critical microservices before autoscaling. For example, the default autoscaler of Kubernetes 2 filters microservices for direct scaling according to static thresholds on compute resources. Yu et al. [15] define the boundary for elastic scaling by computing service power, the ratio of the 50th-percentile response time (P50) to the 90th-percentile response time (P90). Qiu et al. [4] introduce an SVM-based method that extracts the critical path by analysing ratios of various tail latencies. Although these studies narrow the scope of autoscaling, they still consider non-bottleneck microservices that can distort the scaling strategy, especially when a large number of microservices in the application become anomalous at the same time. There is therefore an urgent need to locate bottleneck microservices accurately before autoscaling.

WeChatf5eed04d852c27554c8fbd8d22d8efc1

image-20230806213009944

To balance resource consumption and performance, existing studies employ online optimisation algorithms to search for near-optimal autoscaling strategies. However, because the space of possible scaling strategies is enormous, these methods require a large number of trials, which is problematic for online applications. For example, Train Ticket 3, the largest open-source microservice application, consists of nearly 40 microservices. Assuming each microservice can have at most 15 replicas, determining the optimal allocation strategy for this application is an NP-hard problem, with up to 15^40 scaling choices. Moreover, the feedback loop in online optimisation takes too long for the model to converge, and the potential risk of performance degradation caused by online optimisation must also be considered. Figure 3 shows the replica and latency fluctuations caused by a burst workload for MicroScaler [15], an online autoscaling method that uses online Bayesian optimisation to find the global minimum of total cost. The frequent online attempts to create replicas caused by online optimisation (Figure 3a) lead to oscillation and performance degradation (Figure 3b). This motivates us to design an offline optimisation process whose feedback is provided by a simulator.

This paper presents PBScaler, a horizontal autoscaling framework designed to prevent performance degradation in microservice-based applications by identifying and resolving bottlenecks. Rather than optimising resources for all anomalous microservices as in previous work [11], [15], we propose TopoRank, a random-walk algorithm based on topological potential theory (TPT), to identify performance bottlenecks (PBs). TopoRank takes both microservice dependencies and anomaly likelihood into account, improving the accuracy and interpretability of bottleneck localization. After the PBs are identified by TopoRank, PBScaler further employs a genetic algorithm to find a near-optimal strategy. To avoid application oscillation caused by over-frequent optimisation, this process runs offline and is guided by an SLO violation predictor, which simulates the online application and provides feedback on scaling strategies. The main contributions of this paper are summarised as follows:

  • We propose PBScaler, a bottleneck-aware autoscaling framework designed to prevent performance degradation in microservice-based applications. By pinpointing the bottlenecks, PBScaler can reduce unnecessary scaling and speed up the optimisation process.
  • We adopt an offline optimisation process based on a genetic algorithm to optimise resource consumption while avoiding SLO violations. Guided by an SLO violation predictor, this process aims to balance resource consumption and performance without disturbing the online application.
  • We design and implement PBScaler on Kubernetes. To evaluate its effectiveness, we conduct extensive experiments with both real and simulated workload injection on two widely used microservice systems running in an online environment. The experimental results show that PBScaler outperforms several state-of-the-art elastic scaling approaches.

The remainder of this paper is organised as follows. Section 2 discusses related work on microservice bottleneck analysis and autoscaling. Section 3 describes the overall system in detail. Section 4 presents the evaluation and experimental results. Section 5 concludes our work and discusses future research directions.

2 Related Work

With the development of cloud computing, many autoscaling approaches have been proposed in academia and industry for cloud resources such as virtual machines and containers [16], [17], [18], [19]. However, because of the intricate dependencies between microservices, autoscaling for microservices can be considerably more complex.

Performance bottleneck analysis (also known as root cause analysis) is an effective way to quickly locate the bottlenecks responsible for microservice performance degradation, thereby reducing the time and effort required for autoscaling. In this section we review related work on microservice bottleneck analysis and autoscaling.

2.1 Bottleneck Analysis

In recent years, bottleneck analysis methods for microservice scenarios have emerged in large numbers, most of which rely on three categories of data: logs, traces and metrics. 1) Logs. Jia et al. [20] and Nandi et al. [21] first extract templates and flows from normal-state logs, match them against the target logs, and then filter out the anomalous logs. 2) Traces. A trace is a record, based on event tracking, that reconstructs the path of a request across microservices. Several studies [22], [23], [24], [25] describe how to use traces to find bottlenecks. Yu et al. [22], [23] locate bottlenecks on the dependency graph constructed from traces by combining spectrum analysis with the PageRank algorithm, while Mi et al. [24] propose an unsupervised machine learning prototype that learns microservice patterns and filters out anomalous microservices. However, using traces can be intrusive to the code and requires operators to have a deep understanding of the structure of the microservices. 3) Metrics. Some approaches [2], [26], [27] use graph random-walk algorithms to simulate the propagation of anomalies and then find bottlenecks by combining statistical features of the metrics with the dependencies between microservices. In addition, methods such as CauseInfer [14] and MicroCause [28] focus on building causal graphs over metrics via causal inference, which often captures hidden, indirect relationships between metrics.

Because monitoring metrics rarely requires modifying workflow code, collecting metrics for microservices is usually cheaper than using traces. Moreover, using metrics as the primary monitoring data lowers the cost of integrating bottleneck analysis with autoscaling, since metrics are widely used in the latter. Despite their advantages, most of these methods have no preference when choosing the starting point for anomaly back-tracking. In contrast, our approach starts the random back-tracking from microservices with higher anomaly potential, which speeds up convergence and improves the accuracy of bottleneck localization.

2.2 Autoscaling for Microservices

Existing autoscaling approaches for microservices fall into five categories. 1) Rule-based heuristic methods. KHPA, Libra [29], KHPA-A [30] and PEMA [31] manage the number of microservice replicas according to resource thresholds and specific rules. However, because different microservices have different sensitivities to particular resources, expert knowledge is needed to support autoscaling across these diverse microservices. 2) Model-based methods. Microservices can be modelled to predict their state under specific configurations and workloads; queueing theory [32], [33] and graph neural networks (GNNs) [12] are commonly used to build such performance prediction models. 3) Control-theoretic methods [11], [32]. SHOWAR [11] uses control theory to adjust microservice replicas dynamically, correcting the error between the monitored metric and a threshold. 4) Optimisation-based methods. These methods [15], [34] perform many trials to find the optimal strategy given the available resources and workload; the key is to narrow the decision space to speed up the process. 5) RL-based methods. Reinforcement learning (RL) has been widely applied to microservice resource management. MIRAS [35] adopts model-based RL for decision making to avoid the high sampling complexity of the real environment. FIRM [4] uses a support vector machine (SVM) to identify critical paths in the microservice application and a deep deterministic policy gradient (DDPG) algorithm to produce hybrid scaling strategies for the microservices on those paths. RL-based methods need to interact with the environment continuously during exploration and cannot adapt to dynamic microservice architectures. In summary, while the autoscaling techniques above each have their advantages, they rarely focus on performance bottlenecks. Spending compute resources on bottleneck-free microservices inevitably increases scaling cost and prolongs decision time, whereas our approach concentrates on locating the performance bottlenecks.

3 SYSTEM DESIGN

Our proposed PBScaler is a PB-oriented autoscaling controller that locates PBs and optimises their replicas. As shown in Figure 4, PBScaler consists of three components: 1) Metrics collector: to understand the state of the application in real time, we design a metrics collector that captures and consolidates monitoring metrics from Prometheus 4 at fixed intervals. 2) Performance bottleneck analysis: assisted by the metrics collector, this component performs SLO violation detection and redundancy checking to identify microservices with anomalous behaviour. The bottleneck localization process is then triggered to find the PBs among the anomalous microservices. 3) Scaling decision: this component determines the optimal number of replicas for the PBs using an evolutionary algorithm. Finally, PBScaler generates a configuration file with the optimised strategy and submits it to kubernetes-client 5, thereby controlling the number of microservice replicas.

3.1 Metrics Collector

An autoscaling system relies on real-time access to metrics such as memory usage, system load and tail latency to decide whether elastic scaling should be performed and how many resources should be allocated in the microservice application. Unlike trace-based monitors [7], which require deep knowledge of the program and code instrumentation, the metrics collector reports metrics via the service mesh to minimise interference with business flows. As shown in Table 1, PBScaler uses Prometheus and kube-state-metrics to collect and categorise these metrics, including response latency, call relationships between microservices, resource consumption and microservice workload. For example, container_cpu_usage_seconds_total is a Prometheus label that records CPU usage at the container level. We set the Prometheus monitoring interval to 5 seconds and store the collected metric data in a time-series database. We collect the P90 tail latency of each microservice and use it to represent the application's performance. Call relationships capture the associations between microservices and can be used to build the microservice correlation graph. Service mesh: a service mesh is an infrastructure layer that lets developers add advanced capabilities such as observability and traffic management to cloud applications without extra code. Istio 7 is a popular open-source service mesh implementation designed to integrate seamlessly with Kubernetes. When a pod starts in Kubernetes, Istio launches an Envoy proxy inside the pod to intercept network traffic, enabling workload balancing and monitoring.

image-20230806214133416

3.2 Performance Bottleneck Analysis

Performance bottleneck analysis (PBA) is a process that aims to discover performance degradation and resource waste in a microservice application and to infer the bottlenecks behind the current problem. As stated in Section 1, by accurately locating these bottlenecks PBA can improve autoscaling performance and reduce excessive resource consumption. The PBA process in PBScaler is shown in Algorithm 1.

3.2.1 SLO Violation Detection

To detect anomalies in microservices, PBScaler compares specific metrics against service-level objectives (SLOs). A microservice that violates its SLO repeatedly, i.e. whose performance degrades, is considered anomalous. As noted in [14] and [27], detecting SLO violations is the key step that triggers bottleneck localization. The call relationships gathered by the metrics collector are used to build the microservice correlation graph Gc. PBScaler checks the P90 tail latency of every call edge in Gc every 15 seconds so that performance degradation is detected promptly. If the tail latency of a call exceeds a predefined threshold (e.g. the SLO), the microservice invoked by that call is added to the set of anomalous microservices (S) and the bottleneck localization process is started. To account for occasional noise in microservice latency, the threshold is set to SLO × (1 + α/2), where α adjusts the tolerance to noise.

3.2.2 Redundancy Checking

Even without performance anomalies, some microservices may be allocated more resources than they need. However, it is difficult to recognise this situation from metrics alone, and it can waste limited hardware resources. To avoid this, the microservices with excessive resource allocations must be identified. PBScaler uses the rate of change of each microservice's per-second workload to determine whether its resources are redundant; this strategy is more robust than relying on resource consumption alone, because different microservices can have different sensitivities to heterogeneous resources. The main idea of redundancy checking is to use hypothesis testing to detect whether the current workload w_c^i of microservice i is significantly lower than its past workload (denoted w_p^i). The degree of significance is adjusted by the parameter β, and the hypothesis test can be formulated as:

image-20230806214341493

To perform the hypothesis test, we first obtain the current and historical workloads of the target microservice from the metrics collector. If P does not exceed the confidence level cl (set to 0.05 by default), we reject the null hypothesis H0 and consider that microservice i has redundant resources.
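
The exact test statistic is not spelled out in these notes, so the following sketch assumes a one-sided one-sample t-test of the current workload samples against β times the historical mean, rejected at confidence level cl = 0.05:

```python
# Illustrative redundancy check (statistic choice is an assumption):
# H0: current workload >= beta * mean(past workload)
# H1: current workload <  beta * mean(past workload)
import numpy as np
from scipy import stats

def is_redundant(current_workload: np.ndarray,
                 past_workload: np.ndarray,
                 beta: float = 0.9, cl: float = 0.05) -> bool:
    threshold = beta * past_workload.mean()
    res = stats.ttest_1samp(current_workload, popmean=threshold)
    # Convert the two-sided p-value into a one-sided ("less than") p-value.
    p_one_sided = res.pvalue / 2 if res.statistic < 0 else 1 - res.pvalue / 2
    return p_one_sided <= cl        # reject H0 -> resources look redundant
```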

image-20230806214622937

image-20230806214640121

image-20230806214731025

3.2.3 Bottleneck Localization

Owing to the complex interactions within a microservice application [36], not every anomalous microservice needs to be scaled. For example, Figure 5 illustrates how performance degradation at a bottleneck microservice (e.g. Product) propagates along the call chain to its upstream microservices (e.g. Recommendation, Frontend and Checkout), even though those upstream microservices are not overloaded. Therefore only the bottleneck microservices must be scaled, while the other anomalous microservices are merely implicated. To pinpoint the bottleneck microservices we introduce the notion of anomaly potential, which aggregates the anomalous influence of all microservices at a given location. Because a PB is surrounded by many anomalous microservices that it affects, its anomaly potential is usually high. We design a new bottleneck localization algorithm, TopoRank, which incorporates topological potential theory (TPT) into a random walk to compute scores for all anomalous microservices and finally outputs a ranked list (rl). The top-scoring microservices in rl are identified as PBs.

TPT originates from the concept of a "field" in physics and has been widely used [37], [38] to measure the mutual influence between nodes in complex networks. Since the microservice correlation graph can also be viewed as a complex network, we use TPT to evaluate the anomaly potential of microservices. Specifically, we observe that in the microservice correlation graph Gc, microservices closer to the PBs, i.e. those separated by fewer hops, are more likely to be anomalous because they frequently call the PBs directly or indirectly. Based on this observation we use TPT to evaluate the anomaly potential of microservices. To do so, we first extract the anomalous subgraph Ga by identifying the anomalous microservices and their call relationships in Gc, and then use TPT to compute the topological potential of each microservice vi in Ga.

image-20230806214903865

where N is the number of upstream microservices of vi and mj is the anomaly degree of vj. PBScaler defines the anomaly degree as the number of times a microservice violates the SLO within a given time window. dji denotes the minimum number of hops from vj to vi, and the influence factor σ controls the range of influence of each microservice.

However, a microservice with a high topological potential is not necessarily a PB, because anomalies usually propagate along the microservice correlation graph; relying on TPT alone is therefore insufficient to diagnose PBs. To address this, PBScaler employs a personalized PageRank algorithm [39] to reverse the anomaly propagation over the anomalous subgraph Ga and locate the PBs. Let P be the transition matrix of Ga, where Pi,j is the probability that an anomaly is traced back from vi to its downstream node vj. Given vi with out-degree d, the standard personalized PageRank algorithm sets Pi,j to

image-20230806222701017

which means the algorithm has no preference among downstream microservices. This definition, however, ignores the correlation between a downstream microservice and the anomaly of the current microservice. PBScaler therefore adjusts the computation to pay more attention to downstream microservices whose metrics are more strongly correlated with the upstream response time. For each microservice vi, PBScaler collects a tail-latency sequence li and a set of metric arrays Mi = {m1, m2, ..., mk}, where mk can be viewed as the fluctuation array of a metric (e.g. memory usage) within a given time window. PBScaler defines Pi,j as the maximum Pearson correlation coefficient between li and the metric arrays in Mj:

image-20230806222836755

The personalized PageRank algorithm determines the popularity of each node by performing a random walk on the directed graph. However, some nodes may never point to any other node, causing all scores to tend to zero after many iterations. To avoid falling into this "trap", a damping factor δ is applied, allowing the algorithm to jump out of such nodes according to a predefined rule; δ is typically set to 0.15. Personalized PageRank is expressed as:

image-20230806222945683

where v denotes the probability of each microservice node being diagnosed as a PB. The preference vector u acts as the personalization rule that guides the algorithm out of traps; its values are determined by the anomaly potential of each node, so nodes with larger anomaly potential are preferred as starting points for the algorithm. The equation for the k-th iteration can be written as:

image-20230806223009004

After several iterations v gradually converges. PBScaler then sorts the final result and produces a ranked list; the top-k microservices with the highest scores in this list are identified as PBs. The complete procedure of TopoRank is described in Algorithm 2.
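
A compact sketch of the personalized PageRank iteration described above, with the preference vector u derived from the anomaly (topological) potential. The transition matrix P and the potentials are assumed to be precomputed; this is an illustration of the iteration, not the TopoRank implementation:

```python
# Personalized PageRank with damping factor delta and anomaly-potential-based
# preference vector u; higher final score -> more likely to be a PB.
import numpy as np

def personalized_pagerank(P: np.ndarray, potential: np.ndarray,
                          delta: float = 0.15, iters: int = 100) -> np.ndarray:
    u = potential / potential.sum()          # restart preference from potentials
    v = np.full(len(u), 1.0 / len(u))        # uniform initial scores
    for _ in range(iters):
        v = (1 - delta) * P.T @ v + delta * u
    return v

# top_k = np.argsort(-personalized_pagerank(P, potential))[:k]   # candidate PBs
```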

3.3 Scaling Decision

Given the PBs identified by the performance bottleneck analysis, their replicas are scaled so as to minimise the application's resource consumption while ensuring that the end-to-end latency of the microservices meets the SLO. Although a large number of replicas can alleviate performance degradation, they also consume substantial resources, so a balance must be struck between performance guarantees and resource consumption. The scaling decision process is modelled as a constrained optimisation problem to achieve this balance.

3.3.1 Constrained Optimization Model

The autoscaling optimisation in our scenario seeks an allocation scheme that assigns a variable number of replicas to each PB. Given n PBs to be scaled, we define a strategy as the set X = {x1, x2, ..., xn}, where xi is the number of replicas allocated to PB i. Before optimisation, the initial replica counts of the PBs can be written as C = {c1, c2, ..., cn}. Note that the replica constraints in PBScaler are defined separately for the scale-up and scale-down processes. During scale-up, the replica counts of the PBs are constrained as follows:

![image-20230807104353754](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104353754.png)

where cmax is the maximum number of replicas a microservice can be scaled to given the limited server resources. During scale-down, the constraint on the replica counts can be expressed as:

![image-20230807104407933](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104407933.png)

In Eq. (8), γ (2 by default) is the maximum number of replicas that may be removed. This restriction is reasonable because, as observed in our experiments, sharply reducing the number of replicas of a microservice can cause a brief latency spike. The goal of the scaling decision is to minimise the application's resource consumption while maintaining its performance. Application performance is usually expressed through SLO violations, which users care about most, so the application performance reward can be refined as:

![image-20230807104422677](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104422677.png)

During optimisation, the application's resource consumption (e.g. CPU and memory usage) cannot be predicted. To estimate resource consumption conservatively,

![image-20230807104445308](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104445308.png)

we consider the ratio of the PB replicas to the maximum number of replicas that can be allocated, rather than computing CPU and memory costs. We compute the resource reward as:

![image-20230807104458060](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104458060.png)

Our objective is to minimise resource consumption while guaranteeing performance. We use a weighted linear combination (WLC) to balance the two goals, and the final optimisation objective is defined as:

![image-20230807104511351](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104511351.png)

where λ ∈ [0, 1]. We treat λ as a parameter that balances application performance against resource consumption.

3.3.2 SLO Violation Predictor

To compute the performance reward R1, we need to assess whether a strategy will cause the online application to violate its SLO. A naive approach is to execute each candidate strategy directly on the online application and wait for feedback from the monitoring system; however, the oscillation caused by frequent scaling of the online application is unavoidable. An alternative is to train an evaluation model on historical metric data that simulates the feedback of the online application. Without interacting with the online application, this model predicts the application's performance from its current state. We use a vector r to denote the number of replicas of each microservice after executing scaling strategy x, and w to denote the current workload of each microservice. Because the bottleneck-aware optimisation has a low time cost, we can reasonably assume that w does not change significantly during this period (see Section 4.2). Given the application state represented by the workload w and the replicas r of all microservices, an SLO violation predictor can be designed as:

![image-20230807104548241](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104548241.png)

where ψ is a binary classification model; the choice of a suitable classifier is discussed in Section 4.3. The historical metric data used for training can be generated with a classical scaling method (the Kubernetes autoscaler by default) or a random policy. We deployed an open-source microservice system on 3 nodes (44 CPU cores and 220 GB of RAM in total) and performed elastic scaling while Prometheus collected the workload and P90 tail latency of each microservice at fixed intervals. The monitoring data for each interval can easily be labelled by comparing the tail latency of the frontend microservice against the SLO.
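
A sketch of such a predictor ψ using scikit-learn's Random Forest (the classifier ultimately chosen in Section 4.3). The feature layout, which simply concatenates the workload vector w and the replica vector r for each sample, is an assumption:

```python
# Binary SLO-violation predictor over (workload, replicas) states.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_predictor(workloads: np.ndarray, replicas: np.ndarray,
                    slo_violated: np.ndarray) -> RandomForestClassifier:
    X = np.hstack([workloads, replicas])     # one row per monitoring interval
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, slo_violated)                 # 1 = SLO violated, 0 = satisfied
    return clf

def predict_violation(clf, w: np.ndarray, r: np.ndarray) -> bool:
    return bool(clf.predict(np.hstack([w, r]).reshape(1, -1))[0])
```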

![image-20230807104631365](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104631365.png)

3.3.3 Autoscaling Optimization

As described in Section 3.3.1, the trade-off between performance and resource consumption is modelled as a constrained optimisation problem. To find a near-optimal strategy, PBScaler uses a genetic algorithm (GA) to generate and refine scaling strategies that reduce resource consumption while satisfying the SLO. A GA mimics natural selection in evolution, promoting superior offspring while eliminating inferior ones. First, the GA performs a random search to initialise a population of chromosomes in the strategy space, where each chromosome represents a candidate solution to the optimisation problem. Then, in each iteration, chromosomes with high fitness (the elites) are selected for crossover or mutation to produce the next generation.

The autoscaling optimisation in our scenario aims to determine a scaling strategy that assigns a variable number of replicas to each PB; the process is shown in Figure 6. At the start, PBScaler obtains the current replica counts r and workloads w of every microservice. After performance bottleneck analysis, PBScaler identifies the PBs from r and filters them out to obtain r'. The Decision Maker then generates a population of strategies for the PBs. Because the number of microservices to be scaled affects both the speed and the quality of the optimisation algorithm (Section 4.3), PBScaler assumes that only the PBs need elastic scaling; in other words, the replica counts in r' remain unchanged. The SLO violation predictor is responsible for evaluating the generated strategies; note that each strategy is merged with r' and fed, together with w, into the SLO violation predictor. The best strategy Xbest selected by the GA is merged with r' to produce the final decision.

![image-20230807104720146](/Users/joshua/Library/Application Support/typora-user-images/image-20230807104720146.png)

In the optimisation phase, the Decision Maker uses the GA to generate and refine scaling strategies for the PBs, as described in Algorithm 3. After randomly generating a population within the strategy bounds of each PB (line 1), the Decision Maker estimates the fitness of each strategy according to Eq. (11) and stores the elites (lines 2-3). In each iteration, the GA uses a tournament-based selection operator to pick good parents (line 6), and new offspring are produced by recombination and mutation using a two-point crossover operator and a dual-chromosome mutation operator (lines 7-8). Mimicking natural selection, the new offspring and the parent elites form a new population for the next iteration (line 9). When the GA reaches the specified number of iterations, the Decision Maker returns the scaling decision Xbest with the highest fitness.

4 EVALUATIONS

In this section we describe the experimental scenarios for autoscaling in detail, including a comparison of PBScaler with several state-of-the-art autoscaling algorithms from academia and industry.

4.1 Experimental Setup

4.1.1 Microservice Platform

The experiments were conducted on our private cloud cluster, consisting of three physical machines (one master node and two worker nodes) with a total of 44 Intel 2.40 GHz CPU cores and 220 GB of RAM. To evaluate autoscaling we selected two open-source microservice applications as benchmarks: a) Online Boutique 8, a web-based e-commerce demo application developed by Google. Through the cooperation of 10 stateless microservices and a Redis cache, it provides basic functions such as browsing products, adding items to a cart, and payment processing. b) Train Ticket 9, a large open-source microservice system developed by Fudan University. Train Ticket has more than 40 microservices and uses MongoDB and MySQL for data storage, supporting workflows such as browsing, booking and purchasing tickets online. Owing to cluster resource limits, we restrict each microservice to at most 8 replicas. The source code is available on GitHub 10.

4.1.2 Workload

We evaluate the effectiveness of PBScaler under various traffic scenarios, using a real Wikipedia workload from Wiki-Pageviews [40] on 16 March 2015 and five simulated workloads (EW1-EW5) inspired by the experiments of Abdullah et al. [41]. We compress the real workload into one hour and scale it to a level suitable for our cluster. The five simulated workloads exhibit different patterns, such as single-peak, multi-peak, rising and falling, and are limited to 20 minutes each. Figure 7 depicts the fluctuations of these workloads.

4.1.3 Baseline Methods

We compare PBScaler with several state-of-the-art microservice autoscaling approaches from academia and industry, which perform dynamic horizontal scaling of microservices from the perspectives of static thresholds, control theory and black-box optimisation.

• Kubernetes Horizontal Pod Autoscaler (KHPA): the default horizontal scaling scheme of Kubernetes. Given a custom threshold T for some resource R (CPU usage by default) and the resource usage U_i^R aggregated across all replicas of a microservice, KHPA sets the target number of replicas to n = ⌈Σ_{i∈ActivePods} U_i^R / T⌉.

• MicroScaler [15]: an autoscaling tool that uses a black-box optimisation algorithm to determine the optimal number of replicas for microservices. MicroScaler computes each microservice's P90/P50 ratio for classification and then runs four iterations of Bayesian optimisation to make scaling decisions.

• SHOWAR [11]: a hybrid autoscaling technique. We reproduce the horizontal scaling part of SHOWAR, which uses PID control theory to bring the observed metric gradually closer to a user-specified threshold. In our implementation we replace the run-queue latency with the more common P90 latency, since the former requires additional eBPF tooling.

4.1.4 Experimental Parameters and Evaluation Criteria

In our experiments we fix the Prometheus collection interval at 5 seconds. As experiment duration and workload increase, the amount of data required by stateful microservices such as MongoDB also grows; eventually the data volume exceeds the available memory and spills to disk storage, a transition that can cause performance degradation that autoscaling cannot remedy. We therefore restrict the workload tests to stateless requests. The SLO values for Online Boutique and Train Ticket are set to 500 ms and 200 ms respectively. In the SLO violation detection and redundancy checking modules, PBScaler first sets the action boundary α to 0.2 to reduce noise interference, and we then empirically set the significance level β to 0.9 to control the workload level that triggers scaling. For bottleneck localization, the influence factor σ of the topological potential is set to 1, and the top-k (k = 2) microservices with the highest scores in rl are treated as PBs. We chose SLO violation rate, resource consumption and response time to evaluate the performance of the autoscaling methods; a method is considered more effective if it reduces response time, SLO violation rate and resource consumption. We define the SLO violation rate as the percentage of time the end-to-end P90 tail latency exceeds the SLO. Resource consumption is computed following [42], with a CPU price of $0.00003334 per vCPU-second and a memory price of $0.00001389 per GB-second; the total resource cost is the sum of the memory cost and the CPU cost.
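
For reference, the cost model quoted above can be reproduced with a small helper (the prices are as stated; the usage figures in the comment are invented for illustration):

```python
# Total cost = CPU cost + memory cost, using the per-second prices from [42].
def resource_cost(cpu_vcpu_seconds: float, mem_gb_seconds: float) -> float:
    return 0.00003334 * cpu_vcpu_seconds + 0.00001389 * mem_gb_seconds

# e.g. 2 vCPUs and 4 GB of RAM held for one hour:
# resource_cost(2 * 3600, 4 * 3600)  ->  ~$0.44
```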

![image-20230807105023233](/Users/joshua/Library/Application Support/typora-user-images/image-20230807105023233.png)

![image-20230807105039422](/Users/joshua/Library/Application Support/typora-user-images/image-20230807105039422.png)

4.2 Performance Evaluation

Table 2 compares the SLO violation rates and resource costs of the four autoscaling methods on the two microservice applications under different workloads. The None method, which performs no autoscaling, is used as a reference; its results are shown in grey and excluded from the comparison. Overall, PBScaler outperforms the other competing methods in reducing SLO violations and minimising resource overhead across the six workloads on both microservice systems. In particular, on Train Ticket PBScaler reduces the SLO violation rate by 4.96% on average compared with the baseline methods and lowers the resource cost by $0.24 on average. The results show that PBScaler can quickly and precisely scale the bottleneck microservices in a large microservice system, thereby reducing SLO violations and saving resources. For the six workloads on Online Boutique, PBScaler also achieves the lowest SLO violation rate on four of the simulated workloads and the lowest resource consumption on three of them. Figure 8 presents box plots of the latency distributions of the different methods under the six workloads, to explore each method's impact on the performance of the microservice systems. Most autoscaling methods keep the median of the latency distribution below the red dashed line (the SLO), but only PBScaler further pushes the third quartile below the SLO for all workloads. To evaluate the time cost of elastic scaling with PBScaler, the average time required by each of its modules was collected and computed. As Table 3 shows, the total time cost of all PBScaler modules on Online Boutique is less than one monitoring interval (i.e. 5 s), and the same measure for Train Ticket is less than two monitoring intervals. Because PBA narrows the decision scope, the Decision Maker's time cost increases only slightly (by no more than 6.6%) when switching from Online Boutique to Train Ticket, despite the larger number of microservices. We recognise, however, that the time consumed by PBA grows quickly as the microservice scale increases, which we leave for future work.

4.3 Effectiveness Analysis of Components
4.3.1 Performance Comparison of Bottleneck Localization

![image-20230807105214554](/Users/joshua/Library/Application Support/typora-user-images/image-20230807105214554.png)

![image-20230807105227225](/Users/joshua/Library/Application Support/typora-user-images/image-20230807105227225.png)

To evaluate whether the TopoRank algorithm can effectively locate PBs caused by burst workloads, we injected anomalies such as CPU overload, memory overflow and network congestion into Online Boutique and Train Ticket via Chaos Mesh; these anomalies are typically caused by high-workload conditions. The TopoRank algorithm was used to analyse the metrics and identify the performance bottlenecks behind these anomalies. The localization results were then compared with MicroRCA [27], a baseline method for microservice root cause analysis. AC@k measures the precision of the actual PBs among the top-k results, and Avg@k is the average precision of the top-k results. These metrics are computed as follows.

![image-20230807105148687](/Users/joshua/Library/Application Support/typora-user-images/image-20230807105148687.png)

where A denotes the set of microservices and RT@k the top-k ranked microservices. Figure 9 gives the AC@1 and Avg@5 values of TopoRank and MicroRCA on the different microservice applications. The results show that TopoRank outperforms MicroRCA on both metrics, mainly because TopoRank considers both anomaly likelihood and microservice dependencies when performing personalized PageRank.

The main purpose of bottleneck localization is to shrink the strategy space and speed up the discovery of the optimal strategy. We ran GA iterations over the PBs only and over all microservices to demonstrate the effect of bottleneck localization on optimisation. Figure 10 depicts the iteration process on the microservice system and shows that, as the population (Pop) grows, the PB-aware strategy clearly outperforms the approach applied to all microservices in terms of fitness; it reaches good fitness in fewer than 5 iterations. In contrast, the approach involving all microservices needs a larger population and more iterations to reach the same fitness level. This is because the PB-aware strategy helps the GA narrow the optimisation scope precisely and accelerates finding a good solution.

4.3.2 Effectiveness of the SLO Violation Predictor

The goal of the SLO violation predictor is to predict the outcome of an optimisation strategy directly, rather than waiting for feedback from the online application. We determine whether a performance problem will occur from the replica count and workload of each microservice. Choosing a suitable binary classification model for the prediction task is crucial. With a 5-second data collection interval, we gathered two datasets from our cluster: a 3.1k-sample historical dataset for Train Ticket (A) and a 1.5k-sample dataset for Online Boutique (B). We trained and tested four classical machine learning methods on these datasets: support vector machine (SVM), Random Forest, multilayer perceptron and decision tree. Table 4 gives the precision and recall of the four models for SLO violation prediction. Based on the results on both datasets, we ultimately chose Random Forest as the main algorithm for SLO violation prediction. To show that the SLO violation predictor can replace feedback from the real environment, we compared PBScaler with the predictor against MicroScaler, which collects feedback from the online system. We injected a burst workload into Online Boutique and made only a single microservice anomalous, to eliminate differences between the two methods in bottleneck localization. As shown in Figure 11, guided by the predictor, PBScaler makes far fewer and less frequent decision attempts than MicroScaler. Reducing online trials in the cluster clearly lowers the risk of oscillation.

5 CONCLUSIONS

This paper presents PBScaler, a bottleneck-aware autoscaling framework designed to prevent performance degradation in microservice-based applications. PBScaler collects real-time performance metrics of the application using service mesh technology and dynamically builds the correlation graph between microservices. To handle microservice anomalies caused by dynamic external workloads and complex calls between microservices, PBScaler uses TopoRank, a random-walk algorithm based on topological potential theory, to identify bottleneck microservices. In addition, PBScaler employs an offline evolutionary algorithm, guided by an SLO violation predictor, to optimise the scaling strategy. Experimental results show that PBScaler achieves low SLO violation rates while minimising resource consumption.

In the future we plan to improve our work in the following directions. First, we will explore the potential of bottleneck awareness in fine-grained resource (e.g. CPU and memory) management. Second, we will investigate how to avoid interference from stateful microservices during autoscaling, since performance degradation of stateful microservices can disrupt the autoscaling controller. Third, we will improve the efficiency of performance bottleneck analysis for large-scale microservice systems.

docker

Docker

docker容器使用

Pull images

docker pull 可以来载入镜像

Run containers

docker run -it ubuntu /bin/bash

参数说明:

  • -i: 交互式操作。
  • -t: 终端。
  • ubuntu: ubuntu 镜像。
  • /bin/bash:放在镜像名后的是命令,这里我们希望有个交互式 Shell,因此用的是 /bin/bash。

Exit terminal

exit

查看所有的容器命令如下:

docker ps -a

然后可以用

docker start来启动已经关闭的容器

docker stop来关闭容器

docker restart

进入容器

在使用 -d 参数时,容器启动后会进入后台。此时想要进入容器,可以通过以下指令进入:

  • docker attach
  • docker exec:推荐大家使用 docker exec 命令,因为此命令会退出容器终端,但不会导致容器的停止。

attach 命令,如果从这个容器退出,会导致容器的停止。

导出或者导入容器

导出容器

如果要导出本地某个容器,可以使用 docker export 命令。

1
docker export 1e560fca3906 > ubuntu.tar

导入容器快照

可以使用 docker import 从容器快照文件中再导入为镜像,以下实例将快照文件 ubuntu.tar 导入到镜像 test/ubuntu:v1:

1
cat docker/ubuntu.tar | docker import - test/ubuntu:v1

此外,也可以通过指定 URL 或者某个目录来导入,例如:

1
docker import http://example.com/exampleimage.tgz example/imagerep

删除容器

删除容器使用 docker rm 命令:

1
docker rm -f 1e560fca3906

下面的命令可以清理掉所有处于终止状态的容器。

1
docker container prune

运行web

前面我们运行的容器并没有一些什么特别的用处。

接下来让我们尝试使用 docker 构建一个 web 应用程序。

我们将在docker容器中运行一个 Python Flask 应用来运行一个web应用。

1
2
docker pull training/webapp  # 载入镜像
docker run -d -P training/webapp python app.py
  • **-d:**让容器在后台运行。
  • **-P:**将容器内部使用的网络端口随机映射到我们使用的主机上。

端口信息

1
2
PORTS
0.0.0.0:32769->5000/tcp

我们也可以通过 -p 参数来设置不一样的端口

1
$ docker run -d -p 5000:5000 training/webapp python app.py

网络端口的快捷方式

通过 docker ps 命令可以查看到容器的端口映射,docker 还提供了另一个快捷方式 docker port,使用 docker port 可以查看指定 (ID 或者名字)容器的某个确定端口映射到宿主机的端口号。

上面我们创建的 web 应用容器 ID 为 bf08b7f2cd89 名字为 wizardly_chandrasekhar

We can use docker port bf08b7f2cd89 or docker port wizardly_chandrasekhar to check the container's port mappings.

1
2
3
4
5
$ docker port bf08b7f2cd89
5000/tcp -> 0.0.0.0:5000

$ docker port wizardly_chandrasekhar
5000/tcp -> 0.0.0.0:5000

查看 WEB 应用程序日志

docker logs [ID或者名字] 可以查看容器内部的标准输出。

1
2
3
4
runoob@runoob:~$ docker logs -f bf08b7f2cd89
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
192.168.239.1 - - [09/May/2016 16:30:37] "GET / HTTP/1.1" 200 -
192.168.239.1 - - [09/May/2016 16:30:37] "GET /favicon.ico HTTP/1.1" 404 -

-f:docker logs 像使用 tail -f 一样来输出容器内部的标准输出。

从上面,我们可以看到应用程序使用的是 5000 端口并且能够查看到应用程序的访问日志。

查看WEB应用程序容器的进程

我们还可以使用 docker top 来查看容器内部运行的进程

1
2
3
runoob@runoob:~$ docker top wizardly_chandrasekhar
UID PID PPID ... TIME CMD
root 23245 23228 ... 00:00:00 python app.py

检查 WEB 应用程序

使用 docker inspect 来查看 Docker 的底层信息。它会返回一个 JSON 文件记录着 Docker 容器的配置和状态信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
runoob@runoob:~$ docker inspect wizardly_chandrasekhar
[
{
"Id": "bf08b7f2cd897b5964943134aa6d373e355c286db9b9885b1f60b6e8f82b2b85",
"Created": "2018-09-17T01:41:26.174228707Z",
"Path": "python",
"Args": [
"app.py"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,9
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 23245,
"ExitCode": 0,
"Error": "",
"StartedAt": "2018-09-17T01:41:26.494185806Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
......

重启WEB应用容器

已经停止的容器,我们可以使用命令 docker start 来启动。

1
2
runoob@runoob:~$ docker start wizardly_chandrasekhar
wizardly_chandrasekhar

docker ps -l 查询最后一次创建的容器:

1
2
3
#  docker ps -l 
CONTAINER ID IMAGE PORTS NAMES
bf08b7f2cd89 training/webapp ... 0.0.0.0:5000->5000/tcp wizardly_chandrasekhar

正在运行的容器,我们可以使用 docker restart 命令来重启。


移除WEB应用容器

我们可以使用 docker rm 命令来删除不需要的容器

1
2
runoob@runoob:~$ docker rm wizardly_chandrasekhar  
wizardly_chandrasekhar

删除容器时,容器必须是停止状态,否则会报如下错误

1
2
runoob@runoob:~$ docker rm wizardly_chandrasekhar
Error response from daemon: You cannot remove a running container bf08b7f2cd897b5964943134aa6d373e355c286db9b9885b1f60b6e8f82b2b85. Stop the container before attempting removal or force remove

Docker镜像

docker images

各个选项说明:

  • REPOSITORY:表示镜像的仓库源
  • TAG:镜像的标签
  • IMAGE ID:镜像ID
  • CREATED:镜像创建时间
  • SIZE:镜像大小

同一仓库源可以有多个 TAG,代表这个仓库源的不同个版本,如 ubuntu 仓库源里,有 15.10、14.04 等多个不同的版本,我们使用 REPOSITORY:TAG 来定义不同的镜像。

所以,我们如果要使用版本为15.10的ubuntu系统镜像来运行容器时,命令如下:

1
docker run -t -i ubuntu:15.10 /bin/bash 

如果你不指定一个镜像的版本标签,例如你只使用 ubuntu,docker 将默认使用 ubuntu:latest 镜像。

获取一个新的镜像

当我们在本地主机上使用一个不存在的镜像时 Docker 就会自动下载这个镜像。如果我们想预先下载这个镜像,我们可以使用 docker pull 命令来下载它。

1
2
3
4
5
6
7
8
9
runoob@runoob:~$ docker pull ubuntu:13.10
13.10: Pulling from library/ubuntu
6599cadaf950: Pull complete
23eda618d451: Pull complete
f0be3084efe9: Pull complete
52de432f084b: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:15b79a6654811c8d992ebacdfbd5152fcf3d165e374e264076aa435214a947a3
Status: Downloaded newer image for ubuntu:13.10

下载完成后,我们可以直接使用这个镜像来运行容器。

查找镜像

我们可以从 Docker Hub 网站来搜索镜像,Docker Hub 网址为: https://hub.docker.com/

我们也可以使用 docker search 命令来搜索镜像。比如我们需要一个 httpd 的镜像来作为我们的 web 服务。我们可以通过 docker search 命令搜索 httpd 来寻找适合我们的镜像。

1
docker search httpd

NAME: 镜像仓库源的名称

DESCRIPTION: 镜像的描述

OFFICIAL: 是否 docker 官方发布

stars: 类似 Github 里面的 star,表示点赞、喜欢的意思。

AUTOMATED: 自动构建。

拖取镜像

我们决定使用上图中的 httpd 官方版本的镜像,使用命令 docker pull 来下载镜像。

1
2
3
4
5
6
7
8
9
runoob@runoob:~$ docker pull httpd
Using default tag: latest
latest: Pulling from library/httpd
8b87079b7a06: Pulling fs layer
a3ed95caeb02: Download complete
0d62ec9c6a76: Download complete
a329d50397b9: Download complete
ea7c1f032b5c: Waiting
be44112b72c7: Waiting

下载完成后,我们就可以使用这个镜像了。

1
runoob@runoob:~$ docker run httpd

删除镜像

镜像删除使用 docker rmi 命令,比如我们删除 hello-world 镜像:

1
$ docker rmi hello-world

当我们从 docker 镜像仓库中下载的镜像不能满足我们的需求时,我们可以通过以下两种方式对镜像进行更改。

  • 1、从已经创建的容器中更新镜像,并且提交这个镜像
  • 2、使用 Dockerfile 指令来创建一个新的镜像

更新镜像

更新镜像之前,我们需要使用镜像来创建一个容器。

1
2
runoob@runoob:~$ docker run -t -i ubuntu:15.10 /bin/bash
root@e218edb10161:/#

在运行的容器内使用 apt-get update 命令进行更新。

在完成操作之后,输入 exit 命令来退出这个容器。

此时 ID 为 e218edb10161 的容器,是按我们的需求更改的容器。我们可以通过命令 docker commit 来提交容器副本。

1
2
docker commit -m="has update" -a="runoob" e218edb10161 runoob/ubuntu:v2
sha256:70bf1840fd7c0d2d8ef0a42a817eb29f854c1af8f7c59fc03ac7bdee9545aff8

各个参数说明:

  • -m: 提交的描述信息
  • -a: 指定镜像作者
  • e218edb10161:容器 ID
  • runoob/ubuntu:v2: 指定要创建的目标镜像名

我们可以使用 docker images 命令来查看我们的新镜像 runoob/ubuntu:v2

1
2
3
4
5
6
7
8
9
10
11
runoob@runoob:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
runoob/ubuntu v2 70bf1840fd7c 15 seconds ago 158.5 MB
ubuntu 14.04 90d5884b1ee0 5 days ago 188 MB
php 5.6 f40e9e0f10c8 9 days ago 444.8 MB
nginx latest 6f8d099c3adc 12 days ago 182.7 MB
mysql 5.6 f2e8d6c772c0 3 weeks ago 324.6 MB
httpd latest 02ef73cf1bc0 3 weeks ago 194.4 MB
ubuntu 15.10 4e3b13c8a266 4 weeks ago 136.3 MB
hello-world latest 690ed74de00f 6 months ago 960 B
training/webapp latest 6fae60ef3446 12 months ago 348.8 MB

使用我们的新镜像 runoob/ubuntu 来启动一个容器

1
2
runoob@runoob:~$ docker run -t -i runoob/ubuntu:v2 /bin/bash                            
root@1a9fbdeb5da3:/#

构建镜像

我们使用命令 docker build , 从零开始来创建一个新的镜像。为此,我们需要创建一个 Dockerfile 文件,其中包含一组指令来告诉 Docker 如何构建我们的镜像。

1
2
3
4
5
6
7
8
9
10
11
runoob@runoob:~$ cat Dockerfile 
FROM centos:6.7
MAINTAINER Fisher "fisher@sudops.com"

RUN /bin/echo 'root:123456' |chpasswd
RUN useradd runoob
RUN /bin/echo 'runoob:123456' |chpasswd
RUN /bin/echo -e "LANG=\"en_US.UTF-8\"" >/etc/default/local
EXPOSE 22
EXPOSE 80
CMD /usr/sbin/sshd -D

每一个指令都会在镜像上创建一个新的层,每一个指令的前缀都必须是大写的。

第一条FROM,指定使用哪个镜像源

RUN 指令告诉docker 在镜像内执行命令,安装了什么。。。

然后,我们使用 Dockerfile 文件,通过 docker build 命令来构建一个镜像。

1
2
3
4
5
6
7
8
9
10
11
12
runoob@runoob:~$ docker build -t runoob/centos:6.7 .
Sending build context to Docker daemon 17.92 kB
Step 1 : FROM centos:6.7
---&gt; d95b5ca17cc3
Step 2 : MAINTAINER Fisher "fisher@sudops.com"
---&gt; Using cache
---&gt; 0c92299c6f03
Step 3 : RUN /bin/echo 'root:123456' |chpasswd
---&gt; Using cache
---&gt; 0397ce2fbd0a
Step 4 : RUN useradd runoob
......

参数说明:

  • -t :指定要创建的目标镜像名
  • . :Dockerfile 文件所在目录,可以指定Dockerfile 的绝对路径

使用docker images 查看创建的镜像已经在列表中存在,镜像ID为860c279d2fec

我们可以使用新的镜像来创建容器

1
2
3
runoob@runoob:~$ docker run -t -i runoob/centos:6.7  /bin/bash
[root@41c28d18b5fb /]# id runoob
uid=500(runoob) gid=500(runoob) groups=500(runoob)

从上面看到新镜像已经包含我们创建的用户 runoob。

设置镜像标签

我们可以使用 docker tag 命令,为镜像添加一个新的标签。

1
runoob@runoob:~$ docker tag 860c279d2fec runoob/centos:dev

docker tag 镜像ID,这里是 860c279d2fec ,用户名称、镜像源名(repository name)和新的标签名(tag)。

使用 docker images 命令可以看到,ID为860c279d2fec的镜像多一个标签。

Docker 容器连接

让外部也可以访问这些应用,可以通过 -P-p 参数来指定端口映射。

网络端口映射

我们创建了一个 python 应用的容器。

1
2
runoob@runoob:~$ docker run -d -P training/webapp python app.py
fce072cc88cee71b1cdceb57c2821d054a4a59f67da6b416fceb5593f059fc6d

另外,我们可以指定容器绑定的网络地址,比如绑定 127.0.0.1。

我们使用 -P 绑定端口号,使用 docker ps 可以看到容器端口 5000 绑定主机端口 32768。

1
2
3
runoob@runoob:~$ docker ps
CONTAINER ID IMAGE COMMAND ... PORTS NAMES
fce072cc88ce training/webapp "python app.py" ... 0.0.0.0:32768->5000/tcp grave_hopper

我们也可以使用 -p 标识来指定容器端口绑定到主机端口。

两种方式的区别是:

  • -P :是容器内部端口随机映射到主机的端口。
  • -p : 是容器内部端口绑定到指定的主机端口。(上面的-P的就是随机分配的32768端口,而下面的则分配了主机的5000端口)
1
runoob@runoob:~$ docker run -d -p 5000:5000 training/webapp python app.py

另外,我们可以指定容器绑定的网络地址,比如绑定 127.0.0.1。

1
docker run -d -p 127.0.0.1:5001:5000 training/webapp python app.py

这样我们就可以通过访问 127.0.0.1:5001 来访问容器的 5000 端口。

上面的例子中,默认都是绑定 tcp 端口,如果要绑定 UDP 端口,可以在端口后面加上 /udp

1
docker run -d -p 127.0.0.1:5000:5000/udp training/webapp python app.py

docker port 命令可以让我们快捷地查看端口的绑定情况。

1
runoob@runoob:~$ docker port adoring_stonebraker 5000

Docker 容器互联

端口映射并不是唯一把 docker 连接到另一个容器的方法。

docker 有一个连接系统允许将多个容器连接在一起,共享连接信息。

docker 连接会创建一个父子关系,其中父容器可以看到子容器的信息。

容器命名

当我们创建一个容器的时候,docker 会自动对它进行命名。另外,我们也可以使用 –name 标识来命名容器,例如:

1
runoob@runoob:~$  docker run -d -P --name runoob training/webapp python app.py

新建网络

下面先创建一个新的 Docker 网络。

1
$ docker network create -d bridge test-net

参数说明:

-d:参数指定 Docker 网络类型,有 bridge、overlay。

其中 overlay 网络类型用于 Swarm mode,在本小节中你可以忽略它。

连接容器

运行一个容器并连接到新建的 test-net 网络:

1
$ docker run -itd --name test1 --network test-net ubuntu /bin/bash

打开新的终端,再运行一个容器并加入到 test-net 网络:

1
$ docker run -itd --name test2 --network test-net ubuntu /bin/bash

用ping命令证明test1和test2都加入了test-net中。

如果 test1、test2 容器内中无 ping 命令,则在容器内执行以下命令安装 ping(即学即用:可以在一个容器里安装好,提交容器到镜像,在以新的镜像重新运行以上俩个容器)。

1
2
apt-get update
apt install iputils-ping

docker-net3

这里在test1中可以ping通test2

这样,test1 容器和 test2 容器建立了互联关系。

如果你有多个容器之间需要互相连接,推荐使用 Docker Compose.

配置 DNS

我们可以在宿主机的 /etc/docker/daemon.json 文件中增加以下内容来设置全部容器的 DNS:

1
2
3
4
5
6
{
"dns" : [
"114.114.114.114",
"8.8.8.8"
]
}

设置后,启动容器的 DNS 会自动配置为 114.114.114.114 和 8.8.8.8。

配置完,需要重启 docker 才能生效。

查看容器的 DNS 是否生效可以使用以下命令,它会输出容器的 DNS 信息:

1
$ docker run -it --rm  ubuntu  cat /etc/resolv.conf

手动指定容器的配置

如果只想在指定的容器设置 DNS,则可以使用以下命令:

1
$ docker run -it --rm -h host_ubuntu  --dns=114.114.114.114 --dns-search=test.com ubuntu

这里的参数说明一下:

–rm:容器退出时自动清理容器内部的文件系统。

-h HOSTNAME 或者 –hostname=HOSTNAME: 设定容器的主机名,它会被写到容器内的 /etc/hostname 和 /etc/hosts。

–dns=IP_ADDRESS: 添加 DNS 服务器到容器的 /etc/resolv.conf 中,让容器用这个服务器来解析所有不在 /etc/hosts 中的主机名。

–dns-search=DOMAIN: 设定容器的搜索域,当设定搜索域为 .example.com 时,在搜索一个名为 host 的主机时,DNS 不仅搜索 host,还会搜索 host.example.com。

如果在容器启动时没有指定 –dns–dns-search,Docker 会默认用宿主主机上的 /etc/resolv.conf 来配置容器的 DNS。

Docker Repository仓库管理

Repo contains images.

Docker Hub is an official repository managed by Docker.

拉取镜像

你可以通过 docker search 命令来查找官方仓库中的镜像,并利用 docker pull 命令来将它下载到本地。

比如你要搜索ubuntu, 你就可以使用以下的命令

1
$ docker search ubuntu

image-20230618135435373

然后你就可以使用docker pull ubuntu命令

推送镜像

在用户登陆docker hub之后,你也可以推送自己的镜像到docker hub,使用docker push命令

以下命令中的 username 请替换为你的 Docker 账号用户名。

1
2
3
4
5
6
7
8
9
10
11
$ docker tag ubuntu:18.04 username/ubuntu:18.04
$ docker image ls

REPOSITORY TAG IMAGE ID CREATED ...
ubuntu 18.04 275d79972a86 6 days ago ...
username/ubuntu 18.04 275d79972a86 6 days ago ...
$ docker push username/ubuntu:18.04
$ docker search username/ubuntu

NAME DESCRIPTION STARS OFFICIAL AUTOMATED
username/ubuntu

Docker Dockerfile

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Essentially, Dockerfiles are sets of instructions that Docker uses to build a Docker image automatically.

使用dockerfile来定制镜像

以定制一个nginx镜像为例子,构建好的镜像内会有一个 /usr/share/nginx/html/index.html 文件

1.在一个空目录下,新建一个名为 Dockerfile 文件,并在文件内添加以下内容:

1
2
FROM nginx
RUN echo '这是一个本地构建的nginx镜像' > /usr/share/nginx/html/index.html

2.From和Run指令的作用

FROM:定制的镜像都是基于 FROM 的镜像,这里的 nginx 就是定制需要的基础镜像。后续的操作都是基于 nginx。

RUN:用于执行后面跟着的命令行命令。有以下俩种格式:

shell格式:

1
2
RUN <命令行命令>
# <命令行命令> 等同于,在终端操作的 shell 命令。

exec 格式:

1
2
3
RUN ["可执行文件", "参数1", "参数2"]
# 例如:
# RUN ["./test.php", "dev", "offline"] 等价于 RUN ./test.php dev offline

注意:Dockerfile 的指令每执行一次都会在 docker 上新建一层。所以过多无意义的层,会造成镜像膨胀过大。例如:

FROM centos
RUN yum -y install wget
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz"
RUN tar -xvf redis.tar.gz

以上执行会创建 3 层镜像。可简化为以下格式:

FROM centos
RUN yum -y install wget \
    && wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz" \
    && tar -xvf redis.tar.gz

如上,以 && 符号连接命令,这样执行后,只会创建 1 层镜像。

开始构建镜像

在 Dockerfile 文件的存放目录下,执行构建动作

示例,通过目录下的 Dockerfile 构建一个 nginx:v3 (镜像名称:镜像标签)。

1
$ docker build -t nginx:v3 .

The . specifies the build context or the path to the directory containing the Dockerfile.

dockerfile2

以上显示,说明已经构建成功。

上下文路径

. 是上下文路径,那么什么是上下文路径呢?

上下文路径,是指 docker 在构建镜像,有时候想要使用到本机的文件(比如复制),docker build 命令得知这个路径后,会将路径下的所有内容打包。

解析:由于 docker 的运行模式是 C/S。我们本机是 C,docker 引擎是 S。实际的构建过程是在 docker 引擎下完成的,所以这个时候无法用到我们本机的文件。这就需要把我们本机的指定目录下的文件一起打包提供给 docker 引擎使用。

如果未说明最后一个参数,那么默认上下文路径就是 Dockerfile 所在的位置。

注意:上下文路径下不要放无用的文件,因为会一起打包发送给 docker 引擎,如果文件过多会造成过程缓慢。

dockerfile的指令详解

简洁版

  • FROM

构建镜像基于哪个镜像

  • MAINTAINER

镜像维护者姓名或邮箱地址

  • RUN

构建镜像时运行的指令

  • CMD

运行容器时执行的shell环境

  • VOLUME

指定容器挂载点到宿主机自动生成的目录或其他容器

  • USER

为RUN、CMD、和 ENTRYPOINT 执行命令指定运行用户

  • WORKDIR

为 RUN、CMD、ENTRYPOINT、COPY 和 ADD 设置工作目录,就是切换目录

  • HEALTHCHECH

健康检查

  • ARG

构建时指定的一些参数

  • EXPOSE

声明容器的服务端口(仅仅是声明)

  • ENV

设置容器环境变量

  • ADD

拷贝文件或目录到容器中,如果是URL或压缩包便会自动下载或自动解压

  • COPY

拷贝文件或目录到容器中,跟ADD类似,但不具备自动下载或解压的功能

  • ENTRYPOINT

运行容器时执行的shell命令


详细版

COPY

复制指令,从上下文目录中复制文件或者目录到容器里指定路径。

格式:

1
2
COPY [--chown=<user>:<group>] <源路径1>...  <目标路径>
COPY [--chown=<user>:<group>] ["<源路径1>",... "<目标路径>"]

**[–chown=:]**:可选参数,用户改变复制到容器内文件的拥有者和属组。

**<源路径>**:源文件或者源目录,这里可以是通配符表达式,其通配符规则要满足 Go 的 filepath.Match 规则。例如:

1
2
COPY hom* /mydir/
COPY hom?.txt /mydir/

**<目标路径>**:容器内的指定路径,该路径不用事先建好,路径不存在的话,会自动创建。

ADD

ADD 指令和 COPY 的使用格类似(同样需求下,官方推荐使用 COPY)。功能也类似,不同之处如下:

  • ADD 的优点:在执行 <源文件> 为 tar 压缩文件的话,压缩格式为 gzip, bzip2 以及 xz 的情况下,会自动复制并解压到 <目标路径>。
  • ADD 的缺点:在不解压的前提下,无法复制 tar 压缩文件。会令镜像构建缓存失效,从而可能会令镜像构建变得比较缓慢。具体是否使用,可以根据是否需要自动解压来决定。

CMD

类似于 RUN 指令,用于运行程序,但二者运行的时间点不同:

  • CMD 在docker run 时运行。
  • RUN 是在 docker build。

作用:为启动的容器指定默认要运行的程序,程序运行结束,容器也就结束。CMD 指令指定的程序可被 docker run 命令行参数中指定要运行的程序所覆盖。

注意:如果 Dockerfile 中如果存在多个 CMD 指令,仅最后一个生效。

格式:

1
2
3
CMD <shell 命令> 
CMD ["<可执行文件或命令>","<param1>","<param2>",...]
CMD ["<param1>","<param2>",...] # 该写法是为 ENTRYPOINT 指令指定的程序提供默认参数

推荐使用第二种格式,执行过程比较明确。第一种格式实际上在运行的过程中也会自动转换成第二种格式运行,并且默认可执行文件是 sh。

ENTRYPOINT

类似于 CMD 指令,但其不会被 docker run 的命令行参数指定的指令所覆盖,而且这些命令行参数会被当作参数送给 ENTRYPOINT 指令指定的程序。

但是, 如果运行 docker run 时使用了 –entrypoint 选项,将覆盖 ENTRYPOINT 指令指定的程序。

优点:在执行 docker run 的时候可以指定 ENTRYPOINT 运行所需的参数。

注意:如果 Dockerfile 中如果存在多个 ENTRYPOINT 指令,仅最后一个生效。

格式:

1
ENTRYPOINT ["<executeable>","<param1>","<param2>",...]

可以搭配 CMD 命令使用:一般是变参才会使用 CMD ,这里的 CMD 等于是在给 ENTRYPOINT 传参,以下示例会提到。

示例:

假设已通过 Dockerfile 构建了 nginx:test 镜像:

1
2
3
4
FROM nginx

ENTRYPOINT ["nginx", "-c"] # 定参
CMD ["/etc/nginx/nginx.conf"] # 变参

1、不传参运行

1
$ docker run  nginx:test

容器内会默认运行以下命令,启动主进程。

1
nginx -c /etc/nginx/nginx.conf

2、传参运行

1
$ docker run  nginx:test -c /etc/nginx/new.conf

容器内会默认运行以下命令,启动主进程(/etc/nginx/new.conf:假设容器内已有此文件)

1
nginx -c /etc/nginx/new.conf

ENV

设置环境变量,定义了环境变量,那么在后续的指令中,就可以使用这个环境变量。

格式:

1
2
ENV <key> <value>
ENV <key1>=<value1> <key2>=<value2>...

以下示例设置 NODE_VERSION = 7.2.0 , 在后续的指令中可以通过 $NODE_VERSION 引用:

1
2
3
4
ENV NODE_VERSION 7.2.0

RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
&& curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc"

ARG

构建参数,与 ENV 作用一致。不过作用域不一样。ARG 设置的环境变量仅对 Dockerfile 内有效,也就是说只有 docker build 的过程中有效,构建好的镜像内不存在此环境变量。

构建命令 docker build 中可以用 –build-arg <参数名>=<值> 来覆盖。

格式:

1
ARG <参数名>[=<默认值>]

VOLUME

定义匿名数据卷。在启动容器时忘记挂载数据卷,会自动挂载到匿名卷。

作用:

  • 避免重要的数据,因容器重启而丢失,这是非常致命的。
  • 避免容器不断变大。

格式:

1
2
VOLUME ["<路径1>", "<路径2>"...]
VOLUME <路径>

在启动容器 docker run 的时候,我们可以通过 -v 参数修改挂载点。

EXPOSE

仅仅只是声明端口。

作用:

  • 帮助镜像使用者理解这个镜像服务的守护端口,以方便配置映射。
  • 在运行时使用随机端口映射时,也就是 docker run -P 时,会自动随机映射 EXPOSE 的端口。

格式:

1
EXPOSE <端口1> [<端口2>...]

WORKDIR

指定工作目录。用 WORKDIR 指定的工作目录,会在构建镜像的每一层中都存在。以后各层的当前目录就被改为指定的目录,如该目录不存在,WORKDIR 会帮你建立目录。

docker build 构建镜像过程中的,每一个 RUN 命令都是新建的一层。只有通过 WORKDIR 创建的目录才会一直存在。

格式:

1
WORKDIR <工作目录路径>

USER

用于指定执行后续命令的用户和用户组,这边只是切换后续命令执行的用户(用户和用户组必须提前已经存在)。

格式:

1
USER <用户名>[:<用户组>]

HEALTHCHECK

用于指定某个程序或者指令来监控 docker 容器服务的运行状态。

格式:

1
2
3
4
HEALTHCHECK [选项] CMD <命令>:设置检查容器健康状况的命令
HEALTHCHECK NONE:如果基础镜像有健康检查指令,使用这行可以屏蔽掉其健康检查指令

HEALTHCHECK [选项] CMD <命令> : 这边 CMD 后面跟随的命令使用,可以参考 CMD 的用法。

ONBUILD

用于延迟构建命令的执行。简单的说,就是 Dockerfile 里用 ONBUILD 指定的命令,在本次构建镜像的过程中不会执行(假设镜像为 test-build)。当有新的 Dockerfile 使用了之前构建的镜像 FROM test-build ,这时执行新镜像的 Dockerfile 构建时候,会执行 test-build 的 Dockerfile 里的 ONBUILD 指定的命令。

格式:

1
ONBUILD <其它指令>

LABEL

LABEL 指令用来给镜像添加一些元数据(metadata),以键值对的形式,语法格式如下:

1
LABEL <key>=<value> <key>=<value> <key>=<value> ...

比如我们可以添加镜像的作者:

1
LABEL org.opencontainers.image.authors="runoob"

Docker Compose

what is compose?

Compose 是用于定义和运行多容器 Docker 应用程序的工具。通过 Compose,您可以使用 YML 文件来配置应用程序需要的所有服务。然后,使用一个命令,就可以从 YML 文件配置中创建并启动所有服务。Compose 使用的三个步骤:

  • 使用 Dockerfile 定义应用程序的环境。
  • 使用 docker-compose.yml 定义构成应用程序的服务,这样它们可以在隔离环境中一起运行。
  • 最后,执行 docker-compose up 命令来启动并运行整个应用程序。

docker-compose.yml 的配置案例如下(配置参数参考下文):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# yaml 配置实例
version: '3'
services:
web:
build: .
ports:
- "5000:5000"
volumes:
- .:code
- logvolume01:/var/log
links:
- redis
redis:
image: redis
volumes:
logvolume01: {}

webscience

Revision

Topic 1- Social Media (Twitter) Crawl

L1 TwitterData

Exploitation of the twitter data:

With the amount of content posted to social media websites every day, such as Twitter, there is huge potential for its exploitation in many scenarios, such as:

Sports and Finance: Stock market prediction & Sports betting

predict stock market changes based on the sentiment of tweets

  • Sentiment expressed in tweets had been used to predict stock market reactions
  • Opinions of investors’ posts on social media
  • combine with the stock price movements

Sports betting

Web Science Concepts

  • Explore the science underlying the web

    • From a socio-technical perspective
    • Mathematical properties, engineering principles, social impacts
  • Understanding users and developing Web applications for them!

    • Sociology & Web Engineering
  • Consulting corporations about social media activities

    • Economics & Web analvtics
    • For example, role of micro influencers on local economy
    • How much hate a brand page generates due to some comments, ….
  • Data analytics

    • Growth of information; structured and unstructured
    • Intersection of networks & data
  • Bring new technologies to scientists and engineers working together on a large scale

L2 DataClustering

Content processing

1. Removing stuff

  • Non-ascii removal

remove the emoji…

image-20230429103013009

2. Grouping tweets

  • Based on content analysis like
    • Clustering, locality sensitive hashing
    • Or through content indexes
  • Once we know the groups
    • We could analyse the words, user mentions, hashtags in these groups
    • We can add these terms to a list with a priority
    • This is possibly for identifying more tweets of this type
      • Aim is data gathering
  • We can also look at proficient tweeters
    • What are their total tweets

3. Tokenization

Separate each token

Remove stopwords

4. Vector Representation

Documents are represented by a term vector

Di = (ti1, ti2, ..., tin)

Queries are represented by a similar vector

• In a binary scheme, tik is set to 1 when term k is present in document i, otherwise it is set to zero

  • The most relevant documents for a query are expected to be those represented by the vectors closest to the query, that is documents that use similar words to the query.
  • Closeness is often calculated by just looking at angles between document vector and query vector
  • We need a similarity measure!
    • Cosine similarity measure
    • Jaccard coefficient
    • Dice coefficient
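
A toy sketch of these similarity measures applied to binary term vectors of two tweets (the example vectors are invented):

```python
# Cosine, Jaccard and Dice similarity over 0/1 term vectors.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    return float((a & b).sum() / (a | b).sum())

def dice(a: np.ndarray, b: np.ndarray) -> float:
    return float(2 * (a & b).sum() / (a.sum() + b.sum()))

# d1 = np.array([1, 1, 0, 1]); q = np.array([1, 0, 0, 1])
# cosine(d1, q) ≈ 0.816, jaccard(d1, q) ≈ 0.667, dice(d1, q) = 0.8
```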

Finding similar tweets Single-pass clustering

Single-pass clustering

  • requires a single, sequential pass over the set of documents it attempts to cluster.

  • The algorithm classifies the next document in the sequence according to a condition on the similarity function employed.

  • At every stage, the algorithm decides on whether a newly seen document should become a member of an already defined cluster or the centre of a new one.

    • In its most simple form, the similarity function gets defined on the basis of just some similarity (or alternatively, dissimilarity) measure between document-feature vectors.
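
A minimal sketch of single-pass clustering over tweet vectors, using cosine similarity; the mean-vector centroid update and the threshold value are assumptions for illustration:

```python
# Single sequential pass: each new vector joins the most similar existing
# cluster if similarity >= threshold, otherwise it starts a new cluster.
import numpy as np

def single_pass_cluster(vectors, threshold=0.5):
    clusters = []   # each cluster: {"centroid": np.ndarray, "members": [indices]}
    for i, v in enumerate(vectors):
        v = np.asarray(v, dtype=float)
        best, best_sim = None, -1.0
        for c in clusters:
            sim = v @ c["centroid"] / (np.linalg.norm(v) * np.linalg.norm(c["centroid"]) + 1e-12)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            members = np.array([vectors[j] for j in best["members"]], dtype=float)
            best["centroid"] = members.mean(axis=0)   # update the cluster centre
        else:
            clusters.append({"centroid": v, "members": [i]})
    return clusters
```

Because the result depends on the similarity threshold and on the processing order, noisy or overly large clusters can be re-clustered afterwards, as noted below.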

image-20230429121645868

Comments on Single Pass Method

  • The single pass method is particularly simple
    • since it requires that the data set be processed only once.
  • Obviously, the results for this method are highly dependent on the similarity threshold that is used.
  • It tends to produce large clusters early in the clustering pass,
    • and the clusters formed are not independent of the order in which the data set is processed.
    • You should use your judgment in setting this threshold so that you are left with a reasonable number of clusters.
  • It is sometimes used to form the groups that are used to initiate reallocation clustering.
    • If we get a large noisy clusters of tweets, we could re-cluster them!!!

L3 Credibility & Newsworthiness

Newsworthy

Key characteristics of newsworthy score

  • Real-time
    • Tweets should be scored as soon as it arrives!
  • Generalizability
    • Should be able to handle any types of events - not those just seen before
  • Adaptive
    • New information to be incorporated, as and when they arrive
    • Incorporate new information to the scoring model

These characteristics should be realized with the help of classification approach and distant supervision.

Heuristic Labelling

  • Semi-automatic labelling approach
    • Using a set of heuristics to label
      • High quality (newsworthy) and low quality (noisy) content
  • This will not label majority of the content
  • Advantages
    • Minimal effort in creating a data set
    • Real-life data set - incremental and generalizable
    • Easily built as part of an algorithm for example event detection

Overall approach

  • Collect a set of high-quality sets and low-quality sets of data

  • Use this dataset to potentially score a newsworthy tweet

Quality Score

Quality Score = (profileWeight + verifiedWeight + followersWeight + accountAgeWeight + descriptionWeight)/5

Range is [0 to 1]

if q score is higher than 0.65 -> high quality

if q score is lower than 0.45 -> low quality
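
A toy sketch of this scoring and thresholding; only the averaging and the 0.65/0.45 cut-offs come from the notes above, while the individual weight functions are assumed to be computed elsewhere:

```python
# Average five account-quality weights (each in [0, 1]) and label the tweet.
def quality_score(profile_w, verified_w, followers_w, account_age_w, description_w):
    return (profile_w + verified_w + followers_w + account_age_w + description_w) / 5

def label(score: float) -> str:
    if score > 0.65:
        return "high quality"
    if score < 0.45:
        return "low quality"
    return "unlabelled"   # the heuristic leaves the middle band unlabelled
```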

Scoring model

Likelihood ratio for each term

  • R(t) = relative importance of term in the particular quality model when compared to random background model
  • >1
    • Term is more common in the model than random
  • <1
    • Term is less common in the model than random

image-20230429134512747

cntd

if RHQ(t)<2 or RLQ(t)<2 then the Score will be set 0 as to remove the terms which have no clear association with either high quality content or low-quality content.

image-20230429134931099

L4-Geo-localisation

Fine-grained localization

Fine-grained localization refers to the task of accurately localizing objects or entities within an image or a video with high precision, usually at a sub-pixel or sub-object level. This involves identifying the precise location of the object, as well as any associated attributes such as shape, texture, and color.

The goal of fine-grained localization is to provide more detailed and accurate information about the location and properties of objects in an image, which can be useful for a range of applications such as object tracking, object recognition, and scene understanding.

Fine-grained localization can be challenging due to the variability in object appearance, pose, and scale. To overcome these challenges, various techniques such as deep learning and computer vision algorithms have been developed.

Examples of fine-grained localization tasks include localizing individual bird species in a bird-watching image, identifying specific car models in a crowded parking lot, or detecting the presence of a particular species of fish in an underwater video.

Problem statement

  • Geo-localization
    • Provide location estimates for individual tweets
  • coarse-grained Geo-localization
    • Provide location estimates for individual tweets at regional or country level
  • fine-grained Geo-localization
    • Provide location estimates for individual tweets at city or neighbourhood level
  • Approach
    • Train a model on a geo-tagged data set
      • Validate and test on geo-tagged data
      • Test on non-geo tagged data as well

Topic3 Topic modelling

Topic modelling

Discuss why searching is limited when exploring a collection

When exploring a collection, searching can be limited by various factors such as the completeness and accuracy of the metadata, the complexity of the query, and the quality of the search algorithm.

Firstly, the completeness and accuracy of the metadata associated with each item in the collection can limit the effectiveness of searching. If the metadata is incomplete or inconsistent, important information about an item may not be captured, making it difficult or impossible to find through search queries.

Secondly, the complexity of the search query can also limit the effectiveness of searching. For example, if a user is looking for items that have multiple attributes or characteristics, such as a specific color and shape, the search query may become too complex and difficult to execute accurately.

Finally, the quality of the search algorithm used to explore the collection can also limit the effectiveness of searching. If the algorithm is not designed to handle the specific characteristics of the collection or the query, it may return irrelevant or incomplete results.

To overcome these limitations, various techniques can be employed such as using natural language processing to simplify complex search queries, improving the quality and completeness of metadata through manual curation or machine learning techniques, and using advanced search algorithms that take into account the specific characteristics of the collection and query.

Topic modelling is the process of using a topic model to discover the hidden topics that are represented by a large collection of documents

  • Observed

    • collection

    • Document & words

  • Aim

    • Use the observed information to infer
      • Hidden structure
  • Topic structure - hidden

    • per document topic distributions
    • Per document per-word topic assignments
      • Annotation…
      • Can be used for retrieval, classification, browsing?
  • Utility

    • Inferred hidden structure resembles the thematic structure of the collection

latent

(of a quality or state) existing but not yet developed or manifest; hidden or concealed.

Topic modelling

  • A machine learning approach for Mining latent topics

Identify hidden, gigantic structures

Probabilistic topic models

  • a suite of algorithms that aim to discover and annotate large archives of documents of thematic information

Our goal in topic modelling

  • The goal of topic modeling is to automatically discover the topics in a collection of documents
  • Documents are observed
    • Topics, per-document and per-word topic assignments - hidden
    • Hence latent!
  • The central computation problem for topic modelling is to use the observed documents to infer hidden topic structure
  • Think it as reversing the generative process
    • What is the hidden structure that likely generated the observed collection?

LDA (Latent Dirichlet Allocation)

  • Is a statistical model of document collections that tries to capture the intuition
    Each document can be described by a distribution of topics and each topic can be described by a distribution of words

  • Topic

    • Defined as a distribution over the words/ fixed vocabulary

      • E.g., genetic topic has words about genetics (sequenced, genes) with high probability

      • Evolutionary biology has words like life, organism with high probability

Topic Modelling Approaches

  • Number of possible topic structures is exponentially large
  • Approximate the posterior distribution
  • Topic modelling algorithms form an approximation of equation,
    • by adapting an alternative distribution over latent topic structure to be close to the true posterior

Two approaches:

1. Sampling based!

Attempt to collect samples from the posterior to approximate it with an empirical distribution – Gibbs sampling!

2. Variational methods!

Deterministic alternative to sampling based methods

Posit a parametrised family of distributions over the hidden structure and then find the member of that family that is closest to the posterior

Summary

The user specifies that there are K distinct topics

Each of the K topics is drawn from a Dirichlet distribution

Uniform base distribution (u) and concentration parameter B

theta_k ~ Dir(Bu)

Distributions over topics of each document: theta_d ~ Dir(au)

  • Topic assignment Z_d,n ~ Discrete(theta_d)
  • W_d,n ~ theta_Z(d,n)

Sampling based!

Attempt to collect samples from the posterior to approximate it with an empirical distribution - Gibbs sampling!

Variational methods!

Deterministic alternative to sampling based methods

Posit a parametrised family of distributions over the hidden structure and then find the member of that family that is closest to the posterior

Gibbs Sampling

It generates samples from complex, high-dimensional probability distributions.

The algorithm for Gibbs sampling is as follows:

  1. Initialize each variable with an initial value.
  2. Choose a variable to update, say the i-th variable.
  3. Sample a new value for the i-th variable from its conditional distribution, given the current values of the other variables.
  4. Repeat steps 2 and 3 for a specified number of iterations or until convergence is achieved.
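
A toy Gibbs sampler for a bivariate normal with correlation rho, illustrating the generic steps above (a standard textbook example, not the LDA sampler itself):

```python
# Alternately sample each variable from its conditional given the other.
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0                       # step 1: initial values
    samples = np.empty((n_iter, 2))
    sd = np.sqrt(1 - rho**2)
    for t in range(n_iter):
        x = rng.normal(loc=rho * y, scale=sd)   # steps 2-3 for x | y
        y = rng.normal(loc=rho * x, scale=sd)   # steps 2-3 for y | x
        samples[t] = (x, y)
    return samples   # after burn-in these approximate the joint distribution
```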

LDA

  • LDA is a probabilistic generative model

    • Each document is a distribution of topics
    • Each topic is a distribution of words
  • Sample a topic from a document-level topic distribution

    • That obeys Dirichlet distribution
  • Then sample a words according to the topic distribution of this topic

    • Dirichlet distribution
  • Generate a document

  • Hence, LDA implicitly model document level word co-occurrence pattern

  • Sparsity problem exacerbates performance issues

    • The limited contexts make it more difficult for topic models to identify the senses of ambiguous words in short documents.
  • Short documents like Tweet

    • Sparse words …
    • Time-sensitive
    • Lack of clear context because not much information; no formal structure
      • How words are related often measured through co-occurrence patterns
      • In short texts document-level co-occurrence patterns are difficult to capture
  • Massive volume of tweet;

    • Memory requirements
  • Real-time nature

Problem with LDA to train twitter

Why not LDA?

  • LDA needs to be trained on the entire data sets
    • Memory requirements for the model
  • LDA is trained and tested on a data set
    • Time-sensitive nature of Twitter

How to address this issue …?

  • Enrich the word co -occurrence information
    • To enrich the limited word co -occurrence information contained in a single short text,
  • Make larger texts by grouping short texts (tweets)
    • Grouping tweets by the authors
    • However, this aggregation method highly depends on the meta - information of each text,
      • which may not always be available for many kinds of short texts.
    • Another strategy models the similarity between texts to aggregate similar texts into a long pseudo -document
  • Explicit text similarity

Topic4 Network Analysis

L6 Graph-based Network Analysis

Graph Modelling

  • Capturing structural properties of social networks
  • Relationships formed between individuals
    • To identify clusters
    • Cliques and connected components of users
    • Centrality measures
  • Hashtags
    • Which hashtags are strongly connected

Centrality Measures: Find the influential users, find the centre of a graph.

Why graph?

  • By analyzing network data, we can ask many questions

    • Who is most important in a network?
    • Which way information flows?
  • We can use graph analysis to answer questions like these

  • Note

    • Sample questions!
    • What are people talking about?
      How are they responding to a product?
    • The breadth of such analyses is huge and not covered fully

Graph Theory

Graph

  • Graphs are way to formally represent a network or a set of interconnected objects

  • Nodes and edges

  • Unlike trees, there is no concept of a root node: in a graph there is no unique node designated as the root.

  • One node might be connected to five others!

  • No concept of one-directional flow!

Edges

  • With direction or flow!
  • Without direction!

Direction

  • Origin to destination

Trees vs graphs

A tree is a set of nodes and edges. In a tree, there is a unique node which is known as root.

Terminology - Undirected graphs

  • u and v are adjacent if {u, v} is an edge,
    • e is called incident with u and v
    • u and v are called endpoints of {u, v}
  • Degree of Vertex (deg (V)):
    • the number of edges incident on a vertex.
    • A loop contributes twice to the degree

Terminology - Directed graphs

  • For the edge (u, v), u is adjacent to v OR v is adjacent from u,
  • u - Initial vertex (origin)
  • v - Terminal vertex (destination)
  • In-degree (deg-(u)): number of edges for which u is the terminal vertex
  • Out-degree (deg+(u)): number of edges for which u is the initial vertex

Incidence Matrix:

What are the maximum potential edges?

Undirected graph: (n*(n-1))/2

Directed graph (n*(n-1))

Edge density = no of edges/ max no of edges possible

Adjacency Matrix

  • There is an N x N matrix, where |V| = N,

  • the Adjacency Matrix (N×N)

  • This makes it easier to find subgraphs

  • When there are relatively few edges in the graph the adjacency matrix is a sparse matrix
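
A small networkx sketch of these definitions: build a directed graph, take its adjacency matrix, and compute the edge density |E| / (n·(n−1)) (the example edges are invented):

```python
# Adjacency matrix and edge density for a tiny directed graph.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"),
                  ("carol", "alice"), ("alice", "carol")])

A = nx.to_numpy_array(G)            # N x N adjacency matrix (sparse in practice)
n, m = G.number_of_nodes(), G.number_of_edges()
density = m / (n * (n - 1))         # directed maximum is n*(n-1)
print(A, density)                   # nx.density(G) gives the same value
```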

Graph analysis in twitter

Twitter is directed graph while facebook is undirected

Why graph analysis?

  • By analyzing tweet data, we can ask many questions
  • Who is most important in a network?
  • How did the information flow?
  • How could we reach 50% of the graph?
  • Who is more influential?
  • What are people talking about?
  • How are they responding to a product?

Centrality

find who is important

  • Measures of importance in social networks are called centrality measures

Degree centrality

  • Who gets the most re-tweets?

    • Basically says who is most important in the network
  • In-degree: number of retweets of a user

  • Out-degree: number of retweets this particular user made

  • The degree centrality is a fundamental metric in network analysis.

  • It is used to directly quantify the number of nodes in the network that a given node is adjacent to.

  • for directed networks

    • There are variations of this metric that are used, where connections between nodes have a directionality.
  • In directed networks, it makes sense to talk about the following:

  • In-degree - For a given node, how many edges are incoming to the node.

  • Out-degree - For a given node, how many edges are outgoing from the node.

  • CD(v) = deg(v)

Centrality measures!

Designed to characterise

  • Functional role - what part does this node play in system dynamics?
  • Structural importance - how important is this node to the structural characteristics of the system?

In each of the following networks, X has higher centrality than Y according to a particular measure

image-20230430204648900

  • “Who is the most important or central person in this network?”
    • There are many answers to this question, depending on what we mean by importance.
  • The power a person holds in the organization is inversely proportional to the number of keys on his keyring.
    • A janitor has keys to every office, and no power.
    • The CEO does not need a key: people always open the door for him.
  • Degree centrality of a vertex

image-20230430205230578

image-20230430210025438

Eigenvector centrality

  • Who is the most influential

  • In contrast to degree centrality

    • How important are these retweeters?
  • is a measure of the influence of a node

    • It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
  • Google’s page rank is variant of the eigenvector centrality.

  • Using Adjacency network

    • Ax = lambda x
    • there is a unique largest eigenvalue, which is real and positive,
    • This greatest eigenvalue results in the desired centrality measure.

Betweenness Centrality/Closeness Centrality

Betweenness centrality measures the number of shortest paths on which the user appears in the sequence of nodes along the path.

  • It was introduced as a measure for quantifying the control of a human on the communication between other humans in social network.
  • In this conception, vertices that have a high probability to occur on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

image-20230430213515337

Closeness Centrality: Definition

  • Closeness is based on the length of the average shortest path between a vertex and all other vertices in the graph
  • Closeness centrality is the inverse of that average shortest-path length (see the sketch below)
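A minimal NetworkX sketch of betweenness and closeness centrality; the small undirected interaction graph is an assumed example.

```python
import networkx as nx

# Illustrative undirected interaction graph (assumed edges).
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e")])

# Fraction of all shortest paths that pass through each node.
bc = nx.betweenness_centrality(G)

# Inverse of the average shortest-path distance to all other nodes.
cc = nx.closeness_centrality(G)

print("highest betweenness:", max(bc, key=bc.get))
print("highest closeness:", max(cc, key=cc.get))
```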

L7 Retweet Graph, Trends & Influencers

Information diffusion is about tracing, understanding and predicting how a piece of information spreads.

Information diffusion

  • In online communities, tracking information diffusion is useful for many applications,

    • such as early warning systems,
    • social bot and community detection,
    • user location prediction,
    • financial recommendations,
    • marketing campaign effectiveness,
    • political mobilization and protests
  • Twitter offers four possible actions to express interest in specific content:

    • favorite, reply, quote and retweet.
    • Replying or liking a tweet does not involve the spread of the content,
    • whereas quotes and retweets are actions used to
      • share information with a wider audience.
  • A retweet is often considered an endorsement, i.e., the user supports the original tweet’s content,

  • whereas quoting may be done in order to express a different idea

Hashtags & mentions

  • Hashtag - adding a “#” to the beginning of an unbroken word or phrase creates a hashtag.
    • When you use a hashtag in a Tweet, it becomes linked to all of the other
      Tweets that include it.
    • Including a hashtag gives your Tweet context and allows people to easily follow topics that they’re interested in.

@Mentions are used when talking to or about someone (the user account of a person, brand, group, etc.)

  • In marketing
  • Using hashtags helps a brand connect with what’s happening on Twitter.
  • When brands connect with what’s happening on Twitter, they see lifts across the marketing funnel, such as
    • +18% message association, +8% brand awareness, and +3% purchase intent

Twitter REST API

  • The user Tweet timeline endpoint is a REST endpoint that receives a single path parameter to indicate the desired user (by user ID).

    • The endpoint can return the 3,200 most recent
      • Tweets, Retweets, replies, and Quote Tweets posted by the user.
  • User mention timeline

    • The user mention timeline endpoint allows you to request Tweets
    • mentioning a specific Twitter user, for example,
      • if a Twitter account mentioned @TwitterDev within a Tweet
  • it is possible to collect a huge amount of information regarding tweets, accounts, users’ timelines and social networks (i.e., following and followers).

Interaction among users

  • In order to understand the connections among users, it is important to consider not only their social networks
    • but also, the way they interact, especially through retweets
  • the Twitter API does not provide complete information about retweets and their propagation paths.
    • More precisely, the only information carried by a retweet is the original user
  • To estimate retweet cascade graphs,
    • many strategies combine social network information
    • (i.e., friends and followers) with temporal information.

Retweet Graph

  • a graph of users, where an edge

    • means that one of the users has retweeted a message of a different user.
  • retweet graph G = (V, E),

    • which is a graph of users that have participated in the discussion on a specific topic.
    • Depending on the convention, a directed edge e = (u, v) indicates either that user v has retweeted a tweet of u (edges point in the direction of information flow),
    • or that user u has retweeted a tweet of v; both conventions appear in the literature (a construction sketch follows below).
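A minimal sketch of building a retweet graph with pandas and NetworkX from a table of retweet records; the column names and the "retweeter → original author" edge direction are assumptions for illustration.

```python
import pandas as pd
import networkx as nx

# Hypothetical retweet records: who retweeted whom.
retweets = pd.DataFrame({
    "retweeter":       ["alice", "carol", "dave", "bob"],
    "original_author": ["bob",   "bob",   "bob",  "erin"],
})

# Directed retweet graph; here e = (u, v) means "u retweeted a tweet of v".
G = nx.from_pandas_edgelist(retweets, source="retweeter",
                            target="original_author", create_using=nx.DiGraph)

# With this convention, in-degree = retweets received, out-degree = retweets made.
print(dict(G.in_degree()))
print(dict(G.out_degree()))
```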

image-20230430225752042

Count the number of links to a node in the network

  • the number of directed edges with the node as source (or destination)
  • In-degree - number of retweets of a user
  • Out-degree - number of retweets this particular node (user) made

Observe the retweet graph at the time instances t = 0, 1, 2, ...,

  • where either a new node or a new edge was added to the graph,
  • G_t = (V_t, E_t) is the retweet graph at time t

Issues in building interaction graph

  • prior studies exploited the fact that users tend to interact more often with newer tweets,
    • and a user is more likely to retweet the last friend who retweeted content.
  • However, this approach is no longer a reliable way of estimating retweet graphs,
  • since Twitter does not show content in
    • simple reverse chronological order,
    • but according to user interests, trending topics and interactions
  • Another issue is the cost of fetching all the required social network information:
  • due to the Twitter API rate limits, the time required to collect the list of friends and followers is,
    • on average, six times greater than downloading the user’s timeline.

Some findings

  • analyse the spread mechanics of content through hashtag use
  • and derive probabilities that users adopt a hashtag.
  • Hashtags tend to travel to more distant parts of the network, while
  • URLs travel shorter distances.

Random graph model

  • Super-star random graph node for a giant component of a retweet graph.
  • users with many retweets have a higher chance to be retweeted,
  • however, there is also a super-star node that receives a new retweet at each step with a positive probability.
  • Trending topics
    • Ongoing topics that become suddenly extremely popular
  • detecting different types of trends, for instance
    • detecting emergencies,
    • earthquakes,
    • diseases or important events in sports.
  • An important part of trending behaviour in social media is
  • the way these trends progress through the network.

Two options

  • Content of the tweets discussing a topic

    • How do we find this?
  • Underlying networks describing the social ties between users of
    Twitter

    • a graph of users, where an edge means that one of the users has retweeted a message of a different user.
  • In both cases, we could ask

    • How big or small is the interaction network compared to the followers’ network?

    • What kind of information goes through the network?

      • 85% of content is/was News!

Largest connected component LCC

  • The LCC refers to a maximal set of nodes
  • such that you can move from any node to any other node
  • in this set by only moving between adjacent nodes of the graph.

All components of a graph can be found by looping through its vertices,

starting a new breadth-first or depth-first search whenever the loop reaches a vertex that has not already been included in a previously found component.
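A minimal NetworkX sketch for extracting the largest connected component of an interaction graph and computing its density; the toy graph is an assumption.

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("d", "e")])     # two components

# Components are found by repeated BFS/DFS; NetworkX wraps this for us.
components = list(nx.connected_components(G))          # [{'a','b','c'}, {'d','e'}]

# Largest connected component as a subgraph.
lcc_nodes = max(components, key=len)
lcc = G.subgraph(lcc_nodes)

print(len(lcc), "nodes in the LCC")
print("density of the LCC:", nx.density(lcc))
# For a directed graph, use nx.weakly_connected_components(G) instead.
```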

Graph density

  • represents the ratio between the edges present in a graph and the maximum number of edges that the graph can contain.
  • Conceptually, it provides an idea of how dense a graph is in terms of edge connectivity.
  • In this work, density is measured as |E|/|V|.

The size of the largest connected component (LCC) and its density are the most informative characteristics for predicting the peak of a trend on Twitter.

Information diffusion & influencers

Issue with retweet graph

  • users might be exposed and influenced by a piece of information by multiple users, hence forming multiple influence paths

  • When a message arrives that is a retweet, every friend that has (re)tweeted at an earlier point in time has to be considered as a potential influencer

  • there is no agreement on the minimum number of followers needed to be regarded as an “influencer”

  • In fact, in marketing, they talk about

    • Micro-influencers

Influence paths express the relationship of “who was influenced by whom”.

The set of influence paths forms a social graph that shares a common root (a single user who first seeded a tweet). An influence path is referred to as an “information cascade”. A cascade is formed when users forward the same original message from a user that we call the root user.

Information cascade model

  • how information is being propagated from user to user from the stream of messages and the social graph.

  • Nodes of the cascade represent users (user nodes) of the social network that got “influenced” by the root or another user.

  • Edges of the cascade represent edges of the social graph over which influence actually spread.

An “influencer” in the case of Twitter is the so called “friend” that exposes information to his/her followers and

exerts influence on them in such a way that they forward this piece of information.

However, real influence data is missing,

so we can only derive these influence paths from the social connections among users.

Absolute Interaction strength

image-20230501103443357

Retweet distribution

  • The retweet distribution given the time delay between the retweet action date and the original tweet posting time
    • for 16,304 cascades
  • Temporal dynamics of the retweets after the respective roottime
  • Showing a decreasing trend,
    • as the highest number of interactions occurred
    • soon after roottime (the original tweet creation date).

Weighted information strength

image-20230501103700956

image-20230501103748922

Approach - Generating Retweet Cascade Graphs

image-20230501103852547

No interaction

image-20230501103945037

In addition

  • When there are no available interactions by a user u, and thus
  • no IS values between u and any other user,
  • an alternative is to find a link from u to another user in the cascade:
  • collect the user’s friend list by using the Twitter API, and
  • every friend of the user that has retweeted at an earlier point in time is considered a potential influencer.
  • To identify the influencer that most likely spread the tweet to user u,
  • consider the most recent influencer, i.e., u is linked to the last friend that retweeted the message (a sketch of this linking rule follows below).
  • Users that still remain without an edge after this second step are denoted as sparse nodes (SN).
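A sketch of the "link to the most recent influencer" rule under stated assumptions: retweet timestamps and friend lists are given as plain Python dicts, and the user names and helper function are hypothetical.

```python
# Hypothetical inputs: when each user retweeted, and who each user follows (friends).
retweet_time = {"root": 0, "u1": 5, "u2": 9, "u3": 12, "u4": 15}
friends = {"u1": ["root"], "u2": ["root"], "u3": ["u1", "u2", "x"], "u4": ["y"]}

def most_recent_influencer(user, retweet_time, friends):
    """Link `user` to the friend who retweeted most recently before them."""
    t = retweet_time[user]
    candidates = [f for f in friends.get(user, [])
                  if f in retweet_time and retweet_time[f] < t]
    return max(candidates, key=lambda f: retweet_time[f]) if candidates else None

edges, sparse_nodes = [], []
for user in retweet_time:
    if user == "root":
        continue
    influencer = most_recent_influencer(user, retweet_time, friends)
    if influencer is None:
        sparse_nodes.append(user)           # no edge could be inferred (SN)
    else:
        edges.append((influencer, user))    # influence flows influencer -> user

print(edges)          # [('root', 'u1'), ('root', 'u2'), ('u2', 'u3')]
print(sparse_nodes)   # ['u4']
```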

How do you find influencer nodes and communities?

  • How could we find important nodes?
    • Influencers?
  • How can we find the information paths?
    • What measures might you use? Centrality, degrees.
  • What alternative mechanisms could be used to weight graphs?

L8 Network Analysis - Case studies in Health Communities

Case studies

How online communities of people with long-term conditions function & evolve: network analysis of the structure and dynamics of the Asthma UK and British Lung Foundation online communities.

Problems

  • We have seen

    • People express themselves through social media

    • Huge amount of data

  • People suffering from mental health issues

    • often suffer in silence!
  • Can we create a self-management or self-diagnosis tool

    • A tool helping them to control their situation

    • A tool nudging them to get help!!

  • To start with

    • How prevalent are mental health issues in society?
  • Can we mine the network structure of social media to understand how communities support mental health issues?

    • The network structure of social media data provides insights into the support given by society on mental health issues

Social Support

Social Support is an exchange of resources between two individuals

  • perceived by the provider or the recipient to be intended

  • to enhance the well-being of the recipient

    • e.g.,

      • Facebook interaction

      • RTs …

How to extract signatures of perceived social support?

Post-reply Network

Example -> StackOverflow

Unlike Twitter, it is formed by users who share the same interests.

A community expertise network.

Graph modelling - what is an interaction graph?

User interaction graph

Tie

  • A tie connects a pair of users/actors by one or more relations

    • Sharing information, financial or psychological support

    • One relation or multiple sets of relations

    • Ties vary in content, direction & strength

  • Look at the actual tie between users instead of message-level interactions

Structural prestige in online communities

  • A thread

    • How many people a user replied to (out-degree!!)
    • How many people replied to the user (in-degree!!)
  • In directed networks

    • People who send many responses/replies

    • are considered to be prestigious,

    • or persons with knowledge

Size of a node

  • the size of the node depends on the number of replies sent by the user
  • The more replies, the larger the node

How to create an interaction graph?

  • Take the users who post and the corresponding reply users

  • A pandas DataFrame contains two columns of node names

    • posts_author, comments_author.
  • Use nx.from_pandas_edgelist() to convert the DataFrame into a network graph.

    • NetworkX (nx) is a Python software package,

    • used to create and manipulate complex networks,

    • and to study the structure, dynamics and functions of complex networks.

  • To distinguish the importance of each user who posts,

    • set the size of a poster’s node to twice its node degree,

    • colour posters’ nodes orange, repliers’ nodes light grey, and the directed edges light blue (see the sketch below).
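A minimal sketch of this construction; the DataFrame contents are illustrative assumptions, the size scaling factor is only for visibility, and matplotlib is needed for the drawing step.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical post/reply records.
df = pd.DataFrame({
    "posts_author":    ["anna", "anna", "ben", "cara"],
    "comments_author": ["ben",  "cara", "cara", "anna"],
})

# Directed edge: post author -> reply author (assumed convention).
G = nx.from_pandas_edgelist(df, source="posts_author",
                            target="comments_author", create_using=nx.DiGraph)

posters = set(df["posts_author"])
node_sizes = [2 * G.degree(n) * 100 for n in G.nodes]     # size ~ twice the degree (scaled for plotting)
node_colors = ["orange" if n in posters else "lightgray" for n in G.nodes]

nx.draw_networkx(G, node_size=node_sizes, node_color=node_colors,
                 edge_color="lightblue", with_labels=True)
plt.show()
```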

Temporal activity patterns

  • Let us study how the community thrives
    • How does it function and evolve over time?
  • Basically we are answering research questions
    • like “what is the basic structure of such online communities and how do they function and evolve over time?”

Degree distributions

  • Look at the distribution of degrees,

  • Or the number of edges a node has

  • Across all nodes in a graph

  • Top 1% of nodes

    • in terms of degrees

    • are the most interactive and have established edges with a lot of other users by exchanging messages

    • Known as super users

Activity analysis

  • How are users engaging, and does the community thrive?

    • Does posting activity follow a time pattern?
  • How much activity is happening on a daily or weekly basis in a particular community

    • Number of messages exchanged in a community across the whole life cycle of the data

    • how users engage with a community

  • Cumulative frequencies of activity

    • Number of posts and replies per week

To understand the behaviour of the community

  • the trend tends to be linear,

    • which indicates that the number of new replies per week remains stable.
  • SuicideWatch averages 211,605 posts per week, while PTSD averages 1,980 posts

  • This shows that the average weekly posting volume of SuicideWatch community users is roughly 100 times that of PTSD.

    • There are 227,307 users in the SuicideWatch community, while PTSD has only 50,032 users.

Open question – how do we distinguish two communities?

Modularity Optimization: Modularity is a measure of the degree to which nodes in a network are connected within their own community compared to the connections between different communities. Modularity optimization involves identifying communities that maximize modularity, or in other words, communities that have a high degree of internal connections and a low degree of external connections.

Girvan-Newman Algorithm: This algorithm involves iteratively removing edges from the network in order of their “betweenness centrality,” which is a measure of how often a given edge lies on the shortest path between two nodes. By removing edges in this way, the algorithm gradually breaks the network into smaller and smaller communities.

Label Propagation: This method involves assigning an initial label to each node in the network and then iteratively updating the labels based on the labels of neighboring nodes. Over time, nodes tend to cluster into groups with similar labels, forming communities.

Spectral Clustering: This method involves using the eigenvalues and eigenvectors of the network’s adjacency matrix to identify communities. By projecting the network onto a lower-dimensional space, spectral clustering can often separate nodes into distinct communities.
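A minimal sketch of two of these approaches using NetworkX's community module; the toy graph (two triangles joined by one bridge edge) is an assumption.

```python
import networkx as nx
from networkx.algorithms import community

# Toy interaction graph with two obvious groups plus one bridge edge.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
              ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")])

# Modularity optimization (greedy agglomerative variant).
mod_comms = community.greedy_modularity_communities(G)

# Label propagation.
lp_comms = list(community.label_propagation_communities(G))

print([sorted(c) for c in mod_comms])
print([sorted(c) for c in lp_comms])
```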

How did we study the behaviour of the two communities & what did we find?

  • For each week

  • Compute the average posts per user

    • Look at the total posts by all users

    • divided by the total number of unique users

  • How are users engaging with the community?

    • Continuous engagement is good for the vitality of a community.

    • Do these communities drive enough engagement and activity to sustain themselves?

Super users

  • A small minority of users

    • Responsible for a high proportion of posting activity and thus support

    • Functioning of communities

  • 5% of users generate

    • Over 70% of content
  • How do we study the role of super users?

    • Sensitivity analysis

How to find the super user?

  • For each user

    • Count the number of posts (A) and

    • The number of replies (B)

    • The total activity (A + B)

  • Rank users in terms of these frequencies (see the sketch below)
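A minimal pandas sketch of ranking users by total activity and labelling the top 5% as super users; the column names and counts are assumptions, and the 5% cutoff follows the notes above.

```python
import pandas as pd

# Hypothetical per-user activity counts.
activity = pd.DataFrame({
    "user":    ["u1", "u2", "u3", "u4", "u5"],
    "posts":   [120,   3,    15,   2,    1],      # A
    "replies": [340,   5,    40,   1,    0],      # B
})

activity["total"] = activity["posts"] + activity["replies"]        # A + B
activity = activity.sort_values("total", ascending=False)

# Top 5% of users by total activity are treated as super users.
cutoff = activity["total"].quantile(0.95)
activity["super_user"] = activity["total"] >= cutoff

print(activity)
```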

Connected component

  • A connected component of an undirected graph is a maximal set of nodes such that
    • each pair of nodes is connected by a path.
  • Directed graphs have weakly and strongly connected components.
  • Two vertices are in the same weakly connected
    component
    • if they are connected by a path, where paths are allowed to go either way along any edge.
  • The weakly connected components correspond closely to the concept of connected component in undirected graphs and the typical situation is similar
    • there is usually one large weakly connected component plus other small ones.

Largest connected component

  • The largest connected component of a

    • graph G(V, E) is the largest possible subgraph

      • G_L(V_L, E_L) of G,

      • such that each node in G_L has at least one valid path to every other node in G_L.

  • LCC gives us the subset of users

    • Who form a cohesive community
  • Importance of super users on LCC

    • By removing them and studying the cohesion

Community resilience

Temporal Analysis

  • Characteristics of LCC on a weekly basis

  • Focused and cohesive nature of interactions

    • By looking at the fraction of users belonging to the LCC.
  • Our aim is to study

    • community resilience
  • First, how cohesive

    • is the community?
  • For each weekly graph, G

    • Compute the LCC

    • That is, all nodes in the LCC have at least one path to each other

  • Compute the fraction N_k/N

    • where N_k is the number of nodes in the LCC and N is the total number of nodes in G

Fragility of the community

  • Is the conversation network held together by

    • a more or less uniform contribution of nodes,

    • or

    • is there a skew in the responsibility of nodes?

  • Sensitivity analysis methods

    • measure the network’s capacity to diffuse information as you remove nodes based on a certain property
  • Importance of super users

Sensitivity analysis

  • the targeted removal of nodes (users), starting from the most connected nodes.
  • The result is reported as the size of the largest component as a percentage of the network size (see the sketch below).
  • Specifically, it illustrates the key effects of the superusers on the website from another perspective.
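A minimal sketch of such a sensitivity analysis with NetworkX: repeatedly remove the most connected node and track the LCC fraction. The scale-free toy graph, the number of removals, and the seed are assumptions.

```python
import networkx as nx

G = nx.barabasi_albert_graph(200, 2, seed=42)   # scale-free toy network
n_total = G.number_of_nodes()

lcc_fraction = []
H = G.copy()
for _ in range(20):
    # Remove the currently most connected node (targeted attack).
    hub = max(H.degree, key=lambda item: item[1])[0]
    H.remove_node(hub)
    largest = max(nx.connected_components(H), key=len)
    lcc_fraction.append(len(largest) / n_total)

print(lcc_fraction)   # how quickly the cohesive core breaks apart
```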

Rich club effect

  • The rich club effect means that a few important nodes (users) show stronger and closer connections with each other,

    • and constitute a structural core and functional hub.
  • The rich-club coefficient

    • is the ratio of the actual number of edges among nodes with degree greater than k
    • to the number of potential edges among those nodes, for each degree k (see the sketch below)
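A minimal sketch using NetworkX's built-in rich-club coefficient; the graph is an assumed toy example, and the function expects an undirected graph without self-loops or multi-edges.

```python
import networkx as nx

G = nx.barabasi_albert_graph(300, 3, seed=1)    # toy scale-free graph

# rc[k] = ratio of actual to possible edges among nodes with degree > k.
rc = nx.rich_club_coefficient(G, normalized=False)

for k in sorted(rc)[:5]:
    print(k, round(rc[k], 3))
```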

image-20230501161132105

  • the coefficient continues to be lower than 1,
    • indicating that the amount of interaction between superusers is not high,
    • and the amount of interaction between most non-superusers is also not high.
  • the interactions between the superusers and non-superusers are very high,
    • indicating that superusers are more inclined to communicate with users who have fewer interactive connections.
  • How do we explain this?
    • There are a large number of users with purposeful questions on the website and a small number of experts in the field.

Z-score

  • We have seen core users and their relationship with other users from the graph

  • We do not know whether core users

    • Tend to ask for help (post more)

    • Help others (reply more)

  • Look at a thread!

  • To find that out, let us look at the z-score!!
    z = (x - mean) / sd

Emotion Analysis

L9

Sentiment analysis & variants

Variants

  • Sentiment classification
    • whether a piece of text is positive, negative or neutral
    • Degree of intensity
      • [-100,100]
  • Opinion analysis
    • Determining from text, the speaker’s opinion and target of the opinion
  • Stance
    • Author of text is in favour of, against of, or neutral towards a proposition or target
    • For example, the Brexit agreement
      • Are people supportive?
  • Emotion
    • What are the emotion expressed in the text?

Sentiment vs Stance

  • Target:
    • Legalization of Abortion
  • Tweet
    • The pregnant are more than walking incubators. They have rights too!
    • In favour of the target
    • Target - Pro-life movement
      • ??
  • Target
    • Donald Trump
  • Tweet
    • Donald Trump has some strengths and some weaknesses
      • neutral

Stance detection

  • Is the task of automatically determining from text
    • whether the author of the text is in favour of, against, or neutral
    • toward a proposition or target
  • Target
    • Person, organization, government policy, a movement, a product
    • E.g., infer from former Prime Minister Boris Johnson’s speeches that he is in favour of Brexit
    • E.g., analysing tweets to identify people in favour of leadership change

Aspect Based Sentiment Analysis

  • A sentence contains one or more entities,
    • each of which has a different polarity of emotion.
  • For example, given a comment like
    • “Great food but the service is dreadful!”,
    • the emotional polarity of entity “food” is “positive”
    • while the emotional polarity of entity “service” is “negative”.
  • Compared to sentence level sentiment analysis, ABSA can present
    • users with more precise and fine-grained sentiment information of entities
  • You identify an aspect and the sentiment towards that aspect

Sentiment classification is limited

  • Language serves social and interpersonal functions.
    • Affective meaning is key for human interaction and a prominent characteristic of language use.
  • This extends beyond
    • opinions vs. factual or polarity distinctions
    • into multiple phenomena:
      • emotion, mood, personality, attitude, credibility, volition, veracity, friendliness, etc.
  • Emotion:
    • angry, sad, joyful, fearful, …
  • recognition, characterization, or generation of affect states,
    • involves analysis of affect-related conditions, experiences, and activities.

Textual emotion

  • We analyse the text and detect the emotion
    • expressed by the author or
    • the emotion potentially felt by the reader
  • Linguistic sensing of affective states can be used for
    • Understanding social issues expressed through social media
  • Researchers in psychological science believe that
    • individuals have internal mechanisms for a limited collection of responses, usually
    • happy, sad, anger, disgust, and fear

There are 6 emotion categories that are widely used to describe humans’ basic emotions, based on facial expression:

anger, disgust, fear, happiness, sadness and surprise.

  • Categorical theories
  • Emotions are discretely and differently constructed and
  • all humans are thought to have an innate set of basic emotions
  • that are cross-culturally recognisable

OCC model

22 emotions = 6 Paul Ekman emotions + 16 additional emotions

Criticism

  • In the categorical approach, emotional states are restricted to a limited number of distinct types and
  • it can be difficult to resolve a complex emotional situation or mixed emotions.
  • Appraisal theory
  • It contains componential emotion models based on the theory of appraisal.
  • Appraisal theory describes how different emotions, in various participants and on different times, can arise from the same event

Text emotion detection

motivation

  • because human language is naturally vague and ambiguous,
    • emotion detection can be highly “context-sensitive and complex”
  • Emotion analysis is a convoluted task, even for human beings,
    • due to the various cultures, gender, and context of people who authored the texts.
    • The task will be much easier when emotion is expressed explicitly, but in reality,
    • the majority of texts are subtle, ambiguous, and some words have more than one meaning, and
    • more than one word expresses the same emotions, and, in addition, some emotions can exist simultaneously
  • Don’t u just HATE it when u cannot find something that u know you just saw like
    10 min ago!
  • By analysing horse racing comments,
  • a model can learn information about the winning horse and
  • would be able to give a reasonable prediction based on the emotion.

emotion classification

  • In the case of sentiment analysis, this task can be tackled using
    • lexicon-based methods, machine learning, or a rule-based approach
  • In the emotion recognition task, the 4 most common approaches are
    • Keyword-based detection
      • Seed opinion words and find synonyms & antonyms in WordNet
      • WordNet
  • Lexical affinity
  • Hybrid
  • Learning-based detection

Lexicon

  • Lexicons are linguistic tools for automated analysis of text
  • Most common
    • Simple list of terms associated to a certain class of interest
    • Classification by counting
    • Terms can be weighted according to their strength of association with a given class

Lexicon based approaches?

  • “I love horror books”
  • f(love) × 1.0 + f(horror) × 0.3 + f(books) × 0.5 = 1.8 for positive
  • f(love) × 0.0 + f(horror) × 0.7 + f(books) × 0.5 = 1.2 for negative
    • (f(w) is the frequency of term w in the text, here 1 for each)
  • Decision function
    • Classify with the class that has the maximum value (see the sketch below)
  • Transparency
    • Each prediction can be explained
      • by analysing the terms that were present in the text
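A minimal sketch of this weighted-lexicon scoring; the tiny lexicon and its per-class weights are illustrative assumptions, not a real resource.

```python
from collections import Counter

# Hypothetical per-class term weights (a real system would use AFINN, VADER, etc.).
lexicon = {
    "positive": {"love": 1.0, "horror": 0.3, "books": 0.5},
    "negative": {"love": 0.0, "horror": 0.7, "books": 0.5},
}

def classify(text):
    counts = Counter(text.lower().split())
    scores = {cls: sum(freq * weights.get(term, 0.0)
                       for term, freq in counts.items())
              for cls, weights in lexicon.items()}
    return max(scores, key=scores.get), scores

label, scores = classify("I love horror books")
print(label, scores)   # ('positive', {'positive': 1.8, 'negative': 1.2})
```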

BERT vs GPT

Mock Paper

2022

(a) Assume that the BBC recruited you to develop a social media application. The BBC is interested in knowing their readers’ feelings on the news and other events covered by the broadcaster. Your job is to develop a classifier. In this context, answer the following questions:

(i)

Your first task is to create Twitter datasets with positive and negative statements so that they can be used for estimating the probabilities for words in the respective classes. [Hint: Assuming that you have a social media crawler, discuss how you will automatically label positive and negative tweets; how will you avoid spurious data]

use hashtag based data collection

1.Data collection: Use a social media crawler to collect tweets that mention the BBC or are in response to BBC’s tweets. This could be done using Twitter’s API to search for tweets containing specific keywords, hashtags, or mentions related to BBC’s news and events.

2. Pre-process: Clean the collected tweets by removing irrelevant information such as URLs, mentions, and special characters. Convert all text to lowercase and tokenize the words for easier analysis. (This step may be skipped.)

3. Automatic labeling: Loop over the tweets, using a lexicon approach to compute the overall score of each tweet. If a tweet has been annotated with a hashtag, use the NRC hashtag lexicon to label the tweet; if it has an emoji at the end, use an emoticon-based approach to label it. Then calculate the score for each feature (word/hashtag/emoticon) and select the label based on the overall score.

image-20230501223242594

4.Avoid spurious data. Pre-defined relevant hashtags: Compile a list of relevant and meaningful hashtags associated with the BBC or specific news and events covered by the broadcaster. Only collect and analyze tweets containing these pre-defined hashtags to ensure high-quality data. Noise filtering: Filter out tweets that are ambiguous, contain sarcasm, or are irrelevant to the BBC’s content. This can be done using advanced NLP techniques or by setting a minimum sentiment score threshold to exclude borderline cases.

(ii)

Your second task is to develop a lexicon-based automatic sentiment analysis method, which assigns sentiment intensity between [-100,100].Describe an algorithm that also uses the dataset you created in (i). [Hint: identify a suitable lexicon; identify linguistic cases you may handle; specify a scoring method]

To develop a lexicon-based automatic sentiment analysis method that assigns sentiment intensity between [-100, 100], follow these steps:

  1. Choose a suitable lexicon: Select a pre-built sentiment lexicon that provides sentiment scores for words. Examples of such lexicons include SentiWordNet, AFINN, or VADER. These lexicons typically assign sentiment scores to words on a scale of -1 to 1 or -5 to 5.

  2. Preprocessing: Clean the dataset created in step (i) by removing irrelevant information like URLs, mentions, and special characters. Convert all text to lowercase and tokenize the words for easier analysis.

  3. Handling linguistic cases: Address various linguistic cases to improve sentiment analysis:

    a. Negations: Identify negation words (e.g., not, never, isn’t) and modify the sentiment score of the words that follow. For example, you can reverse or reduce the sentiment score of the word following the negation.

    b. Intensifiers: Identify intensifier words (e.g., very, extremely, really) and adjust the sentiment score of the following word accordingly. For example, you can multiply the sentiment score by a factor (e.g., 1.5 or 2) based on the intensity of the intensifier.

    c. Diminishers: Identify diminisher words (e.g., slightly, barely, hardly) and adjust the sentiment score of the following word accordingly. For example, you can multiply the sentiment score by a factor (e.g., 0.5) based on the diminishing effect.

  4. Scoring method: Implement a scoring method to calculate the sentiment intensity of each tweet using the lexicon and linguistic cases:

    a. Initialize a sentiment score variable to 0 for each tweet.

    b. Iterate through the words in the tweet, and for each word, check if it has a sentiment score in the lexicon.

    c. If the word has a sentiment score, adjust the score based on the linguistic cases (negations, intensifiers, diminishers) if applicable.

    d. Add the sentiment score of the word to the tweet’s sentiment score.

    e. After processing all words, normalize the tweet’s sentiment score to fit the range of [-100, 100]. For example, if the lexicon’s sentiment scores range from -5 to 5, multiply the tweet’s sentiment score by 20.

    f. Assign the normalized sentiment score to the tweet as its sentiment intensity.

  5. Evaluation: Compare the sentiment intensity assigned by the lexicon-based method with the labels in the dataset created in step (i). Calculate performance metrics such as accuracy, precision, recall, and F1-score to evaluate the effectiveness of the lexicon-based sentiment analysis method.

By following these steps, you can develop a lexicon-based automatic sentiment analysis method that assigns sentiment intensity between [-100, 100] and uses the dataset created in the previous task.

(iii)

Now that you created a sentiment analysis method, you want to verify the method’s validity from a user’s perspective. Design a scalable user-based study to ensure your sentiment scoring method is appropriate.

To design a scalable user-based study to ensure the sentiment scoring method’s appropriateness, follow these steps:

  1. Select a representative sample: Randomly sample a subset of tweets from the dataset created in the first task. Make sure the sample includes a balanced number of positive and negative tweets, as well as a diverse range of topics covered by the BBC.
  2. Prepare the evaluation interface: Develop a user-friendly interface for the study participants. The interface should display a tweet and its corresponding sentiment score, calculated using the lexicon-based sentiment analysis method. Participants should be able to rate the sentiment score’s appropriateness on a scale (e.g., 1 to 5, with 1 being “strongly disagree” and 5 being “strongly agree”).
  3. Recruit participants: Recruit a diverse group of participants to ensure a broad range of perspectives. You can use platforms like Amazon Mechanical Turk, Prolific, or other crowdsourcing services to recruit participants on a large scale.
  4. Training and instructions: Provide clear instructions and examples to participants on how to evaluate the sentiment scores. Briefly explain the concept of sentiment analysis, the scoring range of [-100, 100], and the evaluation scale. You may also provide examples of tweets with appropriate and inappropriate sentiment scores to help participants understand the task better.
  5. Evaluation process: Ask participants to evaluate the sentiment scores of the sampled tweets using the provided interface. Encourage them to consider the context of the tweet and the sentiment score’s appropriateness based on the tweet’s content.
  6. Collect user feedback: Allow participants to provide qualitative feedback on the sentiment scoring method, highlighting any issues or suggestions for improvement. This feedback can help identify potential areas of refinement for the sentiment analysis method.
  7. Analyze results: After collecting the evaluations, calculate the average appropriateness score for each tweet’s sentiment score. High average scores indicate that the sentiment scoring method is appropriate from the user’s perspective. Analyze the qualitative feedback to identify common themes and potential areas for improvement.
  8. Iterate and improve: Based on the study results, refine the sentiment analysis method to address identified issues and incorporate user feedback. Repeat the user-based study with the updated method to evaluate its effectiveness iteratively.

By designing and conducting a scalable user-based study, you can ensure that the sentiment scoring method is appropriate from the user’s perspective and make data-driven improvements to enhance its accuracy and effectiveness.

(a)Create a vector representation for the following short text. Identify and remove potential stop words. “@AlanStainer @takeitev It’s mad isn’t it. In the UK there are 8k petrol stations with multiple pumps and 25k chargers (increasing by 300 pm). They do know the climate emergency is now right? Not in 30 years’ time, Just asking”

Stop words removed: “it”, “the”, “in”, “there”, “are”, “with”, “and”, “by”, “they”, “do”, “is”, “now”, “not”, “just” (the @mentions are also removed)

[“it’s”, “mad”, “isn’t”, “uk”, “8k”, “petrol”, “stations”, “multiple”, “pumps”, “25k”, “chargers”, “increasing”, “300”, “pm”, “know”, “climate”, “emergency”, “right”, “30”, “years’”, “time”, “asking”]

(b) Create all biterms from the following text, “in the UK there are 8k petrol stations with multiple pumps and 25k chargers”

(“in”, “the”) (“in”, “UK”) (“in”, “there”) … (“pumps”, “and”) (“pumps”, “25k”) (“pumps”, “chargers”) … (“and”, “25k”) (“and”, “chargers”) … (“25k”, “chargers”)
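A minimal sketch generating all biterms (unordered word pairs) from the short text with itertools, following the definition used above.

```python
from itertools import combinations

text = "in the UK there are 8k petrol stations with multiple pumps and 25k chargers"
words = text.split()

# Every unordered pair of distinct word positions forms a biterm.
biterms = list(combinations(words, 2))

print(len(biterms))        # 14 words -> 14*13/2 = 91 biterms
print(biterms[:3])         # [('in', 'the'), ('in', 'UK'), ('in', 'there')]
```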

(c) Assume you have developed a topic model on a collection with your university communications for the last academic year. Design a user-centred experiment to evaluate the interpretability of the model. [Hint: design tasks and justify, selection of subjects, number of users, what will you measure, how do you prove the results]

Designing a user-centered experiment to evaluate the interpretability of a topic model involves several components, including selecting subjects, designing tasks, determining the number of users, and measuring performance. Here’s an outline for such an experiment:

  1. Define the goal: The primary goal of the experiment is to evaluate the interpretability of the topic model in terms of the coherence and relevance of the identified topics within the context of university communications.

  2. Selection of subjects: Choose participants who have some familiarity with the university environment, such as students, faculty, and administrative staff. This will ensure that they can understand and assess the relevance of the topics generated by the model.

  3. Number of users: To ensure the results are reliable and generalizable, aim for a diverse sample of participants. A sample size of 30-50 participants is typically considered sufficient for user-centered experiments, but the optimal number may vary depending on factors such as the complexity of the tasks and the desired statistical power.

  4. Design tasks: Create tasks that help assess the interpretability of the topics generated by the model. Example tasks include:

    a. Topic labeling: Ask participants to assign meaningful labels to a given set of topics. This assesses whether users can understand and make sense of the topics generated by the model.

    b. Topic ranking: Ask participants to rank a list of topics based on their relevance or importance to the university’s communications. This helps evaluate the model’s ability to identify meaningful and relevant topics.

    c. Document-topic assignment: Provide participants with a set of documents and ask them to assign the most relevant topic from the model to each document. This task assesses whether users can effectively map the generated topics to real-world documents.

  5. Measurement: Collect quantitative and qualitative data to evaluate the interpretability of the model.

    a. Quantitative measures: Calculate agreement scores (e.g., Fleiss’ Kappa or Cohen’s Kappa) to measure the consistency among participants in terms of topic labeling, ranking, and document-topic assignments.

    b. Qualitative measures: Collect subjective feedback from participants about the coherence, relevance, and overall interpretability of the topics. This can be done through open-ended questions, interviews, or questionnaires.

  6. Analyze results: Analyze the quantitative and qualitative data to assess the interpretability of the model. High agreement scores and positive feedback from participants would indicate that the model generates interpretable topics.

  7. Prove the results: To prove the results, compare the performance of the topic model with alternative models or baseline approaches (e.g., LDA, NMF). Conduct a similar user-centered experiment for the alternative models and compare the agreement scores and subjective feedback. A higher performance of the developed topic model compared to the alternatives would provide evidence for its interpretability.

By carefully designing and executing a user-centered experiment, you can evaluate the interpretability of a topic model in the context of university communications, ensuring that the model generates meaningful and relevant topics for the users.

(d)You have collected tweets and newspaper articles from Scotland for the last month.

Describe a method to develop topic models from these datasets. [Hint identify issues in dealing with such heterogenous datasets; how would you handle such issues]

  • Prepare the tweets data; pre-process the data.

  • Create a bigram model and a trigram model. Then use the data with bigrams and trigrams to create the dictionary which will be used in model training.

  • Use the dictionary to create the corpus.

  • Use an LDA model to run topic modelling on the processed data (see the sketch below).

  • Use different measures, including KL divergence, perplexity, and coherence, to evaluate the model. This step will also help us to determine the number of topics.

  • Update the model parameters based on the result of the evaluation.

  • Visualize the model. The topic keywords will be displayed in a chart or a word cloud.
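A minimal gensim sketch of this pipeline (bigrams, dictionary, corpus, LDA, coherence); the tiny token lists and all parameter values are assumptions for illustration.

```python
from gensim.corpora import Dictionary
from gensim.models import Phrases, LdaModel, CoherenceModel

# Hypothetical pre-processed documents (tweets / article chunks as token lists).
texts = [["climate", "emergency", "petrol", "stations"],
         ["electric", "chargers", "increasing", "climate"],
         ["petrol", "stations", "electric", "chargers"]]

# Merge frequent bigrams into single tokens (e.g. "petrol_stations").
bigram = Phrases(texts, min_count=1, threshold=1)
texts = [bigram[doc] for doc in texts]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)

# u_mass coherence only needs the corpus; higher (closer to 0) is better.
coherence = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                           coherence="u_mass").get_coherence()
print(lda.print_topics())
print("coherence:", coherence)
```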

Handling heterogeneity: Tweets and newspaper articles have different lengths, styles, and contexts. To address these issues:

a. Length normalization: To mitigate the differences in length between tweets and newspaper articles, consider using techniques like document sampling or text chunking. For instance, divide newspaper articles into smaller chunks of approximately the same size as tweets.

b. Text representation: Use a suitable text representation method that can capture the context and semantics of both short and long texts. Techniques like TF-IDF, word embeddings (e.g., Word2Vec, GloVe), or even more advanced methods like BERT embeddings can be used.

c. Combining datasets: Combine the preprocessed tweets and newspaper article chunks into a single dataset to build a unified topic model.

3.Applying clustering (e.g., Single-pass clustering) on Twitter data stream will create groups of similar tweets of varying sizes. Design an algorithm to detect events from such groups. Specifically, answer the following questions.

(i)What role do entities play in detecting events? How would you reduce the cost of detecting entities?

Entities can include named entities like people, organizations, locations, or other relevant terms specific to an event. By identifying and tracking entities within the tweet clusters, we can recognize emerging events and monitor their development. Entities can help:

  1. Identify the key components of an event: Entities can represent the primary subjects or objects related to an event, enabling us to understand the event’s main focus.
  2. Differentiate between events: Entities can help distinguish between different events by providing context-specific information, which allows us to separate events with similar keywords but different contexts.
  3. Track the progression of events: Monitoring the frequency and co-occurrence of entities over time can provide insights into how an event is evolving and help identify new developments or trends.

Reducing the cost of detecting entities:

Detecting entities in real-time can be computationally expensive, especially for large-scale Twitter data streams. To reduce the cost of detecting entities, consider the following strategies:

  1. Entity extraction optimization: Use efficient named entity recognition (NER) tools or libraries that can handle streaming data, like spaCy or the Stanford NER. These libraries are optimized for performance and can handle large-scale text data efficiently.
  2. Filter irrelevant data: Preprocess the Twitter data stream to remove irrelevant information, such as URLs, hashtags, user mentions, and stop words. This reduces the volume of data to process and allows the entity extraction to focus on relevant content.
  3. Keyword-based entity detection: Instead of using full-fledged NER models, you can create a list of relevant keywords or entities specific to the domain of interest. This can help in detecting events of interest with lower computational cost.
  4. Incremental entity extraction: Instead of processing the entire data stream at once, perform entity extraction incrementally as new tweets arrive. This can help distribute the computational load over time, making it more manageable.
  5. Parallelization: Utilize parallel processing techniques to distribute the entity extraction task across multiple cores or machines, which can significantly speed up the process and reduce the overall computational cost.

By incorporating these strategies, you can reduce the cost of detecting entities in Twitter data streams while still effectively identifying and tracking events.
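A minimal spaCy sketch for extracting named entities from tweets; the model name en_core_web_sm must be downloaded beforehand, and the example tweets are assumptions.

```python
import spacy

# One-off download: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

tweets = ["Earthquake reported near Glasgow this morning",
          "BBC announces new documentary on the climate emergency"]

for tweet in tweets:
    doc = nlp(tweet)
    print([(ent.text, ent.label_) for ent in doc.ents])
```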

(ii)If you were to use tf-idf concepts for representation, how would you capture them?

To capture TF-IDF concepts for representation in the context of detecting events from Twitter data, follow these steps:

  1. Data preprocessing: Preprocess the tweets by removing irrelevant information (e.g., URLs, hashtags, user mentions), tokenizing the text, removing stop words, converting all text to lowercase, and performing lemmatization or stemming.

  2. Term Frequency (TF): Calculate the term frequency for each term in each tweet. The term frequency is the number of times a term appears in a tweet divided by the total number of terms in that tweet. This step normalizes the frequency of terms in each tweet, accounting for varying tweet lengths.

  3. Inverse Document Frequency (IDF): Calculate the inverse document frequency for each term across the entire Twitter data stream. The IDF measures the importance of a term by considering its rarity in the entire dataset. To calculate IDF, first compute the document frequency (DF) – the number of tweets containing a particular term. Then, compute the IDF as the logarithm of the total number of tweets divided by the DF.

  4. TF-IDF representation: Compute the TF-IDF score for each term in each tweet by multiplying the TF and IDF values. This score represents the importance of a term in a tweet while considering its rarity in the entire dataset.

  5. Feature vectors: For each tweet, create a feature vector with the TF-IDF scores of its terms. This can be represented as a sparse vector where each dimension corresponds to a unique term from the entire vocabulary across all tweets, and its value is the TF-IDF score for that term in the specific tweet. If a term is not present in a tweet, its value in the vector will be zero.

  6. Computing TF (Term Frequency): for a given document d and term t, the TF value can be computed as:

    TF(t, d) = (number of occurrences of term t in document d) / (total number of terms in document d)

    The numerator is the number of times term t appears in document d, and the denominator is the total number of terms in d. The TF value reflects how important the term is within the current document: the more often it appears, the larger the TF value.

  7. Computing IDF (Inverse Document Frequency): over all documents and terms t in the corpus, the IDF value can be computed as:

    IDF(t) = log((total number of documents in the corpus) / (number of documents containing term t + 1))

    The numerator is the total number of documents in the corpus, and the denominator is the number of documents containing term t (adding 1 avoids division by zero). The IDF value reflects how important the term is across the whole corpus: the fewer documents it appears in, the larger the IDF value.

  8. Computing TF-IDF: multiply TF and IDF to obtain the TF-IDF value of term t in document d.

    TF-IDF(t, d) = TF(t, d) * IDF(t)

    The TF-IDF value reflects the importance of term t in document d while also taking its occurrence across the whole corpus into account. When computing TF-IDF, the TF value can also be smoothed, for example with:

    TF(t, d) = 0.5 + 0.5 * (number of occurrences of term t in document d) / (total number of terms in document d)

    This smoothing prevents the TF value of a single term in a long document from becoming too dominant (a scikit-learn sketch follows below).
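A minimal scikit-learn sketch of producing TF-IDF feature vectors for tweets; the example tweets are assumptions, and note that scikit-learn's own TF-IDF formula uses a slightly different (smoothed) IDF than the one above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["earthquake reported near glasgow",
          "glasgow concert tickets on sale",
          "earthquake felt across scotland"]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(tweets)          # sparse matrix: one row per tweet

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```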

(iii) How do you remove noisy or spam groups?

Removing noisy or spam groups from a dataset involves filtering out irrelevant or low-quality content. Here are some strategies to identify and remove such groups:

  1. Text-based filtering: Analyze the content of the groups and apply filters to eliminate groups containing certain keywords or patterns that are commonly associated with spam or noise. For example, you can create a list of common spam keywords, phrases, or patterns, and remove groups that contain a high frequency of these terms.
  2. Frequency-based filtering: Analyze the posting frequency of the groups. Spam or noisy groups often exhibit unusual posting patterns, such as posting the same content repeatedly, or posting at extremely high frequencies. Set a threshold for acceptable posting frequency and filter out groups that exceed this limit.
  3. User-based filtering: Analyze the users contributing to the groups. If a group consists mostly of users with suspicious behavior or characteristics (e.g., newly created accounts, accounts with very few followers or following a large number of users), it might be a spam or noisy group. You can create a scoring system to rate the credibility of users and filter out groups with a high proportion of low-credibility users.
  4. Group size: Small groups or groups with very few members might be more likely to be noisy or spammy. You can set a minimum group size threshold and remove groups that fall below this limit.
  5. Language-based filtering: Analyze the language used in the groups. Spam or noisy groups may contain a high proportion of irrelevant or nonsensical text, or text in a language that is not of interest for your analysis. Use natural language processing techniques, such as language detection, to filter out groups with content in unwanted languages or with a high proportion of unintelligible text.
  6. Machine learning techniques: Train a machine learning model to classify groups as spam, noisy, or relevant based on features like text content, posting frequency, user characteristics, group size, and language. This approach can be more adaptive and effective in identifying spam and noisy groups, especially if the model is regularly updated with new data.

By applying these strategies, you can identify and remove noisy or spam groups from your dataset, allowing for more accurate and meaningful analysis of the remaining content.

(iv) How would you identify categories of events?

1. Visualization: use a diagram to show the top 20 most frequent words in a group; this can help us understand the topic of a group and categorize the groups. Alternatively, use PCA to reduce the dimensionality of the data, project it into a lower-dimensional space, and look for overlaps.

2. Cluster labeling: assign descriptive labels to the groups based on the representative information extracted in the previous step. You can manually analyze the most frequent terms, key phrases, or named entities in each group and assign a suitable category label. Alternatively, you can use an automated approach like extracting the most frequent terms or key phrases as labels.

(v) Provide an algorithm for combining similar groups of tweets (e.g., tweets containing same entities).

  1. Extract entities from each group:
    • For each group, extract the named entities (e.g., people, organizations, locations) from the text of the tweets and articles.
    • You can use a Named Entity Recognition (NER) library like spaCy or Stanford NER to perform this task.
  2. Calculate entity similarity between groups:
    • Create a function to compute the similarity between two groups based on the shared entities.
    • You can use the Jaccard similarity coefficient, which is the size of the intersection of entities divided by the size of the union of entities for each pair of groups.
  3. Combine similar groups based on a similarity threshold:
    • Set a similarity threshold (e.g., 0.5) to decide whether two groups should be combined.
    • For each pair of groups:
      • Compute the entity similarity using the function created in step 2.
      • If the similarity score is greater than or equal to the threshold:
        • Combine the two groups into one.
        • Update the entity set of the combined group.
        • Remove the original groups from the list of groups.
    • Repeat this process until no more groups can be combined based on the similarity threshold.
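A minimal sketch of steps 2-3 above (entity overlap plus greedy merging); the entity sets, the 0.5 threshold, and the single greedy pass are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of entities."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical groups, each summarised by its set of extracted entities.
groups = [
    {"id": 1, "entities": {"BBC", "Glasgow", "earthquake"}},
    {"id": 2, "entities": {"earthquake", "Glasgow", "tremor"}},
    {"id": 3, "entities": {"football", "Celtic"}},
]

THRESHOLD = 0.5
merged = []
for g in groups:
    for m in merged:
        if jaccard(g["entities"], m["entities"]) >= THRESHOLD:
            m["entities"] |= g["entities"]      # combine the two groups
            m["members"].append(g["id"])
            break
    else:
        merged.append({"entities": set(g["entities"]), "members": [g["id"]]})

print(merged)   # groups 1 and 2 merge; group 3 stays separate
```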

here are some other features than entities you can use to combine similar groups:

  1. Term Frequency (TF):
    • Use the frequency of terms within each group as a feature. Calculate the similarity between groups based on the overlap of their most frequent terms.
  2. Key phrases:
    • Extract key phrases from the text in each group using techniques like RAKE (Rapid Automatic Keyword Extraction) or TextRank. Compare the groups based on the overlap of their key phrases.
  3. Sentiment analysis:
    • Calculate the average sentiment score of each group using a sentiment analysis library or pre-trained model. Combine groups with similar sentiment scores.
  4. Topic modeling:
    • Apply topic modeling techniques like Latent Dirichlet Allocation (LDA) to the dataset. Compare groups based on the distribution of topics within each group.
  5. Word embeddings:
    • Use pre-trained word embeddings like Word2Vec, GloVe, or BERT to represent the text within each group. Calculate the average embedding for each group and compare groups using cosine similarity or other distance metrics.
  6. N-grams:
    • Extract N-grams (sequences of N consecutive words) from the text within each group. Compare groups based on the overlap of their most frequent N-grams.
  7. Text similarity:
    • Calculate the average pairwise text similarity within each group using a text similarity measure like cosine similarity, Jaccard similarity, or edit distance. Combine groups with similar average text similarity scores.
  8. Hashtags and user mentions:
    • For Twitter data, extract hashtags and user mentions from the tweets within each group. Compare groups based on the overlap of their most frequent hashtags and user mentions.

(vi) How do we find the bursting clusters?

To find bursting clusters, you can use a sliding window algorithm combined with a clustering method. The idea is to divide the data into time windows and apply clustering within each window to identify groups of similar items. By comparing the clusters across different time windows, you can detect bursts of activity. Here’s an outline of the algorithm:

  1. Divide the data into time windows:
    • Choose an appropriate window size and step size based on the dataset and the expected duration of bursts.
    • Divide the data into non-overlapping or overlapping time windows accordingly.
  2. Apply clustering within each time window:
    • For each time window, preprocess the data (e.g., tokenize, remove stop words, stemming/lemmatization) and create feature vectors using methods like TF-IDF or word embeddings.
    • Apply a clustering algorithm (e.g., K-means, DBSCAN) to the feature vectors within the window to group similar items.
  3. Detect bursts by comparing clusters across adjacent time windows:
    • Define a burst detection criterion, such as a significant increase in the number of items within a cluster or the emergence of a new cluster with a large number of items.
    • For each pair of adjacent time windows, compare the clusters and identify those that meet the burst detection criterion.

Here’s a pseudo-code for the algorithm:

```python
def preprocess_and_vectorize(data):
    preprocessed_data = preprocess_text(data)
    feature_vectors = create_feature_vectors(preprocessed_data)
    return feature_vectors

def cluster_data(feature_vectors):
    clusters = apply_clustering_algorithm(feature_vectors)
    return clusters

def detect_bursts(window_clusters, burst_threshold):
    bursts = []
    for i in range(len(window_clusters) - 1):
        for cluster in window_clusters[i]:
            next_window_cluster = find_corresponding_cluster(cluster, window_clusters[i + 1])
            if not next_window_cluster:
                continue
            growth = len(next_window_cluster) - len(cluster)
            if growth >= burst_threshold:
                bursts.append((cluster, next_window_cluster))
    return bursts

data = load_data()
window_size = ...
step_size = ...
burst_threshold = ...

time_windows = create_time_windows(data, window_size, step_size)
window_clusters = []

for window in time_windows:
    feature_vectors = preprocess_and_vectorize(window)
    clusters = cluster_data(feature_vectors)
    window_clusters.append(clusters)

bursts = detect_bursts(window_clusters, burst_threshold)
```

This algorithm divides the data into time windows, applies clustering within each window, and detects bursts based on changes in cluster sizes across adjacent windows. You can customize the window size, step size, clustering algorithm, and burst detection criterion based on your specific dataset and use case.

(b) Describe a methodology to predict stock trend prediction from social media data

StockNet (Deep Learning method)

StockNet is the name of a deep learning model for stock movement prediction from tweets and historical prices (Xu & Cohen, 2018). More generally, there are many approaches to predicting stock trends from social media and market data using machine learning and artificial intelligence techniques. One such approach is using deep learning models like recurrent neural networks (RNNs), long short-term memory (LSTM) networks, or convolutional neural networks (CNNs).

To use a deep learning model like “StockNet” to predict stock trends, follow these general steps:

  1. Data collection: Gather historical stock data, such as stock prices, trading volumes, and other relevant financial indicators, together with related social media data (e.g., tweets mentioning the company or stock ticker). You may obtain this data from financial data providers, public financial statements, social media APIs, or web scraping.
  2. Data preprocessing: Clean and preprocess the data to eliminate noise, handle missing values, and convert categorical data into numerical formats. This step may also involve feature engineering to create new features that may be relevant for prediction.
  3. Feature scaling: Scale or normalize the features to ensure that they have similar ranges and are suitable for input into a deep learning model.
  4. Train-test split: Split the dataset into training and testing sets. The training set will be used to train the model, while the testing set will be used to evaluate its performance.
  5. Model selection: Choose an appropriate deep learning model based on your data and problem. Common choices include RNNs, LSTMs, and CNNs.
  6. Model training: Train the selected model on the training data. This involves adjusting the model’s parameters to minimize the prediction error. You may need to experiment with different hyperparameters, such as learning rate, batch size, and the number of hidden layers.
  7. Model evaluation: Evaluate the model’s performance on the testing data. Common evaluation metrics include mean squared error (MSE), mean absolute error (MAE), and R-squared.
  8. Fine-tuning: If the model’s performance is unsatisfactory, fine-tune it by adjusting the hyperparameters, modifying the model architecture, or changing the features used.
  9. Prediction: Once the model has been fine-tuned and performs well on the testing data, use it to predict future stock trends based on the input features.

Keep in mind that predicting stock trends is inherently difficult due to the complex and dynamic nature of financial markets. No model can guarantee accurate predictions, and it is essential to manage risks and avoid relying solely on model predictions for making investment decisions.
