Setup and Usage Notes for Proxmox (KVM/Qemu)

Bump 1: Dick move from Proxmox

The first thing that tripped me up with Proxmox is that the downloadable installer, even though Proxmox says it’s free if you don’t use their enterprise repository, comes configured as the Enterprise (paid) version out of the box, with no option to download a build that’s configured as the free/community edition.

It’s a dick move to greet ALL new users with a subscription nag screen, hoping to scare them into considering a subscription.

I don’t think frustrating people who are trying to learn/explore the software will make them want to pay for a subscription. The most this dick move can achieve is scaring new users away, since they might think they did something wrong when they get things they didn’t expect. When I ran into this, I certainly thought about throwing out Proxmox had there been better options out there, as I was still evaluating whether to go with Qemu or Hyper-V.

First of all, this scary message doesn’t actually block you from using Proxmox. You just don’t get updates until you either pay for their enterprise repositories or switch to the free repositories. At least you can still use the web interface to get shell access, which we’ll need for the fix (or you can go to the physical computer and enter the same commands in the local text terminal).

The difference between enterprise and free is just which servers the update repositories point to. Getting the latest and greatest is not necessarily a plus for enterprise users, so Proxmox lets free users take the risks first and provide feedback to polish the software. Fair enough. Great model.


There are two parts to fixing this ordeal:

  • Configuring the update repositories to point to the free no-subscription repositories (functional issue; per node, including slave nodes)
  • Removing the nag screen (cosmetic; applies to Proxmox overall, i.e. the main node hosting the Proxmox management interface)

Fix Subscription Scare Part 1: Updating repository locations

Basically, what you need to do is edit the files listed in the Updates/Repositories panel of the web UI, each corresponding to the repository URL path you want to change:

/etc/apt/sources.list.d/pve-enterprise.list is not needed by free users, so you can simply comment out all of its lines (there’s only one).

/etc/apt/sources.list is the link to the core repository for Proxmox updates. Instead of blindly following exact instructions, which can go stale as versions progress, open the URL http://download.proxmox.com/debian/ and see what’s out there. What’s not spelled out in the web admin interface is the intermediate folder called ‘dists’.

bookworm is the codename of the latest Debian release at the time of writing,

and obviously we pick the branch/subfolder named no-subscription (there’s no enterprise folder here since it lives under a different root URL), but you still have to get the name right for the ‘Components’ field.

You can open it with a text editor like nano:

nano /etc/apt/sources.list

and edit Proxmox’s repository line (remember to skip the ‘dists’ intermediate folder). Every space-separated word after the root URL is just a subfolder you see in the folder structure. If Debian releases a new version/codename, you might also want to update the first 3 Debian repository lines to match the new codename (and folder structure, if they rearranged it).

Ceph is an optional feature (not installed by default), yet it’s also configured as enterprise, so for consistency we might want to switch it to the no-subscription (free) version as well. The latest Ceph codename published at the time of writing is “quincy” (there’s nothing in the “reef” folder), so we click on it.

Again the ‘dists’ is boilerplate and not spelled out (so we don’t enter it) in the entries of the repository sources file.

bookworm is the current Debian version for that

and we see a “no-subscription” folder, which is obviously the one we want. You can usually guess these by the sensible names you’d choose if you were the developer.

You can again open a text editor like nano to edit the repository file shown in the web admin UI:

nano /etc/apt/sources.list.d/ceph.list

And finally, disable /etc/apt/sources.list.d/pve-enterprise.list.

Under the hood, Proxmox basically adds a # (comment sign) to disable the line in /etc/apt/sources.list.d/pve-enterprise.list, the same procedure we followed by hand.

Hit Reload and you are done with this subscription scare tactic.

Actually, for consistency, you can build your own pve-no-subscription.list to replace pve-enterprise.list: replace ‘enterprise’ in the root URL with ‘download’, update the Debian codename (at the time of writing it’s ‘bookworm’), and change the component folder from ‘pve-enterprise’ to ‘pve-no-subscription’, which translates to crawling this repository path: http://download.proxmox.com/debian/pve/dists/bookworm/pve-no-subscription/

There’s nothing fancy or hard-coded about these names. They’re basically the URL of where the update files are stored, with an intermediate folder ‘dists’ sandwiched between the root URL and the tokens (separated by spaces), which are basically subfolder names. All apt does is attach ‘/dists/’ after the root URL and turn the space between the codename and the component into ‘/’.
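The URL mapping described above can be checked with a small sketch. Assumptions: the helper names are made up for illustration, and only the plain one-line "deb ROOT CODENAME COMPONENT..." form is handled (no bracketed [option=...] fields). One nuance the sketch makes explicit: when a line lists several components (like Debian’s "main contrib"), each component is a sibling folder under dists/CODENAME/, not a deeper subfolder.

```python
def deb_line_to_urls(line: str) -> list[str]:
    """Expand a one-line apt entry into the folder URL(s) apt will crawl."""
    kind, root, codename, *components = line.split()
    if kind != "deb":
        raise ValueError("only plain 'deb' lines are handled in this sketch")
    # apt inserts the 'dists' folder between the root URL and the codename
    base = f"{root.rstrip('/')}/dists/{codename}"
    # each component is its own folder directly under dists/<codename>/
    return [f"{base}/{component}/" for component in components]


def url_to_deb_line(url: str) -> str:
    """Inverse for a single component: drop 'dists' and turn the remaining '/' into spaces."""
    root, sep, rest = url.rstrip("/").partition("/dists/")
    if not sep:
        raise ValueError("URL has no /dists/ folder")
    return " ".join(["deb", root, *rest.split("/")])


# -> ['http://download.proxmox.com/debian/pve/dists/bookworm/pve-no-subscription/']
print(deb_line_to_urls("deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription"))
# -> deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription
print(url_to_deb_line("http://download.proxmox.com/debian/ceph-quincy/dists/bookworm/no-subscription/"))
```

Browsing the printed URL in a web browser is a quick way to confirm a hand-written line points at a folder that actually exists before running apt update.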

It simply looks like the web admin UI developers haven’t had time to make the table entries clickable and editable yet; they only got as far as the enable/disable button that comments lines in and out of the file. You’ll see similar UI gaps in a lot of places later, where you have to go to the shell and do it yourself after researching the concepts.

Fix Subscription Scare Part 2: Removing the nag screen

Even after you fix the repository locations, that fix is only per node, while the nag screen lives at the top admin UI level. The Updates/Repositories interface won’t show error messages (undownloadable repositories) anymore, but the nag screen still needs to be addressed.

Luckily somebody wrote a script to deal with it, hosted on GitHub:

# Download the script
wget https://raw.githubusercontent.com/foundObjects/pve-nag-buster/master/install.sh
# Good practice: read ANY unknown script first to make sure there are no shenanigans

# Then run the script with sudo (drop sudo if you're already logged in as root)
sudo bash install.sh

This blog post shows the mechanisms in case some change breaks the script above.

TAP Networking not in the Web UI

A TAP interface is necessary for the VM’s Ethernet card to interact directly with the router connected to the physical hardware (PHY/NIC), but with its own identity, which puts the VM on an equal footing with the other physical computers on your network. This is often useful when you want to host servers.

I’ve adapted the instructions from Extremecoders on GitHub here, as the default Ethernet device names are different:

  • Debian doesn’t use the /eth0 naming scheme anymore. Here it’s /enp4s0
  • /br0 now has a “vm” prefix (/vmbr0) since it’s a virtual bridge, which Proxmox creates by default. The ‘bridge’ here is not the Ethernet bridge we know from Windows (which bridges two interfaces together as one); it’s just a virtual Ethernet switch. Once I understood this twist, I knew how to set TAP up

Since we are using /vmbr0, which is already set up, we can skip creating the bridge and adding the physical network card /enp4s0 to the /vmbr0 ‘bridge’ (virtual switch).

The core step is to create a TAP interface. Let’s call it /tap0

tunctl -t tap0

You don’t need the “-u (username)” part unless you want to assign ownership of this specific TAP interface to a specific user.

Then you need to add this TAP (/tap0 in this example) to the ‘bridge’ (/vmbr0 in this example). ‘addif’ means ‘add interface’ and ‘brctl’ means ‘bridge control’:

brctl addif vmbr0 tap0

Make sure the physical card (/enp4s0), the TAP interface (/tap0) and the ‘bridge’ (/vmbr0) are all up. Then assign an IP address to the ‘bridge’ /vmbr0. If you’re acquiring a new IP address from DHCP, use the DHCP client (dhclient):

dhclient -v vmbr0

Pools for resources

You can create virtual hard disks directly from where you configure your VM, but you can only delete them from a Pool viewer. This is the same as VirtualBox. ‘local-lvm’ is a collection of virtual hard drive images that you need to mount to act like hard disk drives. Your VM images live in /dev/pve.

It’s a little more rigid than VirtualBox, where you can point directly to a CD image. In Proxmox you have to upload the CD image to a pool before referring to it in the VM’s settings. ‘local’ is just a folder of files (specifically /var/lib/vz), and CD images go to /var/lib/vz/template/iso.

Actually, local and local-lvm are both defined at the root level called ‘Datacenter’.

Default CPU

By default Proxmox chooses x86-64-v2-AES for you, which may offer better compatibility. I had trouble with the Windows port of Qemu not supporting the ‘host’ CPU type because my CPU was too new for it, but Linux qemu-kvm had no trouble recognizing my new CPU under the ‘host’ type. Look into the extra CPU flags to match whether you are using an AMD or Intel CPU.


KVM-Qemu Duality

Qemu started off as a user-space software emulator (a Type 2 hypervisor like VirtualBox) back when there were no hardware features specifically supporting VM use cases.

KVM extended Qemu to take advantage of hardware features that accelerate virtualization, which requires kernel-space (low-level, direct OS) access, handing requests more directly to the hardware instead of going exclusively through the emulated fabric.

Conceptually I think of KVM as the kernel space (low-level hardware drivers / hardware accelerator) portion of the software while Qemu is the user space (interactive) portion of the software.

What made it confusing is that the user-space part (qemu-kvm) was a fork of Qemu, so people saw them as one package and said ‘KVM’ when they should have said qemu-kvm. The responsibilities are not that clearly separated as far as Linux is concerned, but when we get to macOS/Windows we start to appreciate the distinction: KVM is the kernel-based hardware accelerator itself, while Qemu is the whole idea of virtualization.

This is confirmed by Qemu’s docs:

KVM made no sense on Windows, as Microsoft is obviously not going to implement KVM in its kernel. Straight ports of Qemu to Windows retained the KVM lingo, which confused the hell out of me.

KVM does not exist on Windows. Intel used to make HAXM for its processors on Windows, but they discontinued it. Microsoft’s own hardware accelerator is WHPX (Windows Hypervisor Platform). These are the only 2 hardware acceleration options available for Qemu on Windows.

TCG (Tiny Code Generator) is not real hardware acceleration but rather an on-the-fly machine code translator (think of machine code as assembly, for those who don’t know the difference) that translates the VM guest’s instruction set into the host hardware’s instruction set.

People usually call ‘KVM’ (qemu-kvm) a Type 1 hypervisor, but the line is blurry. Does it still count as Type 1 if you add an FTP server to the Linux distro that hosts KVM? What makes KVM fast is its hardware acceleration through the kernel, but it’s the user-space Qemu that makes these kernel calls, using KVM as a hardware accelerator.

In my opinion, whether it’s Type 1 or Type 2 mostly boils down to intent, i.e. whether you are dedicating the computer to serving VMs. Hardly anyone nowadays insists on forgoing available hardware acceleration (which lives in kernel space) to run the VM purely as a software emulator. The whole idea of a VM is to decouple the software from the hardware, so who cares if the host computer takes a side job (like running an FTP server) as long as the virtualization overhead is the same?


Modern Motherboards with PCI support

I work with a lot of expensive test equipment that relies on PCI bus, such as Acqiris digitizers and Agilent logic analyzers. They can work in computers as modern as Windows 10 and LGA1700 processors if you play your cards right (pun intended). There’s no reason to pay 10 times more to buy a PCIe card when a PCI card will do the job.

Contact me at Humgar LLC and I’ll help you figure it out if you plan to buy test instruments from me that require PCI support, or pay for a short consulting engagement and have me figure out the rough edges for you.

LGA1700 (12th~13th Generation Intel CPU)

  • Asus H610M-CT D4-CSM (1 PCI slot, microATX)
  • AAEON ATX-Q670A (3 PCI slots, Full ATX)
  • Advantech AIMB-788 (2 PCI slots, Full ATX)
  • Advantech AIMB 708 (4 PCI slots, Full ATX)

I also have other solutions, such as using a short PCI bridge interface to drag your PCIe card out and take advantage of the unused extra space in the chassis, but these are subject to testing and verification because every situation is different.

Nonetheless, hardware for accommodating PCI is chump change compared to the cost of buying a new card if you are talking about high-end GHz or 12~16-bit cards. Tell me your scenario and I’ll suggest whether you are better off buying a new PCIe card or adapting your computer/chassis to use PCI cards, which give a much bigger bang for the buck because of people’s irrational fear of compatibility issues.


Programming Techniques: Bit hackery

Sean Eron Anderson’s (Stanford CS graphics lab) Bit Twiddling Hacks page shows a bunch of neat bit tricks, but it’s more of a recipe book than a unified summary of the common concepts behind them. Here’s my attempt. This page will get updated as I get the time and collect more useful insights.

Concept: Two ways to get two’s complement

This is the basis of most bit hacks below. Sometimes the definition itself is a bit trick on its own.

\begin{align}
\overline{-x}+1=x \\
-x = \overline{x}+1 
\end{align}

the above reads: to flip the sign, flip the bits then add one.

\begin{align}
\overline{x} = -x-1 \\
\end{align}

the above reads: flipping the bits gives you the negative of the number minus 1, e.g. ~4 = -5, ~5 = -6, …, ~(-5) = 4, ~(-4) = 3, …

\begin{align}
x = \overline{-x-1} \\
\end{align}

the above reads: any number can be represented by its negative minus 1, then bit-flipped.

\begin{align}
\overline{-x} = x-1 \\
-x = \overline{x-1}
\end{align}

reads: to flip the sign, subtract one first, then flip the bits

So to change the sign, you can either subtract one first and then flip the bits, or flip the bits first and then add one.

-x=\overline{x-1} = \overline{x}+1 

Let’s try it with 2 instead of 1:

\overline{x-2} = \overline{(x-1)-1} = \overline{x-1}+1 = (\overline{x}+1)+1 = \overline{x}+2

You can generalize this to an arbitrary number: every extra 1 subtracted under the bar (bit flip) on the left-hand side shows up as an extra +1 outside the bar on the right-hand side.

\overline{x-3} = \overline{(x-2)-1} = \overline{x-2}+1 = (\overline{x}+2)+1 = \overline{x}+3
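The identities above can be brute-force checked over every value of a fixed-width word. This is a sketch assuming an 8-bit word; wrap() and flip() are hypothetical helpers that emulate wrap-around and the overline (bitwise NOT) at that width, since Python integers are unbounded.

```python
WIDTH = 8                      # assumed word width for the demo
MASK = (1 << WIDTH) - 1        # 0b11111111


def wrap(v: int) -> int:
    """Reduce any int to a WIDTH-bit two's complement (signed) value."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v


def flip(v: int) -> int:
    """Bitwise NOT within WIDTH bits (the overline in the equations)."""
    return wrap(v ^ MASK)


for x in range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1)):
    assert wrap(-x) == wrap(flip(x) + 1)       # flip bits, then add one
    assert wrap(-x) == flip(wrap(x - 1))       # subtract one, then flip bits
    assert flip(x) == wrap(-x - 1)             # ~x = -x - 1
    for n in range(WIDTH):
        # every extra -1 under the bar becomes +1 outside it
        assert flip(wrap(x - n)) == wrap(flip(x) + n)

print(flip(4), flip(-5), flip(-4))  # -5 4 3
```

Because wrap() reduces everything modulo 2^8, the checks also cover the overflow corner at -128, where negating wraps back to -128.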

This matches the observation that in complement schemes (one’s or two’s), increasing magnitude moves in opposite directions for positive and negative numbers. Look at this table (3-bit words):

Unsigned   Binary   Two’s Complement
7          111      -1
6          110      -2
5          101      -3
4          100      -4
3          011      3
2          010      2
1          001      1
0          000      0
A very important observation that’ll be used over and over below is that in two’s complement, -1 is always a mask of all binary ‘1’s regardless of the word width.

This rule can also be read as: magnitude offsets go in opposite directions

\begin{align}
-x+(n-1)=\overline{x-n} = \overline{x}+n
\end{align}

Note that this is NOT distributing the bit flip over addition/subtraction, even though it resembles it; the important distinction is that the sign of n changes without n turning into (n-1). If it were distribution, you’d get (n-2) on the left-hand side instead of the (n-1) term, because the -1 would have been counted twice.

Bit flips simply don’t distribute over the 4 basic (algebraic field) operations. The two’s complement offset is applied once and only once when you change the overall representation, no matter how many components you break it into. It’s merely done to shift away the -0 of one’s complement so there’s room for an extra negative number, -2^n, whose positive counterpart +2^n is not representable without starting a new digit.

Note to self: the INT_MIN is just the sign bit of ‘1’ followed by all zeros after.

Concept: XOR can be used for bit flips or check for bit changes
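A minimal sketch of the two uses named in this heading (function names are made up for illustration): XOR with a mask toggles exactly the bits that are 1 in the mask, and XOR-ing an old and a new value leaves 1s only where a bit changed.

```python
def flip_bits(value: int, mask: int) -> int:
    """Toggle exactly the bits that are set in mask; applying it twice restores value."""
    return value ^ mask


def changed_bits(old: int, new: int) -> int:
    """1 wherever old and new differ, 0 wherever they agree."""
    return old ^ new


assert flip_bits(0b1010_0000, 0b0000_1111) == 0b1010_1111
assert flip_bits(flip_bits(0b1010_0000, 0b0000_1111), 0b0000_1111) == 0b1010_0000
assert changed_bits(0b1100, 0b1010) == 0b0110
```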

Concept: Top bit holds the sign

Sounds simple, but if you keep in mind that (x<0) is really asking whether the top bit is 1, you can check whether two numbers have opposite signs without shifting anything down: XOR them (everything below the top bit is ignored) and use (x^y)<0 to check whether the resulting top bit is 1, which signals that the sign bits differ.
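A one-line sketch of the opposite-sign check (the function name is hypothetical). It happens to work directly on Python ints, because Python treats negative integers as if they were sign-extended infinitely to the left, so the "top bit" of x ^ y is still just its sign.

```python
def opposite_signs(x: int, y: int) -> bool:
    # The sign bit of x ^ y is 1 exactly when the sign bits of x and y differ.
    return (x ^ y) < 0


assert opposite_signs(5, -3)
assert not opposite_signs(-5, -3)
assert not opposite_signs(0, 7)   # 0 has a clear sign bit, so it counts as non-negative
```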

Concept: Sign extensions (the top/sign bit gets drag-copied when right shifted)

When you right shift a signed integer, the top (sign) bit gets drag-copied (sign-extended) into every bit position vacated by the shift. (Obviously for unsigned integers, right shifts are zero-filled.)

We can exploit this to

  • drag the top (sign) bit all the way down to the bottom (so you get either all 1s or all 0s) to provide a conditional mask based on the sign (see below), by right shifting by the type’s bit width minus 1 (7 for an 8-bit type):

\begin{align*}
1???????_2 \gg 7 &= 11111111_2 = -1_{10} \\
0???????_2 \gg 7 &= 00000000_2 = +0_{10}
\end{align*}

Sign extension also means a negative number stays negative and a positive number stays positive when you right shift it.

Sign extension on right shifts of negative values is not guaranteed by ANSI C (the result is implementation-defined), but it’s what pretty much every modern compiler does. Just make sure anything that relies on this behavior is inlined (so the implementation can be easily swapped out), well documented/commented, guarded by platform checks/switches, and that there’s a way to quickly check it against a slower but platform-independent implementation.

Concept: Getting a bit mask of all 1s (if true) and all 0s (if false)

The ability to convert a logic evaluation (condition) that gives

\begin{align*}
00000001_2 &= +1_{10} & \mathrm{(true)}\\
00000000_2 &= +0_{10} & \mathrm{(false)}
\end{align*}

into a conditional mask that gives

\begin{align*}
11111111_2 &= -1_{10} & \mathrm{(true)}\\
00000000_2 &= +0_{10} & \mathrm{(false)}
\end{align*}

is the basis of many branchless ‘drop/keep this if that’ operations.

This can also be achieved by

  • putting a minus sign in front, such as -(cond), which converts (+1, 0) into (-1, 0), or
  • more efficiently, exploiting sign extension by dragging the top bit to the bottom (right shifting by the type’s bit length minus 1)
  • computing the absolute value using two’s complement’s definition (flip all bits and add 1): drag out a mask of the sign bit; XOR-ing with it does nothing if the mask is all 0s and flips all bits if it’s all 1s, and subtracting the mask (all 1s is -1) supplies the +1 needed to finish the two’s complement (while an already positive value just gets 0 subtracted).
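A sketch of the mask tricks in this list, assuming values that fit a 32-bit signed type. The names mask_from_cond(), select() and branchless_abs() are made up for illustration; select() is one way to phrase the branchless "drop/keep this if that" operation.

```python
WIDTH = 32  # assumed width of the signed type


def mask_from_cond(cond: bool) -> int:
    """-(+1) == -1 (all 1s) for true, -(0) == 0 (all 0s) for false."""
    return -int(cond)


def select(cond: bool, a: int, b: int) -> int:
    """Branchless 'a if cond else b': keep a under an all-1s mask, b under its complement."""
    m = mask_from_cond(cond)
    return (a & m) | (b & ~m)


def branchless_abs(x: int) -> int:
    """abs() from the two's complement definition; x must fit in WIDTH signed bits."""
    m = x >> (WIDTH - 1)   # sign extension: -1 if x is negative, 0 otherwise
    return (x ^ m) - m     # flip all bits if negative, then subtracting -1 adds the 1


assert select(True, 7, 9) == 7 and select(False, 7, 9) == 9
assert [branchless_abs(v) for v in (-5, 0, 5)] == [5, 0, 5]
```

Python never overflows, so branchless_abs(-2**31) returns 2**31 here, whereas the same expression on a real 32-bit int wraps back to INT_MIN, just like the ordinary abs() in C.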

Concept: 2^n - 1 sets all binary digits below it to a stream of 1s

When you count up in binary, you must exhaust all the lower digits by filling them with 1s before you get to advance to (set) a new digit to their left. For example,

\begin{align*}
1000_2 &= +8_{10}\\
0111_2 &= +7_{10}
\end{align*}

This can be exploited to create bit masks that preserve all digits to the left of the first ‘1’ seen from the right (LSB), put a ‘0’ at that lowest set bit, and turn every bit below it into ‘1’.

\begin{align*}
0110,1000_2 &= +104_{10}\\
0110,0111_2 &= +103_{10}
\end{align*}

Binary digits are the (1 or 0) coefficients of a linear combination of powers of 2. Having a lone ‘1’ (i.e. everything else is 0) means the number is a power of 2.

Being the lone ‘1’ bit in the number means every bit above it must be zero. Any ‘1’ above the right-most ‘1’ means it’s not alone, hence not a power of 2.

If you subtract 1 from a power of 2, only the bits below (not including) the lone ‘1’ bit become 1, and that ‘1’ bit position becomes zero; as mentioned before, all bits above it are 0s by definition since the ‘1’ we are working on is alone.

Since the digits in 2^n and 2^n - 1 are mutually exclusive (see example below)

\begin{align*}
0000,1000_2 &= +8_{10} = 2^3 \\
0000,0111_2 &= +7_{10} = 2^3 -1
\end{align*}

we can be sure that if we AND them we must get 0 (because at every position one of them is zero), and if we XOR them we get an unbroken run of 1s. But which one to use?

\begin{align*}
0000,1000_2 &= +8_{10} &=& 2^3 \\
0000,0111_2 &= +7_{10} &=& 2^3 -1 \\
0000,1111_2 &= +15_{10} &=& 2^3 \textnormal{ or } (2^3 -1)\\
0000,1111_2 &= +15_{10} &=& 2^3 \textnormal{ xor } (2^3 -1)\\
0000,0000_2 &= +0_{10} &=& 2^3 \textnormal{ and } (2^3 -1)\\
\end{align*}

Let’s also check for the non-power of two case

\begin{align*}
0110,1000_2 &= +104_{10}\\
0110,0111_2 &= +103_{10}\\
0110,1111_2 &= +111_{10} &=& 104_{10} \textnormal{ or } 103_{10}\\
0000,1111_2 &= +15_{10} &=& 104_{10} \textnormal{ xor } 103_{10}\\
0110,0000_2 &= +96_{10} &=& 104_{10} \textnormal{ and } 103_{10}\\  
\end{align*}

The XOR approach does not work: the upper bits are identical in x and x-1, so XOR cancels them and we cannot detect the presence of upper set bits (upper ‘1’s). It unconditionally gives the same bit pattern (mask): 1s at the lowest set bit and everything below it, and 0s everywhere above it. This can be exploited to simplify counting the trailing zeros (from the right): count the contiguous 1s in this invariant pattern and subtract 1, or add 1 to the pattern, binary search the position of the single set bit, and subtract 1, since the lowest set bit itself was part of the XOR pattern and the +1 carries onto the next higher binary digit.
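Both trailing-zero recipes from the paragraph above, sketched for positive Python ints (function names are hypothetical). int.bit_length() stands in for the binary search over bit positions.

```python
def lowest_set_bit_pattern(x: int) -> int:
    """x ^ (x - 1): 1s at the lowest set bit and everything below it, 0s above."""
    return x ^ (x - 1)


def ctz_by_counting(x: int) -> int:
    """Trailing zeros = (number of 1s in the invariant pattern) - 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    return bin(lowest_set_bit_pattern(x)).count("1") - 1


def ctz_by_increment(x: int) -> int:
    """Adding 1 carries into the next digit, leaving one set bit at position ctz + 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    position = (lowest_set_bit_pattern(x) + 1).bit_length() - 1  # 0-based index of that bit
    return position - 1


assert lowest_set_bit_pattern(104) == 0b0000_1111   # 0110,1000 xor 0110,0111
assert ctz_by_counting(104) == ctz_by_increment(104) == 3
```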

The OR approach detects the presence of the upper set bits, but it’s a pain to mask out the invariant lower 1s, which, curiously, you can do by XOR-ing with the invariant pattern generated by x xor (x-1), or by AND-NOT-ing it:

\begin{align*}
0110,1000_2 &= +104_{10}\\
0110,0111_2 &= +103_{10}\\
0110,1111_2 &= +111_{10} &=& 104_{10} \textnormal{ or } 103_{10}\\
0000,1111_2 &= +15_{10} &=& 104_{10} \textnormal{ xor } 103_{10}\\
1111,0000_2 &= +240_{10} &=& \overline{104_{10} \textnormal{ xor } 103_{10}}\\
0110,0000_2 &= +96_{10} &=& (104_{10} \textnormal{ or } 103_{10}) \textnormal{ and } \overline{(104_{10} \textnormal{ xor } 103_{10})}\\  
0110,0000_2 &= +96_{10} &=& (104_{10} \textnormal{ or } 103_{10}) \textnormal{ xor } (104_{10} \textnormal{ xor } 103_{10})\\
0110,0000_2 &= +96_{10} &=& 104_{10} \textnormal{ and } 103_{10}\\  
\end{align*}

And AND happens to already do the job by keeping the top bits (whose non-zero value reveals their presence) while unconditionally clearing the lowest set bit and everything below it.

The gist of the x & (x-1) maneuver is that it clears the lowest set bit and everything below it (those lower bits were already 0 in x anyway):

clearLowestSetBitAndEverythingBelow(x): x & (x-1)

Brian W. Kernighan used this to count the number of set bits by knocking them off one at a time, starting from the bottom. Of course, the worst case is when the 1s are so dense that the algorithm must go through every bit without jumping past any zeros.
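A sketch of the set-bit count described above (the function name is hypothetical). The loop runs once per set bit rather than once per bit position; a guard rejects negative Python ints, which have infinitely many 1s and would never reach zero.

```python
def popcount_kernighan(x: int) -> int:
    """Count set bits by clearing the lowest one until nothing is left."""
    if x < 0:
        raise ValueError("x must be non-negative")
    count = 0
    while x:
        x &= x - 1   # clear the lowest set bit
        count += 1
    return count


assert popcount_kernighan(0b0110_1000) == 3   # 104: only 3 iterations, zeros are skipped
assert popcount_kernighan(0b1111_1111) == 8   # dense 1s: the worst case
```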

So the solution is

isPowerOfTwo(x): clearLowestSetBitAndEverythingBelow(x)==0
isPowerOfTwo(x): (x & (x-1))==0

However, one special case escaped us: x=0. 0x0000 & 0xFFFF is 0, but 0 isn’t a power of 2 (unless you count 2 to the minus infinity, which is floating-point territory anyway). This is easily patched by making the result unconditionally false when x=0.

isPowerOfTwo(x): x && ((x & (x-1))==0)

Note the logical &&: x is first tested for non-zeroness (any non-zero value boils down to true), and && also enables efficient short-circuit evaluation: if x is 0, the result is unconditionally false, so the rest is irrelevant and never evaluated.
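The same test as a runnable Python sketch. One assumption differs from the pseudocode: Python ints can be negative, so the guard is x > 0 rather than plain non-zeroness; Python’s "and" short-circuits the same way &&  does.

```python
def is_power_of_two(x: int) -> bool:
    # Short-circuit: zero and negatives are rejected before x & (x - 1) is evaluated.
    return x > 0 and (x & (x - 1)) == 0


assert [v for v in range(100) if is_power_of_two(v)] == [1, 2, 4, 8, 16, 32, 64]
assert not is_power_of_two(104)   # 0110,1000 & 0110,0111 = 0110,0000, non-zero
```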

Concept: Look up table

This is unconditionally the O(1) way, because you have a mask for every bit in the type precomputed and can simply index into it. However, the penalty is that a load operation can be expensive if the table can’t fit in the register file (or cache).
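One common lookup-table trick, shown here as an illustration rather than the only use: count set bits one byte at a time from a precomputed 256-entry table (the same idea as the lookup-table bit count on the bit-twiddling page). The table and function names are hypothetical, and the input is assumed to be a non-negative value of at most `width` bits.

```python
# Precompute the set-bit count of every possible byte once (256 entries).
POPCOUNT_TABLE = [bin(i).count("1") for i in range(256)]


def popcount_table(x: int, width: int = 32) -> int:
    """Set-bit count of the low `width` bits of x: one table load per byte, no bit loop."""
    total = 0
    for shift in range(0, width, 8):
        total += POPCOUNT_TABLE[(x >> shift) & 0xFF]
    return total


assert popcount_table(0xFFFF_FFFF) == 32
assert popcount_table(104) == 3
```

The trade-off mentioned above shows up directly: the work per call is fixed (4 loads for 32 bits), but every step is a memory access into the table instead of pure register arithmetic.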


Python Cheat Sheet

Built-in functions

  • breakpoint(): Python’s version of MATLAB’s keyboard() command
  • callable(): Like MATLAB’s isfunction() but it really checks if there’s a __call__ method
  • getattr()/hasattr(): MATLAB’s getfield()/isfield(). The 3rd parameter of getattr() is a shortcut to return a default if there’s no such field/attribute, which MATLAB’s getfield() doesn’t have
  • globals()/locals(): more convenient than MATLAB because the whole workspace (current variables) can be accessed as a dictionary in Python by calling locals() and globals()
  • id(): identity of the object the variable (reference) points to, which in CPython is its memory address. Think of it as &x in C.
  • isinstance(): MATLAB’s isa()
  • next(): Python favors not computing values until needed, so it offers generators (forward-only iterators) that spit out one value each time you kick them with next(), and you can’t go back.
  • chr()/ord(): analogous to MATLAB’s char()/double() cast for characters
  • Python’s exponentiation is **, not ^ like in MATLAB (C has no exponentiation operator, and ^ is used for xor, which Python keeps)
  • print(…, flush=True) forces a courtesy flush of the output stream right away
  • repr(): roughly Python’s version of MATLAB’s disp() (it returns the string instead of printing it), and it’s an overloadable standard interface via __repr__
  • slice(): Python’s equivalent of MATLAB’s colon() special interface
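A quick sketch exercising several of the built-ins above; Point, squares() and demo_locals() are made-up names for illustration only.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):              # repr() is overloadable through __repr__
        return f"Point({self.x}, {self.y})"


p = Point(1, 2)
assert getattr(p, "z", 0) == 0                      # 3rd argument: default for a missing attribute
assert hasattr(p, "x") and not hasattr(p, "z")
assert isinstance(p, Point)
assert callable(Point) and not callable(p)          # p has no __call__ method
assert (chr(65), ord("A")) == ("A", 65)
assert 2 ** 10 == 1024 and 2 ^ 10 == 8              # ** is power, ^ is xor
assert repr(p) == "Point(1, 2)"


def demo_locals():
    a = 3
    return locals()["a"]                            # the local workspace as a dict


def squares():
    n = 0
    while True:                                     # values are produced only on demand
        yield n * n
        n += 1


gen = squares()
assert [next(gen), next(gen), next(gen)] == [0, 1, 4]
assert demo_locals() == 3

data = list(range(10))
assert data[slice(1, 8, 2)] == data[1:8:2] == [1, 3, 5, 7]
```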

Context Manager

@contextlib.contextmanager basically splits a setup-try-yield-finally boilerplate generator function into 3 parts: __enter__ (everything above the yield), BODY (where the yield hands control to) and __exit__ (everything below the yield), since a with-as statement is a rigidly defined try-finally block, roughly like this:

with EXPR as f:
  BODY(f)

# behaves roughly like this generator's flow:
f = EVAL(EXPR)  # __enter__: runs up to the yield
try:
  yield f  # f is bound by 'as'; BODY runs here
finally:
  cleanup(f)  # __exit__: runs even if BODY raises
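A runnable sketch of the split described above, using the real contextlib.contextmanager decorator; opened_resource and the events log are hypothetical names used only to show the order in which the three parts run, including when BODY raises.

```python
import contextlib

events = []


@contextlib.contextmanager
def opened_resource(name):
    events.append(f"enter {name}")        # __enter__: everything above the yield
    try:
        yield name.upper()                # bound by 'as'; BODY runs while suspended here
    finally:
        events.append(f"exit {name}")     # __exit__: everything below the yield


with opened_resource("log") as f:
    events.append(f"body {f}")

try:
    with opened_resource("db"):
        raise RuntimeError("boom")        # the exception is thrown in at the yield
except RuntimeError:
    pass

print(events)
```

Without the try/finally inside the generator, the exception raised in BODY would skip the cleanup line, which is why the set-try-yield-finally shape is the usual boilerplate.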
