-rwxr-xr-x build/sim86 | Bin 61656 -> 66448 bytes
-rwxr-xr-x build/sim86_meta | Bin 177464 -> 181968 bytes
-rw-r--r-- computerenhance.md | 46
-rw-r--r-- src/sim86.cpp | 39
4 files changed, 72 insertions, 13 deletions
diff --git a/build/sim86 b/build/sim86
index 08e2c7c..3777fdb 100755
--- a/build/sim86
+++ b/build/sim86
Binary files differ
diff --git a/build/sim86_meta b/build/sim86_meta
index c7b2e28..d01767d 100755
--- a/build/sim86_meta
+++ b/build/sim86_meta
Binary files differ
diff --git a/computerenhance.md b/computerenhance.md
index 1733704..8eaa819 100644
--- a/computerenhance.md
+++ b/computerenhance.md
@@ -28,7 +28,7 @@ Only learning about how performance works is enough.
# Solution
- Keep result of instructions in mind, not code
-- Learn what the maximum speed of something should be```
+- Learn what the maximum speed of something should be
# 4. [Waste](https://www.computerenhance.com/p/waste)
Instructions that do not need to be there.
@@ -53,7 +53,7 @@ Key points:
- to measure overhead + loop we can measure cycles
- more instructions != more time
-Python had 180x instructions and was 130x slower.```
+Python had 180x instructions and was 130x slower.
# 5. [Instructions Per Clock](https://www.computerenhance.com/p/instructions-per-clock)
*speed of instructions*
@@ -76,13 +76,13 @@ Python had 180x instructions and was 130x slower.```
Reducing ratio of loop overhead / work
- example: loop unrolling
- ```c
+```c
for (i = 0; i < count; i +=2)
{
sum += input[i];
sum += input[i + 1];
}
- ```
+```
Weird that it would only go up to 1 add per cycle.
- what are the chances? overhead??
@@ -98,7 +98,7 @@ Multiple chains can help break through limits.
- "boosting the IPC"
CPUs are designed for more computation, so boosting IPC in a loop that
-does not do a lot of computation will bring less benefits.```
+does not do a lot of computation will bring less benefits.
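The point about multiple dependency chains can be sketched in C. This is a minimal illustration, not code from the course: `SumSingleChain` and `SumDualChain` are hypothetical names, and whether the two-accumulator version is actually faster depends on the compiler and the CPU (an optimizing compiler may already split an integer sum like this on its own).

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the previous one (a single chain). */
uint32_t SumSingleChain(const uint32_t *Input, size_t Count)
{
    uint32_t Sum = 0;
    for(size_t Index = 0; Index < Count; ++Index)
    {
        Sum += Input[Index];
    }
    return Sum;
}

/* Two accumulators: adds into SumA and SumB are independent, so an
   out-of-order core can work on both chains at the same time. */
uint32_t SumDualChain(const uint32_t *Input, size_t Count)
{
    uint32_t SumA = 0;
    uint32_t SumB = 0;
    size_t Index = 0;
    for(; Index + 1 < Count; Index += 2)
    {
        SumA += Input[Index];
        SumB += Input[Index + 1];
    }
    if(Index < Count)
    {
        SumA += Input[Index]; /* leftover element when Count is odd */
    }
    return SumA + SumB; /* combine the chains once, after the loop */
}
```

Both functions return the same total; only the shape of the dependency chain differs.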
# 6. [Monday Q&A (2023-02-05)](https://www.computerenhance.com/p/monday-q-and-a-2023-02-05)
# JIT
@@ -168,7 +168,7 @@ input[3]
input += 4
```
# Tree-based addition:
-- common technique to work out a dependency chain```
+- common technique to work out a dependency chain
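A tree-based addition shortens the dependency chain by pairing up independent adds, as with the 4-at-a-time `input` loop in the notes above. A minimal C sketch, assuming 32-bit unsigned integers; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Linear: ((a + b) + c) + d is three adds in a row (chain depth 3). */
uint32_t SumFourLinear(const uint32_t *Input)
{
    return ((Input[0] + Input[1]) + Input[2]) + Input[3];
}

/* Tree: (a + b) + (c + d). The two inner adds do not depend on each other,
   so the longest chain is only two adds deep. */
uint32_t SumFourTree(const uint32_t *Input)
{
    uint32_t Left = Input[0] + Input[1];
    uint32_t Right = Input[2] + Input[3];
    return Left + Right;
}

/* Whole array, 4 elements per step: only one add per step lands on the
   loop-carried Sum chain. */
uint32_t SumTree(const uint32_t *Input, size_t Count)
{
    uint32_t Sum = 0;
    size_t Index = 0;
    for(; Index + 4 <= Count; Index += 4)
    {
        Sum += SumFourTree(Input + Index);
    }
    for(; Index < Count; ++Index)
    {
        Sum += Input[Index]; /* leftover elements */
    }
    return Sum;
}
```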
# 7. [Single Instruction, Multiple Data](https://www.computerenhance.com/p/single-instruction-multiple-data)
*Amount of instructions*
@@ -202,7 +202,7 @@ input += 4
# Difficulty
- SIMD does not care about how data is organized
-- easy with adds```
+- easy with adds
# 8. [Caching](https://www.computerenhance.com/p/caching)
*speed of instructions*
@@ -479,14 +479,14 @@ Because there are many dependencies on loads it is very important.
# Forcing out of memory
- bandwidth does not increase much when using main memory
- depending on the chip
-- L3 cache and main memory are shared (not big speed ups)```
+- L3 cache and main memory are shared (not big speed ups)
# 11. [Python Revisited](https://www.computerenhance.com/p/python-revisited)
Assembly is what determines the speed.
# Python
- doing every sum in python is slow
-- numpy is faster when you have supplied the array with a type```
+- numpy is faster when you have supplied the array with a type
# 12. [Monday Q&A #3 (2023-02-20)](https://www.computerenhance.com/p/monday-q-and-a-3-2023-02-20)
# Hyperthreading & Branch prediction
- hyperthreads ::
@@ -550,14 +550,14 @@ Assembly is what determines the speed.
# How to get memory bandwidth
-- https://github.com/cmuratori/blandwidth```
+- https://github.com/cmuratori/blandwidth
# 13. [The Haversine Distance Problem](https://www.computerenhance.com/p/the-haversine-distance-problem)
- Computing arc length between two coordinates.
- You want to do the math first.
- CPU is made for it
- Second is the *Input*
-- Reading the data can take a long time.```
+- Reading the data can take a long time.
# 14. ["Clean" Code, Horrible Performance](https://www.computerenhance.com/p/clean-code-horrible-performance)
@@ -740,16 +740,38 @@ By using estimation you can know what your performance *should* be.
clocks=cycles
# 38. [Monday Q&A #10 (2023-05-08)](https://www.computerenhance.com/p/monday-q-and-a-10-2023-05-08)
-With SIMD using smaller numbers will be faster.
+With SIMD, using smaller bit widths will be faster (more elements fit in each register).
+For more accurate cycle estimates, try to simulate the microcode, which has been reverse-engineered from die shots.
+2 transfers can mean a read + a write, e.g. `add [bx], 20`.
+There is microcode for loads and stores, but some lines get processed and "skipped", which can account for a cycle.
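Why smaller bit widths help with SIMD can be shown with the GCC/Clang `vector_size` extension (a compiler extension, not standard C; the type names are hypothetical). The same 16-byte vector holds 16, 8 or 4 lanes depending on the element width, so one vector add does proportionally more work on narrower types. Whether it compiles to SSE, NEON or scalar code depends on the target.

```c
#include <stdint.h>

/* GCC/Clang vector extension: each type is one 128-bit (16-byte) vector. */
typedef uint8_t  U8x16 __attribute__((vector_size(16)));
typedef uint16_t U16x8 __attribute__((vector_size(16)));
typedef uint32_t U32x4 __attribute__((vector_size(16)));

/* Same register width, different lane counts. */
enum
{
    LanesU8  = sizeof(U8x16) / sizeof(uint8_t),   /* 16 lanes */
    LanesU16 = sizeof(U16x8) / sizeof(uint16_t),  /*  8 lanes */
    LanesU32 = sizeof(U32x4) / sizeof(uint32_t),  /*  4 lanes */
};

/* One vector add performs 8 independent 16-bit additions. */
U16x8 AddU16x8(U16x8 A, U16x8 B)
{
    return A + B;
}
```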
# 39. [From 8086 to x64](https://www.computerenhance.com/p/from-8086-to-x64)
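+The E prefix "widens" a 16-bit register to 32 bits (e.g. AX to EAX) for backwards compatibility. Analogously, the R prefix "widens" the register to 64 bits (RAX).
+R8-R15 are the new 64-bit registers. Suffixes select their smaller parts: B = 8 bits (byte), W = 16 bits (word), D = 32 bits (double word); with no suffix the register is 64 bits (quad word).
+You can use almost any general-purpose register for the 2 terms of effective addressing (RSP cannot be the index). The index term can be multiplied by a scale of 1, 2, 4, or 8. `PTR` is optional when specifying that an operand is memory.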
+
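The register views and the effective address form described above can be modeled in plain C, using a 64-bit integer to stand in for a register. This is a hypothetical model for illustration (the helper names are made up), not how an assembler or CPU is implemented:

```c
#include <assert.h>
#include <stdint.h>

/* Views of one 64-bit register, e.g. R8B / R8W / R8D / R8
   (or AL / AX / EAX / RAX for the legacy registers). */
uint8_t  ReadB(uint64_t Reg) { return (uint8_t)Reg; }   /* low 8 bits  */
uint16_t ReadW(uint64_t Reg) { return (uint16_t)Reg; }  /* low 16 bits */
uint32_t ReadD(uint64_t Reg) { return (uint32_t)Reg; }  /* low 32 bits */

/* Effective address: base + index * scale + displacement,
   e.g. [rbx + rcx*4 + 16]. Only scales 1, 2, 4 and 8 are encodable. */
uint64_t EffectiveAddress(uint64_t Base, uint64_t Index, uint32_t Scale, int32_t Displacement)
{
    assert(Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8);
    /* The displacement is sign-extended to 64 bits before the add. */
    return Base + Index * Scale + (uint64_t)(int64_t)Displacement;
}
```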
# 40. [8086 Internals Poll](https://www.computerenhance.com/p/8086-internals-poll)
+
# 41. [How to Play Trinity](https://www.computerenhance.com/p/how-to-play-trinity)
+
# 42. [Monday Q&A #11 (2023-05-15)](https://www.computerenhance.com/p/monday-q-and-a-11-2023-05-15)
+`mov edi, edi` zeroes the upper 32 bits of RDI, because writing a 32-bit register zero-extends into the full 64-bit register. The ABI passes the parameter in EDI and does not guarantee the upper bits of RDI, so they need to be cleared to 0 before RDI is used as a 64-bit value.
+Redundant register moves do not impact the backend performance (where dependency chains get resolved).
+There are instructions that are not useful anymore.
+Some instructions cannot be accessed.
+There are extensions such as `SGX` (encrypted memory) and a transactional memory system (TSX).
+SIMD registers can be split into lanes, but "normal" general-purpose registers no longer support this.
+A segfault comes from an interrupt in the interrupt table, e.g. a page fault when accessing unmapped memory.
+
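The zero-extension rule behind `mov edi, edi` can be modeled in C with 64-bit integers standing in for registers. This is a hypothetical model (the helper names are made up) showing that a 32-bit write clears the upper half, while 16- and 8-bit writes preserve the other bits:

```c
#include <stdint.h>

/* Writing a 32-bit register (e.g. EDI) replaces the whole 64-bit register
   (RDI): the value is zero-extended and the old upper 32 bits are lost. */
uint64_t Write32(uint64_t Reg, uint32_t Value)
{
    (void)Reg;
    return (uint64_t)Value;
}

/* Writing a 16-bit (DI) or 8-bit (DIL) part keeps the other bits as they were. */
uint64_t Write16(uint64_t Reg, uint16_t Value)
{
    return (Reg & ~(uint64_t)0xFFFF) | Value;
}

uint64_t Write8(uint64_t Reg, uint8_t Value)
{
    return (Reg & ~(uint64_t)0xFF) | Value;
}

/* `mov edi, edi`: read EDI (the low 32 bits of RDI) and write it back to EDI,
   which clears whatever was left in the upper half of RDI. */
uint64_t MovEdiEdi(uint64_t Rdi)
{
    return Write32(Rdi, (uint32_t)Rdi);
}
```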
# 43. [8086 Simulation Code Review](https://www.computerenhance.com/p/8086-simulation-code-review)
+
# 44. [Part One Q&A and Homework Showcase](https://www.computerenhance.com/p/part-one-q-and-a-and-homework-showcase)
+
# 45. [The First Magic Door](https://www.computerenhance.com/p/the-first-magic-door)
+
# 46. [Monday Q&A #12 (2023-05-22)](https://www.computerenhance.com/p/monday-q-and-a-12-2023-05-22)
+TODO: more information about 8086 misc.
+
# 47. [Generating Haversine Input JSON](https://www.computerenhance.com/p/generating-haversine-input-json)
# 48. [Monday Q&A #13 (2023-05-29)](https://www.computerenhance.com/p/monday-q-and-a-13-2023-05-29)
# 49. [Writing a Simple Haversine Distance Processor](https://www.computerenhance.com/p/writing-a-simple-haversine-distance)
diff --git a/src/sim86.cpp b/src/sim86.cpp
index e5480d7..7c444bd 100644
--- a/src/sim86.cpp
+++ b/src/sim86.cpp
@@ -157,7 +157,6 @@ Run8086(psize MemorySize, u8 *Memory)
if(Decoded.Op)
{
u32 OldIPRegister = IPRegister;
- IPRegister += Decoded.Size;
#if SIM86_INTERNAL
printf("%s ;", Sim86_MnemonicFromOperationType(Decoded.Op));
@@ -300,6 +299,42 @@ Run8086(psize MemorySize, u8 *Memory)
}
#endif
}
+ else if(Decoded.Op == Op_ret)
+ {
+ printf("\n");
+ printf("STOPONRET: Return encountered at address %u.\n", IPRegister);
+
+ break;
+ }
+ else if(Decoded.Op == Op_inc)
+ {
+ Assert(DestinationOperand->Type == Operand_Register);
+ Assert(SourceOperand->Type == Operand_None);
+ *Destination += 1;
+ }
+ else if(Decoded.Op == Op_test)
+ {
+
+ Assert(DestinationOperand->Type == Operand_Register);
+ Assert(SourceOperand->Type == Operand_Register || SourceOperand->Type == Operand_Immediate);
+
+ s32 Value = ((Decoded.Flags & Inst_Wide) ?
+ (u16)((u16)*Destination & ((u16)*Source)) :
+ (u8)((u8)*Destination & ((u8)*Source)));
+ FlagsFromValue(&FlagsRegister, Decoded.Flags, Value);
+ }
+ else if(Decoded.Op == Op_xor)
+ {
+
+ Assert(DestinationOperand->Type == Operand_Register);
+ Assert(SourceOperand->Type == Operand_Register || SourceOperand->Type == Operand_Immediate);
+
+ s32 Value = ((Decoded.Flags & Inst_Wide) ?
+ (u16)((u16)*Destination ^ ((u16)*Source)) :
+ (u8)((u8)*Destination ^ ((u8)*Source)));
+ FlagsFromValue(&FlagsRegister, Decoded.Flags, Value);
+ *Destination = Value;
+ }
else if(Decoded.Op == Op_cmp)
{
Assert(DestinationOperand->Type == Operand_Register);
@@ -361,6 +396,8 @@ Run8086(psize MemorySize, u8 *Memory)
Assert(0 && "Op not implemented yet.");
}
+ IPRegister += Decoded.Size;
+
#if SIM86_INTERNAL
printf(" ip:0x%x->0x%x", OldIPRegister, IPRegister);
#endif
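The control flow this commit introduces (execute the instruction, then advance IP, and stop on `ret`) can be sketched as a tiny standalone loop. Everything here is hypothetical and much simpler than sim86: instructions are pre-decoded structs with an explicit address, registers are a plain `uint16_t` array, only 16-bit operands are handled, and only ZF, SF and PF are modeled. On the 8086, `inc` also updates ZF, SF and PF (but not CF), so the sketch sets those flags for `inc` as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-decoded instruction; not sim86's real instruction type. */
typedef enum { Op_inc, Op_xor, Op_test, Op_ret } op_type;

typedef struct
{
    uint32_t Address; /* byte offset of the instruction in memory */
    op_type Op;
    uint32_t Size;    /* encoded length in bytes */
    int Dest;         /* destination register index (unused for ret) */
    int Source;       /* source register index, or -1 if there is none */
} instruction;

enum { Flag_Z = 1 << 0, Flag_S = 1 << 1, Flag_P = 1 << 2 };

/* ZF, SF and PF for a 16-bit result. PF is set when the low 8 bits
   contain an even number of 1 bits. */
static uint16_t FlagsFromValue16(uint16_t Value)
{
    uint16_t Flags = 0;
    if(Value == 0) Flags |= Flag_Z;
    if(Value & 0x8000) Flags |= Flag_S;

    int Ones = 0;
    for(int Bit = 0; Bit < 8; ++Bit)
    {
        Ones += (Value >> Bit) & 1;
    }
    if((Ones % 2) == 0) Flags |= Flag_P;

    return Flags;
}

static const instruction *Fetch(const instruction *Program, int Count, uint32_t IP)
{
    for(int Index = 0; Index < Count; ++Index)
    {
        if(Program[Index].Address == IP) return &Program[Index];
    }
    return NULL;
}

/* Runs until a ret (or an address with no instruction) and returns the IP
   at which execution stopped. */
uint32_t Run(const instruction *Program, int Count, uint16_t *Regs, uint16_t *Flags)
{
    uint32_t IP = 0;
    for(;;)
    {
        const instruction *Inst = Fetch(Program, Count, IP);
        if(!Inst) break;

        if(Inst->Op == Op_ret)
        {
            break; /* stop before advancing, so IP is the ret's own address */
        }
        else if(Inst->Op == Op_inc)
        {
            Regs[Inst->Dest] += 1;
            *Flags = FlagsFromValue16(Regs[Inst->Dest]);
        }
        else if(Inst->Op == Op_xor)
        {
            Regs[Inst->Dest] ^= Regs[Inst->Source];
            *Flags = FlagsFromValue16(Regs[Inst->Dest]);
        }
        else if(Inst->Op == Op_test)
        {
            /* AND for the flags only: the destination is not written. */
            *Flags = FlagsFromValue16(Regs[Inst->Dest] & Regs[Inst->Source]);
        }

        /* Advance IP only after the instruction has executed. */
        IP += Inst->Size;
    }
    return IP;
}
```

For example, `xor r0, r0; inc r0; inc r0; test r0, r1` with `r1 = 1` leaves `r0 = 2`, sets ZF and PF (since `2 & 1 == 0`), and stops at the address of the following `ret`.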