Bugs and confusion on the programs sampling process

I don't know if this is intentional, but the sampling of the top, diverse, and inspiration programs looks a bit messed up in both the iterative and the parallel cases.

According to build_prompt():
 * top_programs: List of top-performing programs (best by fitness)
 * inspirations: List of inspiration programs (diverse/creative examples)

The YAML config has parameters for top and diverse programs (num_top_programs and num_diverse_programs).
In the code, however, the diverse programs are not the inspirations (should they be?).

As far as I understand, build_prompt(top_programs, inspirations) calls _format_evolution_history(), which splits top_programs into top and diverse programs (two separate parameters would make this easier to follow):
```
selected_top = top_programs[: min(self.config.num_top_programs, len(top_programs))] # selects top-N as top programs
remaining_programs = top_programs[self.config.num_top_programs :] # the rest: pool from which diverse programs are randomly sampled
```
So the diverse programs are actually "almost-top" programs (if that was intended, renaming would help).
Inspirations, on the other hand, are the diverse/creative examples, and they are unrelated to the diverse parameter in the YAML (was that intended?).
In addition, the inspirations seem to contain some of the top programs as well. Is there a reason to (i) include the island's best program, if available and different from the parent, and (ii) add top programs from the island as inspirations, when they are already included in the top programs set?
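To make the split concrete, here is a minimal sketch that mimics the quoted slicing on a toy best-first list. It assumes (I have not confirmed this in the code) that num_diverse_programs programs are drawn uniformly at random from the remainder; `Program` and `split_top_and_diverse` are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    # Hypothetical stand-in: only the id and score matter for this example.
    id: str
    score: float


def split_top_and_diverse(top_programs, num_top_programs, num_diverse_programs, rng):
    """Mimic the slicing quoted above on a best-first list of programs."""
    # The first N entries become the "top" programs.
    selected_top = top_programs[: min(num_top_programs, len(top_programs))]
    # Everything after them is the pool the "diverse" programs are drawn from.
    remaining_programs = top_programs[num_top_programs:]
    # Assumption: num_diverse_programs are drawn uniformly at random.
    k = min(num_diverse_programs, len(remaining_programs))
    diverse = rng.sample(remaining_programs, k)
    return selected_top, diverse


# Six programs, already sorted best-first (scores 1.0, 0.9, ..., 0.5).
ranked = [Program(f"p{i}", 1.0 - i * 0.1) for i in range(6)]
top, diverse = split_top_and_diverse(ranked, 3, 2, random.Random(0))
print([p.id for p in top])      # ['p0', 'p1', 'p2']
print([p.id for p in diverse])  # two of p3, p4, p5: the next-best programs
```

Whatever the seed, the "diverse" programs always come from the tail of the same ranked list, so they are the next-best programs rather than programs chosen for diversity.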

I don't know whether these are bugs or intentional design choices, but the result is confusing.
A more uniform implementation of the iterative and parallel paths (where possible) would also make the code easier to understand and maintain.

I was expecting something like:
 * the top programs should be the top N programs with N=num_top_programs, retrieved by database.get_top_programs()
 * the diverse/inspiration programs should be N random (or most diverse) programs with N=num_diverse_programs, retrieved by _sample_inspirations()

Instead, the current design seems to intentionally have three sections (top / diverse / inspirations), with names that are inconsistent with the parameters exposed to the user.
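A minimal sketch of the behavior I was expecting, using a hypothetical in-memory `ToyDatabase` rather than the real database class. Uniform random sampling stands in for "random/most diverse"; a diversity-based selection would replace the `rng.sample` call. The top programs and the parent are excluded from the inspiration pool, so the two prompt sections cannot overlap.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    id: str
    score: float


class ToyDatabase:
    """Hypothetical in-memory database, not the project's database class."""

    def __init__(self, programs):
        self.programs = list(programs)

    def get_top_programs(self, n):
        # Best-first by score, truncated to n.
        return sorted(self.programs, key=lambda p: p.score, reverse=True)[:n]


def sample_prompt_programs(db, parent, num_top_programs, num_diverse_programs, rng):
    # Section 1: exactly num_top_programs programs, ranked by fitness.
    top = db.get_top_programs(num_top_programs)
    # Section 2: num_diverse_programs programs that are neither top nor the parent.
    excluded = {p.id for p in top} | {parent.id}
    pool = [p for p in db.programs if p.id not in excluded]
    diverse = rng.sample(pool, min(num_diverse_programs, len(pool)))
    return top, diverse


scores = [0.9, 0.2, 0.7, 0.5, 0.4, 0.8]
db = ToyDatabase(Program(f"p{i}", s) for i, s in enumerate(scores))
parent = db.programs[3]
top, diverse = sample_prompt_programs(db, parent, 2, 2, random.Random(42))
print([p.id for p in top])  # ['p0', 'p5']
```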


In detail:

### Issue 1) Sampling parameters
iteration.py:
```
parent, inspirations = database.sample(num_inspirations=config.prompt.num_top_programs)
```
process_parallel.py:
```
parent, inspirations = self.database.sample_from_island(
    island_id=target_island, num_inspirations=self.config.prompt.num_top_programs
)
```
Shouldn't num_inspirations be equal to num_diverse_programs (or to a new parameter, num_inspiration_programs)?
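One possible shape for the fix, sketched with a hypothetical `PromptConfig` dataclass (not the project's real config class): num_inspiration_programs is the proposed new parameter and falls back to num_diverse_programs when unset. Both call sites would then pass `num_inspirations=resolve_num_inspirations(config.prompt)` instead of num_top_programs, assuming the real config could be adapted this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptConfig:
    # Hypothetical mirror of the prompt section of config.yaml.
    num_top_programs: int
    num_diverse_programs: int
    # Proposed new parameter; None means "fall back to num_diverse_programs".
    num_inspiration_programs: Optional[int] = None


def resolve_num_inspirations(cfg: PromptConfig) -> int:
    """Number of inspirations to request from the database sampler."""
    if cfg.num_inspiration_programs is not None:
        return cfg.num_inspiration_programs
    return cfg.num_diverse_programs


# Values are illustrative, not the project's defaults.
print(resolve_num_inspirations(PromptConfig(3, 2)))     # 2
print(resolve_num_inspirations(PromptConfig(3, 2, 4)))  # 4
```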

----------------------------------------

### Issue 2) Get top parameters 
iteration.py:
```
parent, inspirations = database.sample(num_inspirations=config.prompt.num_top_programs)
[...]
island_top_programs = database.get_top_programs(5, island_idx=parent_island)
island_previous_programs = database.get_top_programs(3, island_idx=parent_island)
[...]
prompt = prompt_sampler.build_prompt(
    [...]
    previous_programs=[p.to_dict() for p in island_previous_programs],
    top_programs=[p.to_dict() for p in island_top_programs],
    inspirations=[p.to_dict() for p in inspirations],
)
```
Doesn't this ignore the `config.yaml` parameters `num_top_programs` and `num_diverse_programs`, since the counts 5 and 3 are hardcoded?
Shouldn't it also pass "combined_score" as the metric (if it exists)?
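A sketch of what I would expect instead, with toy types: the number of island top programs comes from num_top_programs rather than a hardcoded 5, and "combined_score" is used for ranking when every program reports it. `choose_metric`, `score`, and `top_of_island` are hypothetical helpers, and the fallback (mean of all metrics) is only an assumption for illustration; I have not checked what the real ranking does when the metric is absent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Program:
    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def choose_metric(programs: List[Program]) -> Optional[str]:
    """Use "combined_score" only if every program reports it."""
    if programs and all("combined_score" in p.metrics for p in programs):
        return "combined_score"
    return None


def score(program: Program, metric: Optional[str]) -> float:
    if metric is not None:
        return program.metrics[metric]
    # Assumed fallback for this sketch: mean of all reported metrics.
    return sum(program.metrics.values()) / max(len(program.metrics), 1)


def top_of_island(programs: List[Program], n: int, metric: Optional[str]) -> List[Program]:
    return sorted(programs, key=lambda p: score(p, metric), reverse=True)[:n]


island = [
    Program("a", {"combined_score": 0.4, "speed": 0.9}),
    Program("b", {"combined_score": 0.8, "speed": 0.1}),
    Program("c", {"combined_score": 0.6, "speed": 0.5}),
]
num_top_programs = 2  # would come from config.prompt.num_top_programs, not a literal 5
print([p.id for p in top_of_island(island, num_top_programs, choose_metric(island))])  # ['b', 'c']
print([p.id for p in top_of_island(island, num_top_programs, None)])  # ['a', 'c']
```

The two calls show why the metric matters: ranking by "combined_score" and ranking by an average of all metrics select different "top" programs from the same island.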

----------------------------------------

### Issue 3) Overlap between top programs and inspirations
database.py
```
# Include the island's best program if available
if island_best_id is not None and island_best_id != parent.id:
    inspirations.append(island_best)

# Add top programs from the island as inspirations
top_n = max(1, int(n * self.config.elite_selection_ratio))
top_island_programs = self.get_top_programs(n=top_n, island_idx=parent_island)
```
Was there an empirical reason to include the top programs among the inspirations?
It seems that the same programs can appear in both the "Top Performing Programs" and the "Inspiration Programs" sections of the prompt.
This happens more often in the single-process path.
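If the overlap is not intended, a simple mitigation would be to filter the inspirations against the top programs by id before calling build_prompt(). `dedupe_inspirations` below is a hypothetical helper, not an existing function, and the toy `Program` class stands in for the real one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Program:
    id: str


def dedupe_inspirations(inspirations: List[Program], top_programs: List[Program]) -> List[Program]:
    """Drop inspirations already shown as top programs, plus repeated ids,
    preserving the original order."""
    seen = {p.id for p in top_programs}
    result = []
    for p in inspirations:
        if p.id not in seen:
            seen.add(p.id)
            result.append(p)
    return result


top = [Program("best"), Program("p2")]
inspirations = [Program("best"), Program("p7"), Program("p2"), Program("p7"), Program("p9")]
print([p.id for p in dedupe_inspirations(inspirations, top)])  # ['p7', 'p9']
```

Filtering can leave fewer inspirations than requested; refilling from the remaining pool would keep the count stable, at the cost of slightly more sampling logic.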


----------------------------------------

### Issue 4) Different invokers
Why does only process_parallel.py call sample_from_island(), while iteration.py calls sample()? A uniform implementation would be clearer and less prone to inconsistencies.
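A sketch of a uniform entry point, using a hypothetical `ToyIslandDatabase`: `sample()` simply delegates to `sample_from_island()`, defaulting to the current island when none is given, so the iterative and parallel paths share one sampling code path. The selection logic inside is deliberately simplified (uniform random) and does not reflect the real parent/inspiration selection.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Program:
    id: str


class ToyIslandDatabase:
    """Hypothetical stand-in with programs grouped by island."""

    def __init__(self, islands: Dict[int, List[Program]], current_island: int = 0):
        self.islands = islands
        self.current_island = current_island

    def sample_from_island(
        self, island_id: int, num_inspirations: int, rng: random.Random
    ) -> Tuple[Program, List[Program]]:
        programs = self.islands[island_id]
        # Simplified selection: uniform parent, uniform inspirations excluding it.
        parent = rng.choice(programs)
        pool = [p for p in programs if p.id != parent.id]
        return parent, rng.sample(pool, min(num_inspirations, len(pool)))

    def sample(
        self, num_inspirations: int, rng: random.Random, island_id: Optional[int] = None
    ) -> Tuple[Program, List[Program]]:
        # Single code path: the iterative caller omits island_id and gets the
        # current island; the parallel caller passes its target island.
        target = self.current_island if island_id is None else island_id
        return self.sample_from_island(target, num_inspirations, rng)


db = ToyIslandDatabase({
    0: [Program("a"), Program("b"), Program("c")],
    1: [Program("x"), Program("y")],
})
parent0, insp0 = db.sample(num_inspirations=1, rng=random.Random(0))               # iterative path
parent1, insp1 = db.sample(num_inspirations=1, rng=random.Random(0), island_id=1)  # parallel path
```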


Bugs and confusion on the programs sampling process #452
