91 changes: 91 additions & 0 deletions py-src/data_formulator/tables_routes.py
Collaborator:

this is incorrect; we should implement this as a recalculate_derived_data function, with two sub-functions:

  1. recalc_derived_data_py(): update a table based on Python. In this case, we provide a list of input tables and the derived table's code, then return the new derived table by applying the Python code to the updated inputs
  2. recalc_derived_data_sql(): simply rerun the query against DuckDB

Decide which one to call based on whether the derived data is virtual or not in the backend. Handle each table update independently: when a table updates, we first find the list of derived tables that are affected; then for each affected table, we gather its actual input data and rerun the code to get the new content and update it. We don't need to do this recursively.
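A minimal sketch of the proposed dispatch, assuming a `virtual` flag on the derived-table metadata; the function names follow the comment above, but the bodies are placeholders, not the actual Data Formulator implementation:

```python
def recalc_derived_data_py(code, input_tables):
    # Placeholder: the real backend would execute `code` in the
    # Python sandbox against the updated `input_tables`.
    return {"via": "py", "inputs": len(input_tables)}

def recalc_derived_data_sql(code):
    # Placeholder: the real backend would rerun the query in DuckDB.
    return {"via": "sql"}

def recalculate_derived_data(derived_info, input_tables):
    # Dispatch on whether the derived table is virtual (SQL-backed).
    if derived_info.get("virtual"):
        return recalc_derived_data_sql(derived_info["code"])
    return recalc_derived_data_py(derived_info["code"], input_tables)
```

Each affected table would go through this dispatch independently, with no recursive propagation.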

Collaborator:

But for now, let's only do recalc_derived_data_py, since we initially only support Python data refresh. This simplifies the refresh logic for remote/DuckDB data.

Contributor Author:

Refactored to a recalculate_derived_data endpoint backed by a recalc_derived_data_py() function. Virtual/SQL table support is skipped for now and returns a 'skipped' status. Each affected table is processed independently with its actual input data. Fixed in commit f611d1f.
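The per-table loop this comment describes can be sketched as follows; `refresh_all` and the `virtual` flag are illustrative names, with the actual recalculation stubbed out as a callback:

```python
# Illustrative helper: process each affected derived table independently,
# skipping virtual/SQL tables (which are not refreshed yet).
def refresh_all(derived_tables, recalc_py):
    results = []
    for t in derived_tables:
        if t.get("virtual"):
            results.append({"id": t["id"], "status": "skipped"})
        else:
            # recalc_py returns e.g. {"status": "success", "rows": [...]}
            results.append({"id": t["id"], **recalc_py(t)})
    return results
```

A failure in one table's recalculation would only mark that table's result, leaving the others unaffected.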

@@ -838,7 +838,98 @@
        except Exception as e:
            logger.error(f"Error ingesting data from data loader: {str(e)}")
            safe_msg, status_code = sanitize_db_error_message(e)
            return jsonify({
                "status": "error",
                "message": safe_msg
            }), status_code


@tables_bp.route('/refresh-derived-data', methods=['POST'])
def refresh_derived_data():
    """Refresh derived data by re-executing Python code on updated base table"""
    try:
        from data_formulator.py_sandbox import run_transform_in_sandbox2020

        data = request.get_json()

        # Get updated base table data and transformation info
        updated_table = data.get('updated_table')  # {name, rows, columns}
        derived_tables = data.get('derived_tables', [])  # [{id, code, source_tables: [names]}]

        if not updated_table:
            return jsonify({"status": "error", "message": "No updated table provided"}), 400

        if not derived_tables:
            return jsonify({"status": "error", "message": "No derived tables to refresh"}), 400

        # Validate updated table has same columns as before
        updated_table_name = updated_table['name']
        updated_columns = set(updated_table['columns'])

        results = []

        # Process each derived table
        for derived_info in derived_tables:
            try:
                code = derived_info['code']
                source_table_names = derived_info['source_tables']
                derived_table_id = derived_info['id']

                # Prepare input dataframes
                df_list = []

                for source_name in source_table_names:
                    if source_name == updated_table_name:
                        # Use the updated data
                        df = pd.DataFrame(updated_table['rows'])
                    else:
                        # Fetch from database
                        with db_manager.connection(session['session_id']) as db:
                            result = db.execute(f"SELECT * FROM {source_name}").fetchdf()
                            df = result

                    df_list.append(df)

                # Execute the transformation code in subprocess for safety
                exec_result = run_transform_in_sandbox2020(code, df_list, exec_python_in_subprocess=True)

                if exec_result['status'] == 'ok':
                    output_df = exec_result['content']

                    # Convert to records format for JSON serialization
                    rows = json.loads(output_df.to_json(orient='records', date_format='iso'))
                    columns = list(output_df.columns)

                    results.append({
                        'id': derived_table_id,
                        'status': 'success',
                        'rows': rows,
                        'columns': columns
                    })
                else:
                    results.append({
                        'id': derived_table_id,
                        'status': 'error',
                        'message': exec_result['content']
                    })

            except Exception as e:
                logger.error(f"Error refreshing derived table {derived_info.get('id')}: {str(e)}")
                results.append({
                    'id': derived_info.get('id'),
                    'status': 'error',
                    'message': str(e)
                })

        return jsonify({
            "status": "success",
            "results": results
        })

    except Exception as e:
        logger.error(f"Error refreshing derived data: {str(e)}")
        safe_msg, status_code = sanitize_db_error_message(e)
        return jsonify({
            "status": "error",
            "message": safe_msg
        }), status_code
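For reference, the endpoint above expects a JSON body shaped roughly like this; the table name, id, and transform code are made-up examples, not values from the PR:

```python
# Illustrative request payload for POST /refresh-derived-data.
payload = {
    "updated_table": {
        "name": "sales",                      # hypothetical base table
        "columns": ["region", "amount"],
        "rows": [{"region": "east", "amount": 10}],
    },
    "derived_tables": [
        {
            "id": "derived-1",                # hypothetical derived table id
            "code": "def transform(df):\n    return df",  # stand-in transform
            "source_tables": ["sales"],
        }
    ],
}
```

Sources other than the updated table are fetched from the session's database, so only the changed table's rows travel in the request.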
20 changes: 20 additions & 0 deletions src/app/dfSlice.tsx
Collaborator:

We don't need to update the concept shelf, since this action requires all columns to be the same.

Contributor Author:

Removed concept shelf update from updateTableRows action since columns must be identical. Fixed in commit f611d1f.

@@ -475,6 +475,26 @@ export const dataFormulatorSlice = createSlice({
            let attachedMetadata = action.payload.attachedMetadata;
            state.tables = state.tables.map(t => t.id == tableId ? {...t, attachedMetadata} : t);
        },
        updateTableRows: (state, action: PayloadAction<{tableId: string, rows: any[]}>) => {
            let tableId = action.payload.tableId;
            let rows = action.payload.rows;
            state.tables = state.tables.map(t => {
                if (t.id == tableId) {
                    // Update rows while preserving other table properties
                    return {...t, rows};
                }
                return t;
            });

            // Update concept shelf items for this table if columns changed
            let table = state.tables.find(t => t.id == tableId);
            if (table) {
                // Remove old field items for this table
                state.conceptShelfItems = state.conceptShelfItems.filter(f => f.tableRef != tableId);
                // Add new field items
                state.conceptShelfItems = [...state.conceptShelfItems, ...getDataFieldItems(table)];
            }
        },
        extendTableWithNewFields: (state, action: PayloadAction<{tableId: string, columnName: string, values: any[], previousName: string | undefined, parentIDs: string[]}>) => {
            // extend the existing extTable with new columns from the new table
            let newValues = action.payload.values;