Spaces: itsalissonsilva/test


itsalissonsilva committed on Jun 11, 2025

Commit f907e1a (verified) · 1 Parent(s): 37559e5

Update src/streamlit_app.py

Files changed (1)

src/streamlit_app.py +12 -13

src/streamlit_app.py CHANGED

@@ -15,16 +15,12 @@ client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
 PROMPT_INSTRUCTIONS_TEXT = """
 You are a forensic auditor AI with deep domain expertise and a sharp eye for irregularities. Your job is to identify **anomalies** in a single column of financial data.
 Analyze the values provided and return only values that are:
 - **Numerical outliers**: extremely high/low or oddly rounded numbers
 - **Format inconsistencies**: strange symbols, irregular formatting, or data corruption
 - **Rare or suspicious values**: strings or categories that do not appear to fit the overall pattern
 ONLY analyze the values from the provided column, without relying on any external context.
 Return ONLY the following JSON object and nothing else:
 {
   "anomalies": [
     {
@@ -91,27 +87,31 @@ st.markdown("""
 This tool combines machine learning and large language models to detect anomalies in datasets. We first apply isolation forest to the full dataset to flag data-level outliers. Then, you can select one column to perform a second pass of analysis using OpenAI's GPT-4, which focuses on semantic and contextual anomalies within that column only (e.g. Payment_Method column).
 """)
-# Button to load sample data
-df = None
-sample_loaded = False
 if st.button("Load sample dataset"):
-    sample_path = "src/df_crypto.csv"
     try:
-        df = pd.read_csv(sample_path)
-        sample_loaded = True
         st.success("Sample dataset loaded from `src/df_crypto.csv`.")
     except Exception as e:
         st.error(f"Could not load sample dataset: {e}")
 # File upload
-if not sample_loaded:
     uploaded_file = st.file_uploader("Or upload your own CSV file", type=["csv"])
     if uploaded_file:
         try:
-            df = pd.read_csv(uploaded_file)
         except Exception as e:
             st.error(f"Could not read uploaded CSV. Error: {e}")
 if df is not None:
     st.subheader("Full Dataset")
     st.dataframe(df, use_container_width=True)
@@ -126,7 +126,6 @@ if df is not None:
     # ---------------- LLM Section ----------------
     st.markdown("### LLM-Based Anomaly Detection (specific column)")
     selected_column = st.selectbox("Select a column to analyze with LLM:", df.columns)
     if st.button("Run LLM Anomaly Detection on selected column"):

 PROMPT_INSTRUCTIONS_TEXT = """
 You are a forensic auditor AI with deep domain expertise and a sharp eye for irregularities. Your job is to identify **anomalies** in a single column of financial data.
 Analyze the values provided and return only values that are:
 - **Numerical outliers**: extremely high/low or oddly rounded numbers
 - **Format inconsistencies**: strange symbols, irregular formatting, or data corruption
 - **Rare or suspicious values**: strings or categories that do not appear to fit the overall pattern
 ONLY analyze the values from the provided column, without relying on any external context.
 Return ONLY the following JSON object and nothing else:
 {
   "anomalies": [
     {
 This tool combines machine learning and large language models to detect anomalies in datasets. We first apply isolation forest to the full dataset to flag data-level outliers. Then, you can select one column to perform a second pass of analysis using OpenAI's GPT-4, which focuses on semantic and contextual anomalies within that column only (e.g. Payment_Method column).
 """)
+# Initialize session state for df
+if "df" not in st.session_state:
+    st.session_state.df = None
+# Load sample data
 if st.button("Load sample dataset"):
     try:
+        st.session_state.df = pd.read_csv("src/df_crypto.csv")
         st.success("Sample dataset loaded from `src/df_crypto.csv`.")
     except Exception as e:
         st.error(f"Could not load sample dataset: {e}")
 # File upload
+if st.session_state.df is None:
     uploaded_file = st.file_uploader("Or upload your own CSV file", type=["csv"])
     if uploaded_file:
         try:
+            st.session_state.df = pd.read_csv(uploaded_file)
+            st.success("Custom dataset uploaded.")
         except Exception as e:
             st.error(f"Could not read uploaded CSV. Error: {e}")
+# Use persisted df
+df = st.session_state.df
 if df is not None:
     st.subheader("Full Dataset")
     st.dataframe(df, use_container_width=True)
     # ---------------- LLM Section ----------------
     st.markdown("### LLM-Based Anomaly Detection (specific column)")
     selected_column = st.selectbox("Select a column to analyze with LLM:", df.columns)
     if st.button("Run LLM Anomaly Detection on selected column"):
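
The change above moves `df` from a module-level variable into `st.session_state`. Streamlit reruns the script from top to bottom on every interaction, and a button reports `True` only on the rerun triggered by its own click, so a plain local assigned inside `if st.button(...)` is lost on the next rerun. The following is a minimal sketch of that behaviour, not the app's code: a plain `dict` stands in for `st.session_state`, and the function names, the `load_clicked` flag, and the `loader` callback are hypothetical and introduced only for illustration.

```python
from typing import Any, Callable, MutableMapping, Optional


def rerun_without_state(load_clicked: bool, loader: Callable[[], Any]) -> Optional[Any]:
    """One top-to-bottom pass of the script using a plain local variable."""
    df = None  # reset on every rerun, as in the old version of the script
    if load_clicked:
        df = loader()
    return df


def rerun_with_state(
    state: MutableMapping[str, Any],
    load_clicked: bool,
    loader: Callable[[], Any],
) -> Optional[Any]:
    """One top-to-bottom pass of the script using a persistent mapping."""
    # Initialise the slot once; later reruns leave the stored value alone.
    if "df" not in state:
        state["df"] = None
    if load_clicked:
        state["df"] = loader()
    return state["df"]


if __name__ == "__main__":
    def load_sample() -> list:
        # Hypothetical stand-in for pd.read_csv("src/df_crypto.csv").
        return [{"Payment_Method": "card"}]

    session: dict = {}  # stand-in for st.session_state

    # Rerun 1: the "Load sample dataset" button was clicked.
    print(rerun_without_state(True, load_sample) is not None)
    print(rerun_with_state(session, True, load_sample) is not None)

    # Rerun 2: a different widget was used, so the load button reports False.
    print(rerun_without_state(False, load_sample) is None)        # data is gone
    print(rerun_with_state(session, False, load_sample) is not None)  # data persists
```

In the app itself, `st.session_state` would take the place of `session`; it supports the same dictionary-style membership test and item access used here.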